Soro: A Lightweight Foundation Model and Chatbot for Tajik

arXiv:2605.27379v1

Abstract: We present Soro, a family of Tajik-specialized conversational large language models (LLMs) designed for real-world deployment under tight compute and connectivity constraints in Tajikistan. Starting from open-weight Gemma 3 checkpoints, we perform Tajik-only continual pretraining on a curated 1.9-billion-token corpus spanning filtered web text, PDF documents, and curriculum-aligned educational materials, followed by supervised instruction tuning o
