Please note: Due to project funding termination in summer 2014, this database is no longer actively being maintained. We cannot guarantee the accuracy of the listings.



Number of Speakers: 4 million

Key Dialects: Northwestern, Southwestern

Geographical Center: Tajikistan

Tajik is spoken by about 4 million people in the Republic of Tajikistan as well as in adjoining areas of Uzbekistan and Kyrgyzstan (Grimes 1992). Tajik is also the first language of such small Central Asian ethnic groups as the Central Asian Gypsies (Romanies) and the Jews of Samarkand and Bukhara. Several pidgins are spoken in Tajikistan; Tajik forms their grammatical basis.

Tajik is so closely related to the Persian spoken in Iran and Afghanistan that it is sometimes considered a dialect of Persian. Because of intense contact with speakers of Turkic languages, however, and a high rate of bilingualism with Uzbek and Kyrgyz, Tajik has been more influenced by Turkic than Persian has (Comrie 1981, Lazard 1970). The vocabularies of Persian and Tajik have diverged because Tajik borrowed many terms from Russian, especially political, cultural, and technical terms, and because Persian borrowed more from western European languages (Majidi 1990). Dari Persian (spoken in Afghanistan) is often called Tajik by Russian linguists.

Tajik belongs to a subgroup of West Iranian languages that includes the closely related Persian languages of Farsi and Dari; the less closely related languages of Luri, Bakhtiari, and Kumzari; and the non-Persian dialects of Fars Province. Other more distantly related languages of this group include Kurdish, spoken in Turkey, Iraq, and Iran; and Baluchi, spoken in Afghanistan, Iran, and Pakistan. Even more distantly related are languages of the East Iranian group, which includes, for example, Pushto, spoken in Afghanistan; Ossete, spoken in North Ossetia, South Ossetia, and other parts of the Caucasus region of the former USSR; and Yaghnobi, spoken in Tajikistan. Other Iranian languages of note are Old Persian and Avestan (the sacred language of the Zoroastrians, for which texts exist from the 6th century BC).

West and East Iranian make up the Iranian group of the Indo-Iranian branch of the Indo-European family of languages. Indo-Iranian languages are spoken in a wide area stretching from portions of eastern Turkey and eastern Iraq to western India (see maps in Crystal 1987:299, and in Payne 1987:516). The other main division of Indo-Iranian, in addition to Iranian, is the Indo-Aryan languages, a group comprising many languages of the Indian subcontinent, for example, Sanskrit, Hindi/Urdu, Bengali, Gujarati, Punjabi, and Sindhi.

The main dialect division in Tajik is between the northwestern and southwestern groups of dialects (Lazard 1970). The northwestern dialects, which are the basis of Standard Tajik, are spoken in northern and western Tajikistan and southern Uzbekistan. The dialects of the cities of Ferghana, Samarkand, Bukhara, Hisar, and Karatag and of the Baysun region are all part of the northwestern group, while the dialects spoken in the cities of Matcha and Falghar seem to form their own subgroup. The northernmost of the northwestern dialects have the most Turkicized structure.

Tajik was written in Arabic script until the early 1900s. In 1928, the Roman alphabet was introduced, and used until it was replaced by the Cyrillic alphabet in 1940. A modified Cyrillic alphabet was in use until 1994. Recently, the government has attempted to return to a Roman alphabet.

The richly inflected morphological system of Old Iranian has been drastically reduced in Tajik. The language has no grammatical gender or articles, but person and number distinctions are maintained. Nouns are marked for specificity: there is one marker in the singular and two in the plural. Objects of transitive verbs are marked by a suffix. The morphological features of Arabic words are preserved in loans; thus Tajik shows "broken" plural formations, that is, a word may have two different plural forms.

Verbs are formed using one of two basic stems, present and past. Aspect is as important as tense: all verbs are marked as either perfective or imperfective, and the imperfective is marked by prefixation. Both perfective and imperfective verb forms appear in three tenses: present, past, and inferential past. The language has an aorist (a type of past tense) and three moods: indicative, subjunctive, and counterfactual. The passive is formed with the verb 'to become' and is not allowed with specified agents. Verbs agree with the subject in person and number. Tajik verbs are normally compounds consisting of a noun and a verb.

Word order in Tajik is Subject-Object-Verb, although modifiers follow the nouns they modify and the language has prepositions.

Tajik has six vowels that can be divided into two groups, traditionally known as stable and unstable vowels. Stable vowels are always pronounced with the same length, regardless of the surrounding sounds in the word. Unstable vowels are of normal duration in some contexts, but in others they are shortened or even lost. Stress in Tajik is almost completely predictable; it normally falls on the last syllable of the word.

Prior to World War I, little was known about Tajik. Classical Persian was used for written communication by Tajik speakers until the Soviet period. The classical Persian writers are claimed by both the Persians and the Tajiks as part of their literary and cultural heritage. However, because of its different writing system and its geopolitical isolation, Tajik developed its own literary tradition separate from Farsi. Tajik has taken a separate course of development from the Persian of Iran and can be understood by a speaker of Tehran Persian only with great difficulty.

After the founding of the Soviet Republic of Tajikistan, Tajik became the national language of the republic. Throughout the 1920s and the 1930s, Russian and Tajik scholars standardized the language and the orthography. The standard language is based on the northwestern dialects (Lazard 1970, Comrie 1981), dialects that have been the most heavily influenced by Uzbek.

In 1926, less than 3 percent of the population was literate; of that number, 70 percent was literate in Tajik. By 1970, the literacy rate had risen to nearly 100 percent. Over the same period, the proportion of people in Tajikistan claiming Tajik as their first language dropped from nearly 75 percent to not quite 60 percent. This was mostly due to the immigration of non-Tajik speakers. The percentage of ethnic Tajiks claiming Tajik as their first language has remained steady.

Most schools in the former Tajik Soviet Socialist Republic used Tajik in teaching. During the Soviet period, there was only one university in the country, Lenin Tajik State University, and instruction was provided there in both Tajik and Russian. Radio and television are broadcast in Tajik in both Tajikistan and Uzbekistan. Tajik books, newspapers, and periodicals are published in both the Republics of Tajikistan and Uzbekistan.

The Indo-European-speaking Tajiks predate the arrival of Turkic speakers in Central Asia. On the southeast border of Transoxiana (the area around the Oxus River), Tajikistan has been in the path of invaders from the south and east for centuries. In the sixth century BC, the area was conquered by the Persian king Cyrus the Great; in the fourth century BC, Alexander the Great conquered the area. At the turn of that century, the area was overrun by Scythians and then Tocharians, in turn. By the fifth century AD, Huns, who were possibly Turkic-speaking, had arrived, and a hundred years later, the first positively identified Turkic invasion occurred. Except for a brief period of Arabic rule, successive Turkic invasions influenced the population so strongly that even the Mongol invaders adopted Turkic. Only the forebears of the Tajiks and the Pamirs retained their Iranian languages. The Tajiks were distinguished as early as the eighth century from the surrounding Turks and Mongols, who were nomadic.

The Tajiks established a semi-independent state under the influence of Uzbekistan in the early nineteenth century, but it was soon annexed by the Russian empire. In 1918, the area was incorporated into the Turkestan Autonomous Soviet Socialist Republic. In 1929, a separate Tajik Soviet Socialist Republic was created. Tajikistan became an independent republic in 1991 upon the breakup of the USSR.

The name Tajik has undergone several shifts in meaning. First it meant non-nomadic peoples; then it meant Arabs in Central Asia and, by extension, their Persian subjects. Later, the term came to mean anyone who accepted Islam. In the sixteenth and seventeenth centuries, the Russians used the term interchangeably with "Sart" to mean any trader (that is, city dweller) from Central Asia.

Akiner, S. 1986. Islamic Peoples of the Soviet Union. New York: KPI.

Bennigsen, A. and S. E. Wimbush. 1985. Muslims of the Soviet Empire. London: C. Hurst.

Campbell, G. L. 1991. Compendium of the World's Languages, Vol. 1-2. London: Routledge.

Comrie, B. 1981. The Languages of the Soviet Union. New York: Cambridge University Press.

Comrie, B. 1990. The World's Major Languages. Oxford, UK: Oxford University Press.

Europa Publications. 1993. Eastern Europe and the Commonwealth of Independent States 1993. London: Europa Publications.

Grimes, B. F., ed. 1992. Ethnologue: Languages of the World. Dallas, Texas: Summer Institute of Linguistics.

Lazard, G. 1970. "Persian and Tajik." In T. Sebeok, ed. Current Trends in Linguistics, Vol. 6:64-96. Paris: Mouton.

Majidi, M.-R. 1990. Strukturelle Grammatik des Neupersischen (Farsi): Band II, Morphologie. Hamburg, Germany: Helmut Buske Verlag Hamburg.

Linguistic Society of America. 1992. Directory of Programs in Linguistics in the United States and Canada. Washington, DC.



 This work is licensed under a Creative Commons License.

  • You may use and modify the material for any non-commercial purpose.
  • You must credit the UCLA Language Materials Project as the source.
  • If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
