
Mohandas Gandhi, with Indian independence leaders Jawaharlal Nehru and Vallabhbhai Patel, at a sweeper’s colony in Delhi, circa 1946. The practice of associating speech with social position has deep historical roots in Southasia, and caste identity embeds itself even in basic vocabulary. IMAGO/had fotos

How caste pervades Southasian languages

Caste-based dialects – or castelects – and the social hierarchy of languages across Hindi, Tamil, Kannada and Tulu

Abhishek Avtans teaches Indic languages and linguistics at Leiden University (the Netherlands). He posts on X at @avtansa and on Bluesky at avtansa.bsky.social

Published on: 25 Nov 2025, 11:14 am

This article is part of Dialectical, a Himal series that explores Southasia’s languages, their connections and shared histories.

CASTE AS A socio-cultural phenomenon is both pervasive and enduring across Southasia, shaping institutions, practices and everyday interactions. As caste is such an integral aspect of society in the Subcontinent, it is imperative to understand how it interacts with the multitude of languages spoken in the region. Sociolinguists speak of social dialects or sociolects, referring to non-regional variations in language shaped by factors such as occupation, place of residence, education, income, “new” versus “old” money, racial or ethnic category, cultural background, religion and so on. When these linguistic variations are determined by caste affiliation, they are often referred to as caste dialects, or castelects.

There is a widespread belief among many Southasian communities that a person’s caste can be identified, to some extent, by their speech. That languages are indeed shaped by caste and social stratification becomes clear when looking at linguistic research across much of the Subcontinent – including the Hindi dialects of “touchable” versus “untouchable” villagers, Brahmin and non-Brahmin Tamil, and the social dialects of Tulu and Kannada.


The practice of associating speech with social position has deep historical roots in Southasia. Classical Sanskrit dramas are known for adhering to conventions in which characters speak different dialects according to their social positions. For example, in Shudraka’s Mrcchakatika (The Little Clay Cart), believed to have been written between the 3rd and 5th centuries CE, this convention is followed quite rigidly. High-status characters speak Sanskrit, while those of lower status invariably use Prakrit – the ancient vernacular dialects of northern and central India, which existed alongside and were derived from Sanskrit.

The play primarily employs two major Prakrits: Sauraseni and Magadhi. Sauraseni is the dominant dialect, spoken consistently by characters such as the stage manager, certain domestic workers and merchants, while the jester speaks Pracya, which is nearly identical to Sauraseni. Women characters of high or respectable status – including the hero’s wife and the courtesan – along with female attendants, speak Avanti, a dialect largely based on Sauraseni but slightly influenced by Maharastri. Magadhi is reserved for characters of lower social status, such as the shampooer, the monk, young boys and various low-ranking attendants. Several sub-dialects also appear: Sakari is used by the antagonist (the royal brother-in-law), and Candali by the executioners, both linguistically close to Magadhi. The constables speak Daksinatya (southern dialect), and the gamblers speak Dhakki, a distinctive mixed sub-dialect incorporating elements of Sanskrit, Sauraseni, Magadhi and Apabhramsa.

Colonial-era historians produced important early documentation of caste-linked linguistic variation. For example, while describing towns, villages, dwellings and rural organisation in the Malabar district of the Madras Presidency (in present-day Kerala), the Scottish bureaucrat William Logan provides a description of house-related words in Malayalam. His Malabar Manual, published circa 1887, reads:

The house itself is called by different names according to the occupant’s caste. The house of a pariah is a ceṛi, while the agrestic slave—the Cheraman—lives in a chaḷa. The blacksmith, the goldsmith, the carpenter, the weaver, and the toddy-drawer (Tiyan) inhabit houses styled pura or kudi; the temple servant resides in a variyam or pisharam or pumaṭham, the ordinary Nayar in a viḍu or bhavanam, while the man in authority of this caste dwells in an iḍam; the Raja (king) lives in a kovilakam or kottaram, the indigenous Brahman (Nambutiri) in an illam, while his fellow of higher rank calls his house a mana or manakkal.

This shows that caste identity embeds itself even in the everyday vocabulary of a language, with entirely different words used across caste lines for basic concepts like “house”.


Contemporary literary scholarship has continued to document this phenomenon. The noted Hindi Dalit writer Sheoraj Singh Bechain, in his 2017 poetry anthology Chamar ki Chay (Cobbler’s Tea), observes that in Braj Bhasha – the vernacular language in the west of the Indian state of Uttar Pradesh – even songs are divided along caste lines. For example, jaddu mari and hupanga are songs of the potter community; bajjarag is sung by cowherds; bahri is associated with the Khatiks (butchers); while the Chamar community sings siriya songs during the rainy season. Bechain further writes: “siṛiya is not merely a song, it is an oppressed person, it is an ostracized society.”

