“The area from Kau-tsí 交趾 (northern Vietnam and southern China) to Kuè-khe 會稽 (Greater Shanghai area), stretching over seven to eight thousand lí (Chinese miles), is inhabited by a hundred types of Ua̍t people with a variety of surnames.”
- Book of Hàn, Geography Chapter 漢書·地理志
This can be seen from the Chinese character for Hokkien 閩 which contains a pictograph of a snake. Both archaeological relics and historical records have indicated that the Kuínn-tang (廣東 Guangdong) region before the Bêng (明 Ming, 1368–1644) dynasty was populated with indigenous people.
The annals of the Khai-guân 開元 period (730s) say that the lands of Bân 閩 county and Ua̍t-tsiu 越州 were anciently the lands of the Eastern Au 東甌. The people of Kiàn-tsiu 建州 and its area are all the descendants of snakes. The five surname clans, such as the Uînn 黃 and the Lîm 林, are their descendants.
This style of engraving can often be found on the drums that have been excavated in the mountain ranges approximately half-way between Hanoi and Kuínn-tsiu (廣州 Canton). Over four hundred of them have been excavated to date. A replica of the drums is on display.
It is difficult to determine whether the population of southern China consists more of descendants of people from the Central Plain or of the aboriginal peoples. We also don’t know exactly what languages the aboriginal peoples spoke, but there is evidence that they were Kra-Dai, Hmong-Mien and Austroasiatic speakers, based on the vocabulary and grammar these languages left in the southern Chinese languages of today.
Genetic analysis shows a clear difference between the northern and southern populations despite the fact that they both identify themselves as ethnic ‘Hàn’ 漢. This difference is most prominent in the maternal lineages, and the boundary runs approximately along the Huâi River 淮河 and the Tsîn (秦 Qin) Mountains, near the Yangtze River.
Hakka people did not practise foot binding, a custom of binding the feet of young girls to modify the shape and size of their feet. This is a cultural distinction that set Hakka people apart from other linguistic groups.
In 1907, a Cantonese scholar, Uînn Tsiat 黃節, mentioned in his textbook, Local Geography of Kuínn-tang 廣東鄉土地理教科書, that “there are Hakka and Hoklo (Teochew) people in the Kuínn-tang province but they are not Cantonese and are not of the Hàn race”.
The characters and grammar in major classical period works, such as Analects of Confucius 論語, Records of the Grand Historian 史記, Mencius 孟子 and Commentary of Tsó (Zuo) 左傳, became the prototype upon which later writers modelled their work.
The educated elites were trained to write the classical language through mechanical memorisation.
The exact sounds represented by the characters in the classical texts are not fully understood. But every region in East Asia has its own pronunciation where the vocalisation is dictated by many factors including the original pronunciation, local language, evolution, influences from the neighbouring urban centres, capital cities and other languages it comes in contact with.
To reconcile the differences between classical writings and their local spoken language, people in Japan and Korea developed interpreting or paraphrasing reading techniques respectively known as Kundoku 訓読 or Gu-gyeol 구결. Similar techniques are also practised in China and Vietnam.
This was to ensure that the classical writings would be read in vocabulary and word order that would make sense in their native tongues. The reader would ‘translate’ by substituting the Literary Chinese texts with local vocabulary as well as adjusting the grammar as they vocalise along.
まなびて とき に これ を ならう、 また よろこばしからず や
(manabite toki ni kore o narau, mata yorokobashikarazu ya)
The 13th-century Ku-yeok In-wang-kyung 舊譯仁王經 is a Literary Chinese text annotated by Korean speakers to help the interpretation of the text.
There were similar practices in Europe. Latin texts were annotated in languages of ordinary people (Old English, Old French, Old High German, Old Irish and many others) throughout the High Middle Ages. The main purpose of these annotations was to help the reader correctly interpret the base text using their native language. The 9th-century Vespasian Psalter kept at the British Library is an example.
Not all Chinese characters are pictographs to which readers may assign any pronunciation they like. In fact, the majority of Chinese-based characters are compound characters, usually formed from one part that represents the meaning (semantic) and another part that represents the sound (phonetic). Most readers can approximate how a character is pronounced from its phonetic part.
The development of local writings was severely disrupted after the invasion of the Chinese army in 1407 but enjoyed a revival from the middle of the 17th century until the Roman alphabet was imposed by the French colonial government in 1910.
The Buddhist scriptures are known as ‘Transformation Texts’ 變文 and ‘Sutra Explanations’ 講經文 and were discovered in Tun-hông (敦煌 Dunhuang). Texts with a stronger influence of written Mandarin can be found in another genre of literature popular amongst the neo-Confucian scholars, known as ‘Records of Words’ 語錄.
Another popular genre widely written in Hokkien and Teochew is a type of poetry chapbook known as Kua-á-tsheh 歌仔冊 in Hokkien or Kua-tsheh 歌册 in Teochew.
in the South
Special Stories of the Hokchew City 閩都別記 is a notable 400-chapter novel written in Hokchew.
The Sing-song Girls of Shanghai 海上花列傳 is a famous novel written in Shanghainese whereas a genre of songbooks known as ‘strum lyrics’ 彈詞 popular in Soo-tsiu (蘇州 Suzhou) has a rich record of the Gôo 吳 language.
A genre of Cantonese songbooks known as ‘Wooden Fish Books’ 木魚書 was widely popular in the Cantonese speaking region whereas Hakka writings can be found in ‘Mountain Songbooks’ 山歌書.
A Cantonese ‘Wooden Fish Book’ 木魚書.
This literature was hugely popular but, because it lacked prestige, many works were not well preserved and the majority have been lost. Apart from novels and poems, regional languages were also commonly used in religious texts 科儀書, contracts, printed materials and many other non-official texts.
The arrival of Western missionaries in the east, particularly in the 19th century, led to an explosion of vernacular (spoken) Chinese languages written in the Roman alphabet.
Samuel Dyer from the London Missionary Society, who lived in Penang for eight years, translated Aesop's Fables into Romanised Hokkien and published his book in Singapore in 1843. The spelling system he employed continued to evolve and later became known as Pe̍h-ōe-jī 白話字.
The Goose that Laid the Golden Eggs story in Dyer's Aesop's Fables.
The Romanised Hokkien (Pe̍h-ōe-jī) became very popular. At its peak, it is estimated that one in three letters posted in the Hokkien-speaking region of China were written in Pe̍h-ōe-jī. Even the postal service had no problem delivering letters to addresses written in the romanised Hokkien.
A political speech by Lee Kuan Yew, the Prime Minister of Singapore, was written in Pe̍h-ōe-jī when he was campaigning in Singapore’s 1963 general election.
In 2006, the Taiwanese government revised and standardised Pe̍h-ōe-jī to cover major accent differences and renamed it Tâi-lô 台羅.
Because of the prestige of the classical writings, nearly all works written in regional languages exhibit their influence. It is often difficult to categorise a work as vernacular (ordinary language) or literary, as the literature forms a long spectrum of styles and registers. Generally speaking, the more formal a work is, the more closely it resembles the literary language.
In southern China (non-Mandarin regions), only a small group of government officials could communicate in Mandarin. This was generally the case up until the 20th century.
Mandarin was the spoken language of the imperial court centred in the north. This is how the northern variety of Chinese derived its name. ‘Mandarin’ 官話 literally means ‘the language of the officials’. The word entered English as a loanword from the Portuguese ‘mandarim’ in the 16th century. It is believed to have two influences: the Sanskrit term ‘mantri’ (मन्त्रि minister) and the Portuguese verb ‘mandar’ (to command).
Local dialects differ from town to town and the varieties spoken in large urban centres often serve as the standard in their respective region.
The Amoy dialect is the lingua franca of the Hokkien speaking region; the Canton dialect is the lingua franca of Kuínn-tang 廣東 and some parts of Kuínn-sai (廣西 Guangxi). Shanghainese is widely accepted as the standard of the northern Gôo (吳 Wu) region.
“The language in Heng-lêng (興寧 Xingning) and Tióng-lo̍k (長樂 Changle) is similar to that of Siau-tsiu (韶州 Shaozhou). The word for ‘I’ is ‘ngàaih’. The Cantonese people call them ‘Ngàaih jí’. Teochew is spoken in the east and is similar to Hokkien. Teochew has many words that do not have written forms and is incomprehensible to the Cantonese. The languages in Tiāu-khèng (肇慶 Zhaoqing), Ko-tsiu (高州 Gaozhou), Liâm-tsiu (廉州 Lianzhou) and Luî-tsiu (雷州 Leizhou) are similar to Cantonese but the people there can’t express themselves well. In Cantonese, the measure word is ‘go’ 個 for humans and ‘jek’ 隻 for animals, but it is the other way round in the aforementioned regions. Hái-lâm (海南 Hainan) island is out in the sea. Their language is similar to that of the Teochews and they sometimes sound like the Hokkiens. Some people amongst them speak like the people in Liâm-tsiu 廉州.”
- Comprehensive Records of Kuínn-tang (1731)
The Comprehensive Records of Kuínn-tang, as amended in 1561, described Cantonese as the ‘proper’ language and Teochew as sounding ‘foreign’ 侏㒧.
“The Hakkas are still speaking their own language even though they have been living and working as labourers in Tseng-siânn (增城 Zengcheng) for generations. Their language sounds like bad noise. Without even asking them you can tell they are not from this place.”
- The Records of Tseng-siânn 增城 County (Khiân-liông [乾隆 Qianlong] period)
“Teochew sounds like a foreign language. It is very similar to Hokkien and therefore there are many words which do not have written forms. It is also not mutually intelligible with the languages in the surrounding regions.”
- The Comprehensive Records of Teochew 潮州府志 (Khiân-liông period, 1762)
Tionn-kû 張渠, a Mandarin speaker from Hô-pak (河北 Hebei) living in Huī-tsiu (惠州 Huizhou) in 1730, a town populated with Cantonese, Hakka and Hoklo (Teochew) people, described the languages in the region as “incomprehensible like foreign languages” (侏㒧難解).
The Hokkiens (Sangleys) and the Hakkas were depicted as separate ethnic groups in a 16th-century manuscript known as the Boxer Codex.
The depictions of the Hokkiens and Hakkas in the Boxer Codex.
The Vietnamese government also classes the indigenous Hakka- and Cantonese-speaking populations living in villages across the Vietnamese border as distinct peoples: the Ngái 𠊎 and the Sán Dìu 山由. They are not considered ‘Hàn’.
The term ‘ethnic Hàn’ 漢族 only occurs rarely in historical texts before the end of the 19th century and did not carry the modern meaning of Hàn ethnicity. For example, it was used to refer to a Korean person by another Korean, and it was the last known use of the term.
The most common terminology used by the southern Chinese to refer to themselves is Tn̂g-suann-lâng 唐山人 (or Tn̂g-lâng 唐人) which means ‘people of the Tn̂g mountain’. Again, that is a geographical term and not a bloodline kinship term.
To alienate the Manchus, Tsau Iông (鄒容 Zou Rong), who published the most influential revolutionary pamphlet, The Revolutionary Army 革命軍, in 1903, divided the ‘yellow race’ into sub-races.
According to his definition, the Chinese are of the Hàn ethnicity 漢族 and the Manchus are of the Tungus ethnicity 通古斯族 and he made it clear that “race must be clearly distinguished for a revolution to happen”.
In almost every chapter of his pamphlet, Tsau Iông mentioned the mythological Yellow Emperor to ensure that the ‘candidate’ for the ancestor of the ethnic Hàn was reinforced in the minds of his audience.
Hokkien has a long history in Southeast Asia. Má-huan 馬歡, a famous voyager, had reported the presence of the Hokkien people in Java as early as the 15th century. As a result of migration, many southern Chinese languages, particularly Hokkien, became the lingua franca in cities and towns in Taiwan and Southeast Asia.
With a high number of speakers, Hokkien emerged as a language for interlinguistic group communication. The prevalence of Hokkien can be observed in urban areas along the west coast of the Malay peninsula, particularly in Penang, Melaka and Singapore.
The most authoritative reference work in China, The Great Encyclopedia of China 中國大百科全書, conspicuously does not recognise the diversity of Chinese languages, treating them as a single entity, while dividing much smaller groups such as Tibeto-Burman and Hmong-Mien into numerous linguistic branches and subbranches.
Sòng Him-kiâu 宋欣橋, a university professor in Hong Kong and a Mandarin advocate, claimed that both Mandarin and Cantonese speakers belong to the ‘Hàn’ ethnicity, and therefore ‘Mandarin and Cantonese are the same language’. He also relied on the confusion over the meaning of ‘dialect’ to justify his assertion that Cantonese was not fit to be the mother tongue of the people of Hong Kong. His remarks created an uproar in 2018.