Reading Chinese Menus: Concepts: Variant characters, revisited

Last July, I discussed the issue of fonts and handwriting in connection with reading Chinese menus. To summarise, I had two main points:

1. The more practice you get at reading Chinese characters in different fonts, the better you will be at it.
2. Certain characters, such as 包 (bāo/package/bun) and 拌 (bàn/mixed), seem to vary between fonts in ways that to a naïve eye might make them appear to be different characters.

The first of these is fairly straightforward, but the second gave rise to some interesting discussion in the comments. Firstly, pulchritude pointed out that some of the examples I gave may only show differences when a Chinese font is compared to a non-Chinese (e.g. Japanese) font. This doesn't get us off the hook, though, since I've certainly seen such characteristics in at least one font used on Chinese menus in London.

Also, shuripentu noted that while 包 doesn't seem to mind whether its central rectangular area is closed or open, there's at least one set of three characters, with different meanings, that differ only in terms of whether the rectangular area is closed, half-closed, or open: 己, 已, and 巳.

Clearly this is a complicated issue! So I was quite pleased to recently run across the Wikipedia article on variant Chinese characters. It appears that while some of these variations are in fact down to aesthetic choices made by the designer of the font, others are considered to be true differences in the basic form of the character.

This might seem an arbitrary distinction, but when it comes to using Chinese characters in a computing context, the difference is explicit. Sets of variants which are considered sufficiently different will either be mapped to different Unicode code points[see footnote], or all mapped to the same code point but distinguished from each other by so-called language tags.

A brief digression here to explain what I mean by a code point. A code point is essentially a numeric label for a character. To simplify vastly, when you type and save a document on your computer, it doesn't store the individual pixels that make up the representation of the letters on your screen, but rather these numeric labels. When you come to view the document again, it reconstructs how it should look, using the characters' code points along with your chosen font(s) and other formatting information. It's easy to redisplay what you've written in a different font, because the underlying characters haven't changed.
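As a rough illustration of the idea, here is a minimal Python sketch that prints the numeric label behind a few of the characters mentioned above. The helper name `code_point` is my own invention for this example; the only real machinery is Python's built-in `ord`, which returns a character's Unicode code point.

```python
def code_point(char: str) -> str:
    """Return the Unicode code point of a single character in U+XXXX notation."""
    return f"U+{ord(char):04X}"


# Characters discussed earlier in this post.
for char in "包拌己已巳":
    print(char, code_point(char))

# The three look-alikes are genuinely different characters: each one has its
# own code point, so changing the font can never turn one into another.
lookalikes = {code_point(c) for c in "己已巳"}
print(len(lookalikes))
```

Nothing in the output depends on which font you have installed, which is the point: the code point identifies the character, and the font only decides how it is drawn.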

I'm not aware of any common menu characters that are mapped to different code points, but there are some which are mapped to the same code point but have different representations under different language tags. Below are some examples of the same character rendered "in mainland Chinese" (zh-cn), "in Hong Kong Chinese" (zh-hk), and "in Japanese" (ja). These may or may not look different, depending on your browser setup, so I've added a screenshot of how they look to me (transcript in the alt text).

zh-cn	zh-hk	ja	pinyin (meaning)
海	海	海	hǎi (sea/ocean)
骨	骨	骨	gǔ (bone)
花	花	花	huā (flower)
絲	絲	絲	sī (shred)

[Screenshot: The Chinese characters 海, 骨, 花, and 絲 arranged in a tabular format. Each character is on its own row, and the columns show its representation with the zh-cn, zh-hk, and ja language tags respectively. ja-海 is different from the other two 海, zh-cn-骨 is different from the other two 骨, zh-cn-花 is different from the other two 花, and all three 絲 differ from each other.]
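For anyone who wants to try this at home, here is a small Python sketch that generates table rows like the ones above. It assumes the language tag is applied with the HTML `lang` attribute on each cell, which is one common way to do it and not necessarily the exact markup used in this post; the names `LANG_TAGS` and `row_html` are made up for the example.

```python
from html import escape

LANG_TAGS = ["zh-cn", "zh-hk", "ja"]  # the three tags used in the table above


def row_html(char: str) -> str:
    """Build one table row that repeats `char` under each language tag."""
    cells = "".join(
        f'<td lang="{tag}">{escape(char)}</td>' for tag in LANG_TAGS
    )
    return f"<tr>{cells}</tr>"


for char in "海骨花絲":
    # The stored character is identical in every cell; only the lang
    # attribute differs, and that is the hint a browser can use to pick
    # a region-appropriate font.
    print(f"U+{ord(char):04X}", row_html(char))
```

Each row prints a single code point followed by three cells containing that same character, so any visible difference between the columns comes from the font your browser chooses for each tag.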

What does this mean for the student of the Chinese menu? Perhaps not a great deal in practical terms. In general, the degree of this type of variation is much smaller than the degree of variation between traditional and simplified characters. Also, unlike the traditional/simplified case, I'm not aware of any patterns that you can use to predict how a character might vary. Finally, regardless of whether or not a character's variations are captured by different code point allocations, the most important thing is to become familiar with them to the point where you can confidently recognise them as the same character.

I still think it's interesting, though! And hopefully you do too.

Footnote: [0] Incidentally, this type of variation is by no means confined to Chinese script. In the Latin alphabet, for example, the lowercase letter "a" has two main basic forms of representation. The one most commonly used in handwriting (at least in the UK) takes the form of a circle with a vertical stroke down the right-hand side. The other has a hooked extension to this vertical stroke, curving back to the left over the circular part. (See diagram on Wikipedia.) In most contexts, the difference between the two is unimportant, a mere matter of the font designer's preference. However, in the International Phonetic Alphabet (IPA), they represent different vowel sounds, and so there are two relevant Unicode code points: "a" (U+0061/LATIN SMALL LETTER A) and "ɑ" (U+0251/LATIN SMALL LETTER ALPHA). Depending on the font you're using in your browser, "a" and "ɑ" may look quite similar to each other, or very different. pne has a couple more examples in comments.
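If you'd like to check the two code points yourself, this short Python sketch looks them up using the standard `unicodedata` module. The helper name `describe` is just something I've made up for the example.

```python
import unicodedata


def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


print(describe("a"))  # U+0061 LATIN SMALL LETTER A
print(describe("ɑ"))  # U+0251 LATIN SMALL LETTER ALPHA
```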

If you have any questions or corrections, please leave a comment (here's how) and let me know (or email me at kake@earth.li). See my introductory post to the Chinese menu project for what these posts are all about.

jyorraku

Here's a handy table of variant characters:

HKIUG TSVCC Table - UNICODE Version. It was developed by librarians in Hong Kong to help their library database vendors build software that can search across the variants.

kake

Interesting, thanks!

pne

I think that Korean also has some variants, so trying out lang="ko" might get you results.

I don't know any details, though (e.g. whether they tend to go with Taiwan/Hong Kong/Japan as regards font choice, or which characters the difference could be noticeable in).

And for Latin, two other things that come to my mind are "g" (which can either have two ovals/circles in print or just one oval/circle plus a j-like hook: in IPA, properly only the second shape is used, so it gets its own code point, "ɡ" (U+0261 LATIN SMALL LETTER SCRIPT G)) and the "two thingies above" diacritic, which in German handwriting can be either two dots or two short lines, while in Hungarian, there's a distinction between öü on the one hand and őű on the other.

kake

lang="ko" does indeed show some differences in a couple of the characters above; at least with my setup, there's a noticeable difference in 花 and a subtle one in 海.

Thanks for the extra Latin examples — I've edited the footnote to point to them.

Adventures with Kake

Mostly (but not entirely) about reading Chinese menus. At the moment.

December 2012