Chinese Language keyboard input

Frame holding wooden moveable type for Chinese characters. 29.5 x 21.5 x 3.5 cm. Late Ming dynasty or early Qing dynasty. Held at Ningxia Museum, Yinchuan. Image by BabelStone available under a Creative Commons License

The Chinese language with its thousands of individual characters has advantages and disadvantages. One clear benefit is that written Chinese is much more compact than English so books and letters take less space. But the main disadvantage came in modern times with the increasing use of keyboards. At one point it looked like Chinese would move completely away from using written characters.

Telegraph

The coming of the telegraph in the mid-19th century posed the first challenge. The telegraph communicated with just two signals: dots and dashes, made by short or long presses of the key with a pause in between.

Morse Telegraph (1837), historical collection of France Telecom, Telecommunication City in Pleumeur-Bodou, France. Image by Zubro available under a Creative Commons License

To send English text, a sequence of dots and dashes was sent for each letter of the alphabet. Morse code uses an interesting encoding: the letters are given codes of different lengths according to how often they occur. This is a more compact scheme than encoding every letter with a fixed block of five dots or dashes. The most frequent letters are encoded with just one or two signals, for example ‘e’ with a single dot and ‘a’ with dot-dash, while less frequent ones take more, for example ‘q’ dash-dash-dot-dash and ‘x’ dash-dot-dot-dash. This approach makes messages considerably shorter and so quicker to transmit and receive.
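The saving from variable-length codes can be checked with a few lines of Python. This is only a sketch: the table is the standard International Morse alphabet, while the sample message and the fixed five-signal comparison are chosen here purely for illustration.

```python
# International Morse code for the 26 letters.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".",
    "f": "..-.", "g": "--.", "h": "....", "i": "..", "j": ".---",
    "k": "-.-", "l": ".-..", "m": "--", "n": "-.", "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.", "s": "...", "t": "-",
    "u": "..-", "v": "...-", "w": ".--", "x": "-..-", "y": "-.--",
    "z": "--..",
}


def to_morse(text):
    """Encode the letters of text, with a space (the pause) between letters."""
    return " ".join(MORSE[ch] for ch in text.lower() if ch in MORSE)


def signal_count(text):
    """Total number of dots and dashes needed for the letters in text."""
    return sum(len(MORSE[ch]) for ch in text.lower() if ch in MORSE)


message = "tea time"  # sample message made of common letters
variable = signal_count(message)
# A fixed-length scheme would spend five signals on every letter.
fixed = 5 * sum(ch in MORSE for ch in message.lower())

print(to_morse(message))
print(variable, "signals, versus", fixed, "with fixed five-signal blocks")
```

A message made of rarer letters would gain much less, which is why the shortest codes are reserved for the most frequent letters.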

But how could the telegraph system be adapted from an alphabet of 26 letters to written Chinese with thousands of different characters? The approach used was to assign a unique number to every character and transmit the digits of the number in standard Morse code. The receiver then had to look up the number in a table and write down the appropriate character.
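The number-based scheme can be sketched as a pair of lookups. The digit codes below are standard Morse, but the two-entry code book, its four-digit numbers and the " / " separator between characters are assumptions made for this illustration, not a published telegraph table.

```python
# Standard Morse code for the ten digits.
MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
DIGIT_FOR_MORSE = {code: digit for digit, code in MORSE_DIGITS.items()}

# Hypothetical code book: the numbers are illustrative only.
CODE_BOOK = {"中": "0022", "文": "2429"}
CHAR_FOR_NUMBER = {number: char for char, number in CODE_BOOK.items()}


def send(text):
    """Sender: look up each character's number and key its digits in Morse."""
    groups = []
    for char in text:
        groups.append(" ".join(MORSE_DIGITS[d] for d in CODE_BOOK[char]))
    return " / ".join(groups)  # " / " marks the gap between characters


def receive(signal):
    """Receiver: decode the digits, then look each number up in the table."""
    chars = []
    for group in signal.split(" / "):
        number = "".join(DIGIT_FOR_MORSE[code] for code in group.split(" "))
        chars.append(CHAR_FOR_NUMBER[number])
    return "".join(chars)


print(send("中文"))
print(receive(send("中文")))
```

Every character costs the same number of digits here, so the frequency-based saving that Morse gives English letters is lost, and both operators need a copy of the table.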

A drawing of a classical typesetter's sorting case. 1740. Image by Christian Friedrich Gessner available under a Creative Commons License

Moveable type

Printing was invented long ago in China and printers faced the same problem. In English you put together individual letters from a case full of reusable letters. These were cast in lead and had good durability. But with Chinese, how could a printer quickly find the wanted character among thousands?

Systems were devised to sort the characters into small categories, with common characters quicker to find. However, keeping all the characters in the correct place was a challenge, and most printers gave up on such systems and carved the characters directly onto wooden blocks rather than using reusable single-character blocks.

De Luxe typewriter made in Japan by Brother in the 1960s. Image by Craig418 available under a Creative Commons License

Typewriters

When typewriters became widespread in the late 19th century the same problem resurfaced. A Chinese typewriter built with thousands of keys was just not feasible. Typewriters were revolutionizing business practice. Not only was typing faster than writing by hand, but there were no problems of illegible handwriting (a constant issue with handwritten Chinese). Just as important, the typewriter could produce multiple identical copies at the same time using carbon paper. Before typewriters came along, copying out a whole document was a slow, tedious and error-prone process. It was during this era that Mao Zedong and other Chinese leaders took the view that phonetic spelling using ‘A-Z’ would have to replace the written characters. From this came the pinyin system, which still had to deal with the problem of tones. How do you distinguish one character from the dozens that differ only by tone? Putting tone marks over the vowels was the solution adopted, but many books printed without these special characters left out the all-important tone marks, and so the text was very hard to understand.

Computers

The early computers had limited memory and processing power and offered no immediate solution to the problem of typing Chinese characters. However, there was no longer a mechanical link between pressing a key and a character appearing on the screen: software could interpret the keys differently and interact with the user to select a particular character.

For Chinese an array of different ‘input methods’ (输入法 shū rù fǎ) was devised, all using the standard QWERTY keyboard to produce the thousands of different Chinese characters. Modern methods make use of context rather than choosing characters in isolation, just like predictive text on smartphones. They take account of the character most likely to follow the previous one as well as characters you use frequently (such as a friend's name).

Wubi method (五笔字型输入法 Wǔ bǐ zì xíng shū rù fǎ)

Wubi input method. Image by Cangjie6 available under a Creative Commons License

One way to enter characters is to build them out of structural elements. The ‘Wubi’ method identifies each character with a short sequence of A-Z keystrokes, no more than four in the standard scheme. The keys are divided into five regions of five keys each, and about 125 constituent elements of characters are distributed over these 25 keys. The regions correspond to the five basic kinds of stroke: horizontal, vertical, slope right to left, dots together with slope left to right, and finally hook strokes. It may not be a coincidence that the number of keystrokes is close to the 4.7 letters of an average English word. This method gives no phonetic information; it is based solely on the look of the character. Although the system takes a long time to master, once learned it is generally faster than any other character entry method. A recent competition had a typist managing 244 characters per minute, which compares very favorably with 100 words per minute for very skilled English typists.

This method is only suitable for people who know the structure of characters by heart and so is very hard for beginners. However it is useful when faced with an unfamiliar character. As you don't know how to pronounce it, the methods that look up a character phonetically are useless but the wubi method will successfully find it for you.

It is also called 王码 Wáng mǎ after its inventor 王永民 Wáng Yǒng mín (b. 1943). It was first made available on a PC in 1984 and became the leading entry method for a while. Wang is credited with saving the Chinese characters, because at the time there seemed to be no way around the problem of entering characters with a standard keyboard, and pinyin was seen as the way forward to write Chinese without them. When General Secretary Hu Yaobang saw the method in action he at first could not believe how effective it was and thought some trickery must be involved. He then quickly backed away from promoting pinyin universally and the written characters were rescued. The method became so successful that Wang was later involved in much litigation when ‘cloned’ versions of it started to appear.

Wubihua method (五笔划输入法 Wǔ bǐ huà shū rù fǎ)

Confusingly there is another method called ‘wubi’. This is based on the order of the strokes used to write a character rather than on its completed form. As there are only five different kinds of basic stroke (horizontal, vertical, diagonal right to left, dash left to right and hook) it is easy to type a number (1-5) for each stroke in turn. A disadvantage is that you need to know the stroke order and identify each type of stroke correctly.

For example the character is drawn with four strokes of types 1, 2, 5 and 1, and so to enter it with wubihua you just type in ‘12510’. The final ‘0’ indicates that the character is complete after just four strokes.
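The stroke-number idea can be sketched in Python. The five stroke numbers follow the text, but the stroke names, the small table of characters and the rule that a ‘0’ closes every code are assumptions made for this illustration; a real input method ships a table covering thousands of characters.

```python
# Stroke types numbered 1-5 as in the text (names chosen for this sketch).
STROKE_DIGIT = {
    "horizontal": "1",
    "vertical": "2",
    "left-falling": "3",   # diagonal right to left
    "right-falling": "4",  # dash or dot, left to right
    "hook": "5",
}

# Tiny hand-built table of stroke orders for a few simple characters.
STROKES = {
    "十": ["horizontal", "vertical"],
    "人": ["left-falling", "right-falling"],
    "大": ["horizontal", "left-falling", "right-falling"],
    "木": ["horizontal", "vertical", "left-falling", "right-falling"],
    "王": ["horizontal", "horizontal", "vertical", "horizontal"],
}


def encode(strokes):
    """Turn a stroke sequence into digits, closed by the terminating 0."""
    return "".join(STROKE_DIGIT[s] for s in strokes) + "0"


def candidates(typed):
    """List the characters whose code begins with the digits typed so far."""
    return [ch for ch, strokes in STROKES.items()
            if encode(strokes).startswith(typed)]


print(encode(["horizontal", "vertical", "hook", "horizontal"]))  # the 12510 example
print(candidates("12"))   # two characters still match
print(candidates("120"))  # the terminating 0 narrows it to one
```

The terminating digit matters because a short character's strokes are often the opening strokes of a longer one.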

Other methods

When Wang Yongmin demonstrated his input method a great many people who had been working on a solution to the problem began to promote their own rival schemes. One method, for example, took the shapes of English letters as a way to approximate Chinese strokes, so a ‘T’ would be the natural keystroke for a character like 丁 dīng. Others split a character into four square sections, each entered according to what that section looked like. However, only one or two methods have gained widespread support.

Pinyin (拼音输入法 Pīn yīn shū rù fǎ)

Using the QWERTY keyboard to enter the pinyin for a character is perhaps the most obvious and natural solution. Indeed it is now the most popular character input method after the Chinese government heavily backed pinyin in the 1990s. For people who learned the written language using pinyin this is far quicker to learn than the other methods.

Windows 10 Chinese input method from typing 'wo' suggesting appropriate characters to choose from.

The pinyin is typed in without tones and the computer/phone will list characters to select from. Because tone marks are not normally entered there are a great many characters that can match and so a system will make use of context to give you the most likely ones to choose from.
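A toy version of this candidate list might look like the following. The dictionary, its weights and the size of the history bonus are all invented for illustration, and only the "characters you use often" part of the context is modelled; a real input method would also weigh the preceding characters.

```python
from collections import Counter

# Toy dictionary: toneless pinyin -> candidate characters with a base weight.
# The weights are invented; they stand in for how common each character is.
CANDIDATES = {
    "wo": {"我": 100, "握": 20, "窝": 10, "卧": 8},
    "ma": {"吗": 60, "妈": 50, "马": 40, "骂": 10},
}

HISTORY_BONUS = 50  # assumed boost for each time the user picked a character


class ToyPinyinInput:
    def __init__(self):
        self.history = Counter()  # how often this user chose each character

    def suggest(self, pinyin):
        """Return the candidates for the typed pinyin, most likely first."""
        options = CANDIDATES.get(pinyin, {})

        def score(char):
            return options[char] + HISTORY_BONUS * self.history[char]

        return sorted(options, key=score, reverse=True)

    def choose(self, pinyin, index):
        """Pick the candidate at index and remember the choice."""
        char = self.suggest(pinyin)[index]
        self.history[char] += 1
        return char


ime = ToyPinyinInput()
print(ime.suggest("ma"))  # ordered by base weight only
ime.choose("ma", 2)       # the user picks the third suggestion
print(ime.suggest("ma"))  # that character now comes first
```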

One special case is the handling of the pinyin ü sound, as used in nǚ: this is entered as ‘v’ rather than ‘u’ (the letter ‘v’ is not otherwise used in pinyin). Another issue is indicating breaks in a sequence of vowels in pinyin. A common example is ‘xian’: should this be treated as ‘xi-an’ (two syllables) or ‘x-ian’ (one syllable)? To help the app you can put in a quote mark (e.g. ‘xi’an’) to make it clear which one you want. If you know the tones it is worth entering them if the input method supports it, as this speeds up the process considerably.
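Both special cases can be shown with a small segmenter. This is a sketch under stated assumptions: the syllable set is a tiny subset chosen for the examples, the function names are made up here, and a straight apostrophe is used as the break mark.

```python
from itertools import product

# A small subset of valid pinyin syllables, enough for the examples.
SYLLABLES = {"xi", "an", "xian", "nü", "ren"}


def normalise(typed):
    """Map the keyboard stand-in 'v' to the pinyin letter 'ü'."""
    return typed.replace("v", "ü")


def split_syllables(text):
    """Return every way of cutting text into known syllables."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in SYLLABLES:
            for rest in split_syllables(text[end:]):
                results.append([head] + rest)
    return results


def readings(typed):
    """Segment typed pinyin, treating an apostrophe as a forced break."""
    parts = normalise(typed).split("'")
    per_part = [split_syllables(part) for part in parts]
    # Combine one segmentation of each part into a full reading.
    return [sum(combo, []) for combo in product(*per_part)]


print(readings("xian"))   # ambiguous: two readings
print(readings("xi'an"))  # the apostrophe leaves only one
print(readings("nvren"))  # 'v' is read as 'ü'
```

Without the apostrophe the software has to fall back on context to decide between the readings.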

The 搜狗 Sōu gǒu app is a very popular input method on phones. Apple and Microsoft have developed their own proprietary systems. On Windows 10 you can easily install the Chinese language pack, from which you can select the Pinyin keyboard entry method.

The double pinyin method (双拼 Shuāng pīn) is also popular, as you only need to enter the start and end of the phonetic sound, not every letter. Common phonetic endings such as ‘ai’ and ‘ian’ are entered with a single key. This makes it very fast.
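The two-keys-per-syllable idea can be sketched as below. The key assignments are hypothetical, invented for this example; real double pinyin schemes each define their own layout. The sketch also assumes every syllable begins with a consonant.

```python
# Hypothetical key assignments; real schemes define their own layouts.
INITIAL_KEYS = {"zh": "v", "ch": "i", "sh": "u"}  # two-letter initials
FINAL_KEYS = {"ai": "d", "ian": "m", "uang": "l", "ang": "h"}


def split_syllable(syllable):
    """Split a syllable into its initial consonant(s) and its final."""
    if syllable[:2] in INITIAL_KEYS:
        return syllable[:2], syllable[2:]
    return syllable[:1], syllable[1:]


def to_double(syllable):
    """Encode one syllable as exactly two keystrokes."""
    initial, final = split_syllable(syllable)
    # Single-letter initials are typed as themselves.
    return INITIAL_KEYS.get(initial, initial) + FINAL_KEYS[final]


for syllable in ["tian", "zhai", "shuang"]:
    print(syllable, "->", to_double(syllable))
```

With these assignments ‘shuang’ shrinks from six keystrokes to two, which is where the speed comes from.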

The main problem, however, is that pinyin is not used universally. There are many millions of Wu and Yue (Cantonese) speakers who normally use their local Chinese language rather than the Mandarin pronunciation that pinyin spells out. There is a danger that if the pinyin method becomes universal then many regional dialects and languages will disappear.

Drawing on a phone

Entering pinyin on a mobile phone is not all that easy because you need to accurately type in a sequence using individual small keys. Just tracing the character directly is a popular option which avoids all these complications of mapping characters onto a limited range of keys. It is likely that advances in Optical Character Recognition (OCR) will make drawing a character with a finger or stylus over the screen a very convenient and popular method. As with systems that convert English handwriting to A-Z there has to be a ‘learning’ phase where an individual's style is taken into account. Even so it is a slower method than using the keys.

Chinese mobile phone keyboard. Image by Azylber available under a Creative Commons License

Speech recognition

One way to sidestep the whole problem of writing characters is to let the computer or phone simply listen to your voice and transcribe it into Chinese characters for you. As with interpreting handwriting you'll need to train the device to your voice and accent, but it can be pretty accurate and the technology is continuing to improve in leaps and bounds.

Hong Kong

In Hong Kong the ‘Quick’ (速成 Sù chéng) Cangjie method is very popular. Hong Kong still mainly uses the traditional characters. It is called ‘quick’ because it is quicker than the full Cangjie method, which was invented back in 1976 by Chu Bong-Foo. Cangjie divides the character into graphical units, each of which has a corresponding key. Normally the complex central element is chosen from a provided list once the framework has been selected. The 24 keys on the keyboard are divided into four groups:

Philosophical: (sun, moon and five elements keys A-G)
Strokes: (basic simple stroke types keys H-N)
Anatomy: (parts of the body (heart, mouth, hand etc.) keys O-R)
Shapes: (shape of character keys S-Y)
The keys X and Z are used for special purposes

Many of the elements are radicals or alternative forms of radicals. The full Cangjie method is rather hard to learn so the Quick method has become more popular.
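The key groups listed above can be written out as a small lookup, which also confirms the count of 24 keys. The letter ranges come from the list; treating ‘Shapes’ as S-Y without X, and the function name, are assumptions of this sketch.

```python
import string

# Key ranges as listed above; X and Z are set aside for special purposes.
GROUPS = {
    "Philosophical": set("ABCDEFG"),
    "Strokes": set("HIJKLMN"),
    "Anatomy": set("OPQR"),
    "Shapes": set("STUVWY"),  # S-Y, leaving out the special key X
}
SPECIAL = set("XZ")


def key_group(key):
    """Return the name of the group that a letter key belongs to."""
    key = key.upper()
    if key in SPECIAL:
        return "Special"
    for name, keys in GROUPS.items():
        if key in keys:
            return name
    raise ValueError(f"not a letter key: {key!r}")


grouped = sum(len(keys) for keys in GROUPS.values())
print(grouped, "keys in the four groups")
print({letter: key_group(letter) for letter in "AHOSX"})
```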

The first four characters of the Chinese phonetic alphabet Bopomofo (Zhuyin Fuhao). Image by Immanuel Giel available under a Creative Commons License

Taiwan

Taiwan continues to use the traditional characters, and so the mainland Chinese systems that use the simplified set cannot be used.

The most popular system is 注音输入法 Zhù yīn shū rù fǎ. In Taiwan phonetics are usually taught with the Bopomofo system rather than pinyin. It uses a set of special characters that denote pronunciation more accurately than pinyin's mapping onto English ‘A-Z’ sounds. As Bopomofo (also known as Zhù yīn) is well known in Taiwan, characters are entered and chosen using this phonetic spelling rather than pinyin.
