
Double-byte character


A double-byte character set (DBCS) is a character encoding in which either all characters (including control characters) are encoded in two bytes, or only those graphic characters that cannot be represented by an accompanying single-byte character set (SBCS) are encoded in two bytes (Han characters typically make up most of these two-byte characters). A DBCS supports national languages that contain a large number of unique characters or symbols: one byte can represent at most 256 (2^8) distinct characters, while two bytes can represent up to 65,536 (2^16). Examples of such languages include Japanese and Chinese. Korean Hangul does not contain as many characters, but KS X 1001 encodes both Hangul and Hanja, using two bytes per character.
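
To make the one-byte versus two-byte split concrete, the following Python sketch encodes short mixed strings with codecs from Python's standard library and counts the bytes used by each character. The codec choices and sample strings are illustrative assumptions: `shift_jis` (Japanese), `gbk` (Simplified Chinese), `big5` (Traditional Chinese), and `euc_kr`, the common 8-bit form of KS X 1001. The helper `bytes_per_char` is hypothetical.

```python
# Capacity arithmetic: one byte has 2**8 possible values, two bytes have 2**16.
ONE_BYTE_VALUES = 2 ** 8    # 256
TWO_BYTE_VALUES = 2 ** 16   # 65,536

# Arbitrary sample strings mixing ASCII with CJK characters, keyed by the
# Python codec used to encode them (an illustrative choice, not a standard list).
SAMPLES = {
    "shift_jis": "Tokyo 東京",      # Japanese kanji
    "gbk": "Beijing 北京",          # Simplified Chinese
    "big5": "Taipei 臺北",          # Traditional Chinese
    "euc_kr": "Seoul 서울 韓國",    # Korean: KS X 1001 Hangul and Hanja
}

def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return the encoded length in bytes of each character of text."""
    return [len(ch.encode(encoding)) for ch in text]

print(f"one byte: {ONE_BYTE_VALUES} values, two bytes: {TWO_BYTE_VALUES:,} values")
for encoding, text in SAMPLES.items():
    print(f"{encoding:>9}: {text!r} -> {bytes_per_char(text, encoding)}")
```

In each sample, the ASCII letters and spaces take one byte and every CJK character takes two, which is the SBCS-plus-DBCS arrangement described above.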

The term DBCS traditionally refers to a character encoding in which each graphic character is encoded in two bytes. In this traditional sense, the DBCS always has lead bytes with the most significant bit set (i.e., byte values of 0x80 or above, outside the 7-bit ASCII range), and it is always paired with a single-byte character set (SBCS). Furthermore, for the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with half-width characters and the DBCS with full-width characters, so that the number of bytes a character occupies matches the number of columns it takes up on screen.

The term "DBCS" can also imply an underlying structure that does not comply with ISO 2022. For example, "DBCS" sometimes means a double-byte encoding that is specifically not Extended Unix Code (EUC).
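
One practical consequence of that structural difference can be shown with the character 表. In Shift JIS, a non-EUC DBCS, its trail byte is 0x5C, the ASCII code for a backslash, so byte-oriented software that scans for `\` can cut the character in half. In EUC-JP, which follows the ISO 2022 structure, both bytes of a JIS X 0208 character lie between 0xA1 and 0xFE and never collide with ASCII. The helper `ascii_collisions` is a hypothetical name used only for this sketch.

```python
def ascii_collisions(text: str, encoding: str) -> list[int]:
    """Return byte values below 0x80 found inside the encoding of any
    non-ASCII character of text, i.e. bytes an ASCII-only scanner would
    misread as ASCII characters."""
    found = []
    for ch in text:
        if ch.isascii():
            continue
        found.extend(b for b in ch.encode(encoding) if b < 0x80)
    return found

for encoding in ("shift_jis", "euc_jp"):
    encoded = "表".encode(encoding)
    print(f"{encoding:>9}: {encoded.hex(' ')}  "
          f"ASCII-range bytes: {[chr(b) for b in ascii_collisions('表', encoding)]}")

# Splitting Shift JIS bytes on a backslash breaks the character apart.
print("表".encode("shift_jis").split(b"\\"))
```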

This original meaning of DBCS differs from what some consider correct usage today. Some insist that such character encodings be properly called multi-byte character sets (MBCS) or variable-width encodings, because encodings such as EUC-JP, EUC-KR, EUC-TW, GB18030, and UTF-8 encode some characters in a single byte and others in two or more bytes, and several of them (EUC-JP, EUC-TW, GB18030, and UTF-8) use more than two bytes for some characters.
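
The variable-width behaviour can be observed with Python's built-in codecs. This sketch prints per-character byte counts for UTF-8 and GB18030 and shows that in UTF-8 the lead byte alone announces a sequence length of one to four bytes, in contrast to the fixed two-byte rule of a traditional DBCS. The helpers `bytes_per_char` and `utf8_sequence_length` are hypothetical, and the length check follows the UTF-8 bit patterns only (it does not reject lead bytes that are never valid, such as 0xC0, 0xC1, or 0xF5 to 0xF7).

```python
def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return the encoded length in bytes of each character of text."""
    return [len(ch.encode(encoding)) for ch in text]

def utf8_sequence_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte's high-order bit pattern.
    Bit patterns only: some bytes that match (0xC0, 0xC1, 0xF5-0xF7) are
    still invalid in well-formed UTF-8."""
    if lead < 0x80:             # 0xxxxxxx: ASCII, one byte
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx: four-byte sequence
        return 4
    raise ValueError(f"0x{lead:02x} is not a UTF-8 lead byte")

print("utf-8:  ", bytes_per_char("Aé中😀", "utf-8"))
print("gb18030:", bytes_per_char("A中😀", "gb18030"))
```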


Wikipedia