*** Welcome to piglix ***

Precomposed character


A precomposed character (alternatively composite character or decomposable character) is a Unicode entity that can be defined as a sequence of one or more other characters. A precomposed character may typically represent a letter with a diacritical mark, such as é (Latin small letter e with acute accent). Technically, é (U+00E9) is a character that can be decomposed into an equivalent string of the base letter e (U+0065) and combining acute accent (U+0301). Similarly, ligatures are precompositions of their constituent letters or graphemes.

Precomposed characters are the legacy solution for representing many special letters in various character sets. In Unicode they are included primarily to aid computer systems with incomplete Unicode support, where equivalent decomposed characters may render incorrectly.

In the following example, there is a common Swedish surname Åström written in the two alternative methods, the first one with a precomposed Å (U+00C5) and ö (U+00F6), and the second one using a decomposed base letter A (U+0041) with a combining ring above (U+030A) and an o (U+006F) with a combining diaeresis (U+0308).

Except for the different colors, the two solutions are equivalent and should render identically. In practice, however, some Unicode implementations still have difficulties with decomposed characters. In the worst case, combining diacritics may be disregarded or rendered as unrecognized characters after their base letters, as they are not included in all fonts. To overcome the problems, some applications may simply attempt to replace the decomposed characters with the equivalent precomposed characters.

With an incomplete font, however, precomposed characters may also be problematic – especially if they are more exotic, as in the following example (showing the reconstructed Proto-Indo-European word for "dog"):


...
Wikipedia

...