
High Private Use Surrogates


The Unicode Consortium (UC) and the International Organisation for Standardisation (ISO) collaborate on the Universal Character Set (UCS). The UCS is an international standard that maps characters used in natural language, mathematics, music, and other domains to machine-readable values. By creating this mapping, the UCS enables computer software vendors to interoperate and to exchange UCS-encoded text strings with one another. Because it is a universal map, it can represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, in which the same sequence of codes can have several meanings and so be decoded improperly if the wrong encoding is chosen.
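The ambiguity of legacy encodings can be illustrated with a minimal Python sketch. The byte values and the three code pages below are chosen only for illustration; they are not taken from the text above.

```python
# The same three bytes, interpreted under three different legacy encodings.
raw = bytes([0xE0, 0xE1, 0xE2])

as_latin1 = raw.decode("latin-1")  # Western European reading
as_cp1251 = raw.decode("cp1251")   # Cyrillic reading
as_cp1253 = raw.decode("cp1253")   # Greek reading

# Each reading yields different UCS code points for identical input bytes.
for label, text in [("latin-1", as_latin1),
                    ("cp1251", as_cp1251),
                    ("cp1253", as_cp1253)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])

# Once the text is expressed as UCS code points, each reading has its own
# unambiguous serialisation (UTF-8 is used here as one possible encoding form).
utf8_forms = {text.encode("utf-8") for text in (as_latin1, as_cp1251, as_cp1253)}
```

Without out-of-band knowledge of the intended code page, a receiver cannot tell which of the three readings was meant; with a single universal map, the three texts are distinct from the start.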

The UCS has the capacity to encode over 1 million characters. Each UCS character is abstractly represented by a code point, an integer between 0 and 1,114,111 that stands for the character within the internal logic of text processing software. There are therefore 1,114,112 code points in total (2^20 + 2^16, or 17 × 2^16, or hexadecimal 110000). As of Unicode 9.0, released in June 2016, 271,792 (24%) of these code points are allocated, including 128,237 (12%) assigned characters, 137,468 (12%) reserved for private use, 2,048 for surrogates, and 66 designated noncharacters, leaving 842,320 (76%) unassigned. The number of encoded characters is made up as follows:
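The arithmetic behind these figures can be checked with a short Python sketch. The counts are copied from the paragraph above; the variable names are illustrative only.

```python
import sys

TOTAL = 0x110000  # number of code points, 0 through 0x10FFFF

# The three ways of writing the size of the code space agree.
size_checks = [TOTAL == 1_114_112,
               TOTAL == 2**20 + 2**16,
               TOTAL == 17 * 2**16]

# Python's own upper bound for a code point is the last UCS code point.
highest = sys.maxunicode

# Unicode 9.0 figures quoted in the text.
allocated = 271_792
unassigned = 842_320

summary = (f"allocated: {allocated / TOTAL:.0%}, "
           f"unassigned: {unassigned / TOTAL:.0%}")
print(summary)  # allocated: 24%, unassigned: 76%
```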

ISO maintains the basic mapping of characters from character name to code point. The terms "character" and "code point" are often used interchangeably. When a distinction is made, however, a code point is the integer assigned to the character: what one might think of as its address. While a character in the UCS (ISO/IEC 10646) consists of the combination of the code point and its name, Unicode adds many other useful properties to the character set, such as block, category, script, and directionality.
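As a rough illustration of the difference between a code point, a character name, and the additional Unicode properties, the following sketch queries Python's standard `unicodedata` module. The sample character is an arbitrary choice. As far as this sketch assumes, the standard module exposes name, category, and directionality but not block or script, so those two properties are not shown.

```python
import unicodedata

ch = "\u05D0"  # arbitrary sample character

code_point = ord(ch)                        # the integer "address"
name = unicodedata.name(ch)                 # the character name
category = unicodedata.category(ch)         # Unicode general category
direction = unicodedata.bidirectional(ch)   # Unicode bidirectional class

print(f"U+{code_point:04X} {name} category={category} bidi={direction}")
```

The first two values correspond to what the UCS itself records (code point and name); the last two are examples of the properties that Unicode layers on top.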

In addition to the UCS, Unicode also provides other implementation details such as:

End users enter these characters into programs through various input methods, such as a keyboard or a graphical character palette.

The UCS can be divided in various ways, such as by plane, block, character category, or character property.
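Two of these divisions, plane and character category, can be sketched in Python as follows. The helper `plane_of` is a hypothetical name introduced for this example, and the sample code points are arbitrary; the sketch relies on each plane spanning 2^16 code points, so the plane number is the code point shifted right by 16 bits.

```python
import unicodedata

def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) containing the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a UCS code point: {code_point:#x}")
    return code_point >> 16  # each plane holds 0x10000 code points

samples = [
    0x0041,    # a basic Latin letter
    0xDB80,    # first code point of the High Private Use Surrogates block
    0xE000,    # first private-use code point in plane 0
    0x10FFFD,  # last private-use code point in plane 16
]

for cp in samples:
    category = unicodedata.category(chr(cp))
    print(f"U+{cp:04X} plane={plane_of(cp)} category={category}")
```

Here the general categories `Cs` and `Co` mark surrogate and private-use code points respectively, which shows how a category-based division cuts across the plane-based one.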


...
Wikipedia

...