*** Welcome to piglix ***

Unicode in Microsoft Windows


Microsoft started to consistently implement Unicode in their products quite early.Windows NT was the first operating system that used "wide characters" in system calls. Using the UCS-2 encoding scheme at first, it was upgraded to UTF-16 starting with Windows 2000, allowing a representation of additional planes with surrogate pairs.

Modern Windows versions like Windows XP and Windows Server 2003, and prior to them Windows NT (3.x, 4.0) and Windows 2000 are shipped with system libraries which support string encoding of two types: UTF-16 (often called "Unicode" in Windows documentation) and an 8-bit encoding called the "code page" (or incorrectly referred to as ANSI code page). 16-bit functions have names suffixed with -W (from "wide"), for example, lstrlenW(). Code page oriented functions use the suffix -A, e.g., lstrlenA(), for "ANSI". This split was necessary because many languages, including C, do not provide a clean way to pass both 8-bit and 16-bit strings to the same API or put them in the same structure. Windows also provides the 'M' API which in some locales provided multi-byte encodings, but in most locales is the same as 'A'. Most such 'A' and 'M' functions are implemented as a wrapper that translates the code page to UTF-16 and calls the 'W' function.

The IsTextUnicode function uses a heuristic algorithm on a byte string passed to it to detect whether this string represents UTF-16 text. For very short texts, this function, used by some applications like Notepad, often gives incorrect results. This gave rise to legends about the existence of "Easter eggs" like Bush hid the facts.


...
Wikipedia

...