
CCSID


CCSID is IBM's abbreviation for "Coded Character Set Identifier". It is a 16-bit number that identifies a specific encoding of a specific code page. The distinction matters because a single code page can have more than one encoding: Unicode, for example, is a code page with several encoding forms, such as UTF-8, UTF-16 and UTF-32.
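As a minimal sketch of the point about encoding forms, the Python snippet below encodes one Unicode character under three encoding forms. The helper name `encoding_forms` is hypothetical, and the big-endian codec variants are chosen here only so that no byte order mark is emitted; neither comes from IBM's definitions.

```python
def encoding_forms(char):
    """Return the bytes of one character under three Unicode encoding forms."""
    # Big-endian variants are used so that no byte order mark is prepended.
    return {
        "UTF-8": char.encode("utf-8"),
        "UTF-16": char.encode("utf-16-be"),
        "UTF-32": char.encode("utf-32-be"),
    }


euro = "\u20ac"  # EURO SIGN, code point U+20AC
for form, data in encoding_forms(euro).items():
    print(f"U+{ord(euro):04X} in {form}: {data.hex(' ')}")
# U+20AC in UTF-8: e2 82 ac
# U+20AC in UTF-16: 20 ac
# U+20AC in UTF-32: 00 00 20 ac
```

The code point is the same in all three cases; only the byte sequence that represents it differs, which is why a code page alone does not pin down an encoding.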

The terms code page and CCSID are often used interchangeably, even though they are not synonymous: a code page may be only part of what makes up a CCSID. The following definitions, from IBM, illustrate this point by working up from glyph to CCSID.

A glyph is the actual physical pattern of pixels or ink that shows up on a display or printout.

A character is a concept that covers all glyphs associated with a certain symbol. For instance, the letter "F" rendered in bold, in italics, underlined, in a different color, or in a different font produces a different glyph each time, but every one of those glyphs represents the same character. The various modifiers (bold, italic, underline, color, and font) do not change the F's essential F-ness.

A character set contains the characters necessary to allow a particular human to carry on a meaningful interaction with the computer. It does not specify how those characters are represented in a computer. This level is the first one to separate characters into various alphabets (Latin, Arabic, Hebrew, Cyrillic, and so on) or ideographic groups (Chinese, Korean, and so on). It corresponds to a "character repertoire" in the Unicode encoding model.

A code page represents a particular assignment of code point values to characters. It corresponds to a "coded character set" in the Unicode encoding model. A code point for a character is the computer's internal representation of that character in a given code page. Many characters are represented by different code points in different code pages. Certain character sets can be adequately represented with single-byte code pages (which have a maximum of 256 code points, hence a maximum of 256 characters), but many require more than that. JIS X 0208 and Unicode are examples of code pages with more than 256 code points.
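The following Python sketch illustrates how one character can map to different code points in different code pages, and how a single-byte code page simply has no slot for some characters. It assumes that Python's built-in codecs `cp037` (an EBCDIC code page), `cp437`, `latin_1` and `shift_jis` (an encoding that covers JIS X 0208) are acceptable stand-ins for the corresponding IBM code pages. The helper `encode_in_code_pages` is a hypothetical name introduced for this example.

```python
# Python codec names standing in for code pages (an assumption of this sketch).
CODE_PAGES = ["cp037", "cp437", "latin_1", "shift_jis"]


def encode_in_code_pages(char, code_pages=CODE_PAGES):
    """Map each code page name to the bytes for `char`, or None if unassigned."""
    result = {}
    for name in code_pages:
        try:
            result[name] = char.encode(name)
        except UnicodeEncodeError:
            # The code page has no code point for this character.
            result[name] = None
    return result


for ch in ("A", "\u00e9", "\u3042"):  # "A", "é", and hiragana "あ"
    print(repr(ch))
    for name, data in encode_in_code_pages(ch).items():
        shown = data.hex(" ") if data is not None else "not representable"
        print(f"  {name:10} {shown}")
```

In this sketch "A" comes out as `c1` under `cp037` but `41` under `latin_1`, while the hiragana character has no code point in any of the three single-byte code pages and needs two bytes under `shift_jis`.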


...
Wikipedia

...