MIME | Shift_JIS |
---|---|
Standard | JIS X 0208 Appendix 1 |
Language(s) | Japanese |
Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by the Japanese company ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. In April 2017, 0.9% of all web pages used Shift JIS, down from 1.3% in July 2014.
Shift JIS is based on character sets defined in the JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double-byte characters). The lead bytes of the double-byte characters are "shifted" around the 63 half-width katakana characters, which occupy the single-byte range 0xA1 to 0xDF and are taken from JIS X 0201. The single-byte characters 0x00 to 0x7F match the ASCII encoding, except that 0x5C is a yen sign (U+00A5) and 0x7E is an overline (U+203E), in place of the ASCII backslash and tilde respectively.
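The byte layout described above can be illustrated with a minimal Python sketch. It assumes the standard lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xEF, which the text above does not state and which vendor extensions widen. The helper names are hypothetical, and the splitter does no validation of trail bytes or truncated input.

```python
def is_lead_byte(b: int) -> bool:
    # Assumed lead-byte ranges for standard Shift JIS; they sit on either
    # side of the half-width katakana block.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_halfwidth_katakana(b: int) -> bool:
    # Single-byte half-width katakana from JIS X 0201.
    return 0xA1 <= b <= 0xDF


def split_characters(data: bytes) -> list:
    """Split Shift JIS bytes into one- and two-byte characters (no validation)."""
    chars = []
    i = 0
    while i < len(data):
        step = 2 if is_lead_byte(data[i]) else 1
        chars.append(data[i:i + step])
        i += step
    return chars


# ASCII "A", half-width katakana "ｱ", and full-width katakana "ア".
sample = "Aｱア".encode("shift_jis")
print(split_characters(sample))  # [b'A', b'\xb1', b'\x83A']
```

Note that the second byte of the full-width ア is 0x41, the same value as ASCII "A". A trail byte alone is therefore ambiguous, which is why the sketch scans from the start of the data and lets each lead byte consume the byte that follows it.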
HTML written in Shift JIS can still be interpreted to some extent when it is incorrectly tagged as ASCII. This holds even when the charset is declared at the top of the document itself, because the characters that delimit HTML tags, attributes, and entities (<, >, /, ", &, ;) are encoded by the same single bytes as in ASCII, and those bytes never appear within two-byte sequences. Shift JIS can be used in string literals in programming languages such as C, but the byte 0x5C causes problems when it appears as the second byte of a two-byte character: 0x5C, normally the backslash and here displayed as ¥, is interpreted as the start of an escape sequence, which corrupts the rest of the literal. A programmer who is aware of this can still write printf("ハローワールド¥n"); (where ハローワールド is "Hello, world" and ¥n is an escape sequence), assuming the I/O system supports Shift JIS output.
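Both claims can be checked with a short Python sketch that uses the standard `shift_jis` codec. The trail-byte ranges 0x40 to 0x7E and 0x80 to 0xFC are an assumption not stated in the text above, and the helper name is hypothetical.

```python
HTML_DELIMITERS = b'<>/"&;'

# Assumed trail-byte values for two-byte Shift JIS characters.
TRAIL_BYTES = set(range(0x40, 0x7F)) | set(range(0x80, 0xFD))


def chars_with_5c_trail_byte(text: str) -> list:
    """Return the characters whose Shift JIS encoding ends in byte 0x5C."""
    hits = []
    for ch in text:
        encoded = ch.encode("shift_jis")
        if len(encoded) == 2 and encoded[1] == 0x5C:
            hits.append(ch)
    return hits


# Every HTML delimiter is below 0x40, so none can be a trail byte.
print(all(b not in TRAIL_BYTES for b in HTML_DELIMITERS))  # True

# 0x5C, by contrast, is a legal trail byte, and some characters use it.
print(chars_with_5c_trail_byte("ソ表"))            # ['ソ', '表']

# The greeting from the example contains no such character.
print(chars_with_5c_trail_byte("ハローワールド"))  # []
```

A check of this kind could be run over string literals before compiling Shift JIS source with a compiler that is not aware of the encoding.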