ISO Character Set
HTML Character Sets - Part 3
Foreword: In this part of my series, I give you an overview of the ISO character sets.
By: Chrysanthus Date Published: 31 Jul 2012
Introduction
Note: If you cannot see the code or if you think anything is missing (broken link, image absent, etc.), just contact me at forchatrans@yahoo.com. That is, contact me for the slightest problem you have about what you are reading.
Description
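ISO stands for the International Organization for Standardization. The ASCII character set is too small for international use, so ISO developed a larger family of character sets. Instead of one huge set, the family exists in parts: each part of ISO 8859 is an 8-bit set that keeps ASCII in its lower half and adds the extra characters needed by one group of languages. The ISO 8859 parts covered here are ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10 and ISO-8859-15. You will also meet the names ISO-2022-JP, ISO-2022-JP-2 and ISO-2022-KR. These are not parts of ISO 8859; they are encodings based on a different standard, ISO 2022, which switches between character sets with escape sequences.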
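ISO-8859-1 has traditionally been the default part. So, an HTML document has historically been assumed to be in ISO-8859-1 if you do not type any character set in a meta tag. Modern browsers may fall back to a different character set, so it is safer to always declare one. Read the following table, which gives the description of each name listed above: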
Character set | Description | Covers
---|---|---
ISO-8859-1 | Latin alphabet part 1 | North America, Western Europe, Latin America, the Caribbean, Canada, Africa |
ISO-8859-2 | Latin alphabet part 2 | Eastern Europe |
ISO-8859-3 | Latin alphabet part 3 | Southern Europe (for example Maltese), Esperanto, miscellaneous others |
ISO-8859-4 | Latin alphabet part 4 | Scandinavia/Baltics (and others not in ISO-8859-1) |
ISO-8859-5 | Latin/Cyrillic part 5 | Languages that use a Cyrillic alphabet, such as Bulgarian, Belarusian, Russian and Macedonian |
ISO-8859-6 | Latin/Arabic part 6 | Languages that use the Arabic alphabet |
ISO-8859-7 | Latin/Greek part 7 | The modern Greek language, as well as mathematical symbols derived from Greek |
ISO-8859-8 | Latin/Hebrew part 8 | Languages that use the Hebrew alphabet |
ISO-8859-9 | Latin 5 part 9 | The Turkish language. Same as ISO-8859-1 except Turkish characters replace Icelandic ones |
ISO-8859-10 | Latin 6 part 10 | The Nordic languages, including Sami and Greenlandic (Inuit) |
ISO-8859-15 | Latin 9 (aka Latin 0) | Similar to ISO-8859-1 but replaces some less common symbols with the euro sign and some other missing characters |
ISO-2022-JP | Japanese (7-bit encoding based on ISO 2022, not a part of ISO 8859) | The Japanese language |
ISO-2022-JP-2 | Multilingual extension of ISO-2022-JP | The Japanese language, plus Chinese, Korean, Greek and Western European characters |
ISO-2022-KR | Korean (7-bit encoding based on ISO 2022, not a part of ISO 8859) | The Korean language |
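
To make the idea of parts concrete, here is a minimal Python sketch (Python is used only for illustration; it does not appear elsewhere in the series). It assumes the HTML5 short form of the meta tag, `<meta charset="...">`, and a hypothetical helper named `build_page`. The sketch builds a small page that declares ISO-8859-1, then reads the same bytes with a different part, which suggests why the character set named in the meta tag has to match the bytes actually saved.

```python
import unicodedata
from html import escape


def build_page(text: str, charset: str) -> bytes:
    """Hypothetical helper: return a tiny HTML page encoded in `charset`."""
    html = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f'<meta charset="{charset}">\n'
        "<title>Charset demo</title>\n"
        "</head>\n"
        f"<body><p>{escape(text)}</p></body>\n"
        "</html>\n"
    )
    # The saved bytes must use the same character set that the meta tag names.
    return html.encode(charset)


# "caf" followed by e-acute; in ISO-8859-1 the accented letter is the byte 0xE9.
page = build_page("caf\u00e9", "ISO-8859-1")

right = page.decode("ISO-8859-1")  # read with the part the page declares
wrong = page.decode("ISO-8859-5")  # same bytes read as the Cyrillic part

# Only bytes above 0x7F change meaning; the ASCII half is shared by every part.
for part in ("ISO-8859-1", "ISO-8859-5", "ISO-8859-7"):
    ch = b"\xe9".decode(part)
    print(f"{part} reads byte 0xE9 as U+{ord(ch):04X} {unicodedata.name(ch)}")
```

In this sketch, `right` contains the intended word, while `wrong` shows a Cyrillic letter in place of the accented one, because byte 0xE9 stands for a different character in each part.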
The Unicode Consortium
There is a consortium called the Unicode Consortium, which develops the Unicode Standard. Its goal is to replace the many existing character sets (parts) with a single character set, Unicode, which gives every character its own number (code point). Unicode can be stored using several encodings, called Unicode Transformation Formats (UTF). The most commonly used encodings are UTF-8 and UTF-16. These are not different parts; they are alternative ways of encoding the same characters.
Unicode is better than the ISO character sets in the sense that it encompasses the characters of all the ISO-8859 parts, so a single document can mix languages that would otherwise need different parts.
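As a rough illustration of "alternatives, not parts", the Python sketch below encodes the same characters with UTF-8 and with UTF-16. The sample characters and all variable names are assumptions chosen for this example; the `utf-16-be` codec is used so that no byte order mark is added to the output.

```python
# One sample character from each of four different ISO-8859 parts.
samples = {
    "e-acute": "\u00e9",  # found in ISO-8859-1
    "shcha": "\u0449",    # found in ISO-8859-5 (Cyrillic)
    "iota": "\u03b9",     # found in ISO-8859-7 (Greek)
    "euro": "\u20ac",     # found in ISO-8859-15
}

encoded = {}
for name, ch in samples.items():
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    encoded[name] = (utf8, utf16)
    # Same code point, two different byte sequences.
    print(f"{name}: U+{ord(ch):04X} utf-8={utf8.hex()} utf-16={utf16.hex()}")

# A text that mixes all four characters does not fit in ISO-8859-1 ...
mixed = "".join(samples.values())
try:
    mixed.encode("iso-8859-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

# ... but it survives a UTF-8 round trip unchanged.
round_trip = mixed.encode("utf-8").decode("utf-8")

print("fits in ISO-8859-1:", fits_in_latin1)
print("UTF-8 round trip intact:", round_trip == mixed)
```

The code points stay the same under both encodings; only the bytes used to store them differ.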
That is it for this part of the series. We stop here and continue in the next part.
Chrys