Internationalization[28] is the process of enabling a program to run internationally. That is, an internationalized program has the flexibility to run correctly in any country. Once a program has been internationalized, enabling it to run in a particular country and/or language is merely a matter of localizing it for that country and language, or locale.
[28] This word is sometimes abbreviated I18N, because there are 18 letters between the first I and the last N.
You might think that the main task of localization is translating a program's user-visible text into a local language. While this is an important task, it is by no means the only one. Other concerns include displaying dates and times in the customary format for the locale, displaying number and currency values in the customary format for the locale, and sorting strings in the customary order for the locale.
Underlying all these localization issues is the even more fundamental issue of character encodings. Almost every useful program must perform input and output of text, and before we can even think about textual I/O, we must be able to work with the local character encoding standard. This hurdle to internationalization lurks slightly below the surface, and is not very programmer-visible. Nevertheless, it is one of the most important and difficult issues in internationalization.
Java 1.1 provides facilities that address all of these internationalization issues. If you write programs that correctly make use of these facilities, the task of localizing your program for a new country really does boil down to the relatively simple matter of hiring a translator to convert your program's messages. With the expansion of the global economy, and particularly of the global Internet, writing internationalized programs is going to become more and more important, and you should begin to take advantage of Java's internationalization capabilities right away.
There are several distinct pieces to the problem of internationalization, and Java's solution to this problem also comes in distinct pieces. The first issue in internationalization is the matter of knowing what locale a program is running in. A locale is typically defined as a political, geographical, or cultural region that has a distinct language or distinct conventions for things such as date and time formats. The notion of a locale is encapsulated in Java 1.1 by the Locale class, which is part of the java.util package. Every Java program has a default locale, which is inherited from the operating system (where it may be set by the user). A program can simply rely on this default, or it can change the default. Additionally, all Java methods that rely on the default locale also have variants that allow you to explicitly specify a locale. Typically, though, using the default locale is exactly what you want to do.
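The three ways of working with a locale described above (relying on the default, passing one explicitly, and changing the default) can be sketched as follows. This is a minimal illustration, not code from the original text: the class name LocaleSketch and the helper formatNumber are hypothetical, and NumberFormat is used only as a convenient example of a locale-sensitive class.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleSketch {
    // Hypothetical helper: format a number using an explicitly specified locale
    // rather than the default one.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;

        // The default locale is inherited from the operating system.
        System.out.println("Default locale: " + Locale.getDefault());

        // Variant 1: rely on the default locale.
        System.out.println(NumberFormat.getInstance().format(value));

        // Variant 2: name the locale explicitly.
        System.out.println(formatNumber(value, Locale.US));
        System.out.println(formatNumber(value, Locale.GERMANY));

        // Variant 3: change the default for the rest of the program.
        Locale.setDefault(Locale.GERMANY);
        System.out.println(NumberFormat.getInstance().format(value));
    }
}
```

The explicit-locale variant is mostly useful when a program must produce output for a locale other than the user's own, for example a server formatting text on behalf of a remote user.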
Once a program knows what locale it is running in, the most fundamental internationalization issue, as noted above, is the ability to read and write localized text. Since Java uses the Unicode encoding for its characters and strings, any character of any commonly used modern written language is representable in a Java program, which puts Java at a huge advantage over older languages such as C and C++. Thus, working with localized text is merely a matter of converting from the local character encoding to Unicode when reading text, such as a file or input from the user, and converting from Unicode to the local encoding when writing text. Java's solution to this problem is in the java.io package, in the form of a new suite of character-based input and output streams (known as readers and writers) that complement the existing byte-based input and output streams.
The FileReader class, for example, is a character-based input stream used to read characters (which are not the same as bytes in all languages) from a file. The FileReader class assumes that the specified file is encoded using the default character encoding for the default locale, so it converts characters from the local encoding to Unicode characters as it reads them. In most cases, this assumption is a good one, so all you need to do to internationalize the character set handling of your program is to switch from a FileInputStream to a FileReader object, and make similar switches for text output as well. On the other hand, if you need to read a file that is encoded using some character set other than the default character set of the default locale, you can use a FileInputStream to read the bytes of the file, and then use an InputStreamReader to convert the stream of bytes to a stream of characters. When you create an InputStreamReader, you specify the name of the encoding in use, and it performs the appropriate conversion automatically.
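The two approaches just described might look like the sketch below. It is only an illustration under stated assumptions: the class name EncodingSketch and its helper methods are hypothetical, the encoding name "ISO-8859-1" is the spelling accepted by current JDKs (Java 1.1 used its own names for encodings), and the try-with-resources syntax comes from later Java versions.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class EncodingSketch {
    // Drain a character stream into a String.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Read a file that is assumed to use the platform's default encoding.
    static String readFileDefault(String path) throws IOException {
        try (Reader in = new FileReader(path)) {
            return readAll(in);
        }
    }

    // Read a file whose encoding is known and named explicitly:
    // bytes come from FileInputStream, InputStreamReader converts them to characters.
    static String readFile(String path, String encoding) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream(path), encoding)) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("i18n", ".txt");
        f.deleteOnExit();

        // Write text in a specific encoding using the matching writer class.
        try (Writer out = new OutputStreamWriter(new FileOutputStream(f), "ISO-8859-1")) {
            out.write("caf\u00e9");
        }

        // Decoding with the right encoding recovers the original characters.
        System.out.println(readFile(f.getPath(), "ISO-8859-1"));
        // Decoding with the default encoding is only correct if it happens to match.
        System.out.println(readFileDefault(f.getPath()));
    }
}
```

Naming the encoding explicitly is the safer choice whenever a file may have been produced on a different system from the one reading it.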
As you can see, internationalizing the character set of your programs is a simple matter of switching from byte I/O streams to character I/O streams. Internationalizing other aspects of your program requires a little more effort. The classes in the java.text package are designed to allow you to internationalize your handling of numbers, dates, times, string comparisons, and so on. NumberFormat is used to convert numbers, monetary amounts, and percentages to an appropriate textual format for a locale. Similarly, the DateFormat class, along with the Calendar and TimeZone classes from the java.util package, is used to display dates and times in a locale-specific way. The Collator class is used to compare strings according to the alphabetization rules of a given locale, and the BreakIterator class is used to locate word, line, and sentence boundaries.
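As a rough illustration of the java.text classes named above, the following sketch formats a currency amount, a percentage, and a date, then sorts and splits text for a given locale. The class name TextFormatSketch and the helpers sortForLocale and words are hypothetical, and the generic collection syntax comes from Java versions later than 1.1. The exact formatted strings depend on the locale data shipped with the JDK, so no output is shown.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

public class TextFormatSketch {
    // Return a copy of the array sorted by the locale's alphabetization rules.
    static String[] sortForLocale(String[] items, Locale locale) {
        String[] copy = items.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Split text at word boundaries, keeping only pieces that start with a
    // letter or digit (this drops whitespace and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Locale locale = Locale.FRANCE;

        // Numbers, monetary amounts, and percentages.
        System.out.println(NumberFormat.getCurrencyInstance(locale).format(1234.56));
        System.out.println(NumberFormat.getPercentInstance(locale).format(0.25));

        // Dates and times.
        DateFormat df = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.SHORT, locale);
        System.out.println(df.format(new Date()));

        // Locale-aware sorting: unlike String.compareTo, case does not dominate the order.
        System.out.println(Arrays.toString(
            sortForLocale(new String[] { "banana", "Cherry", "apple" }, Locale.US)));

        // Word boundaries.
        System.out.println(words("Hello, world!", Locale.US));
    }
}
```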