14.2.5. Unicode and Character Escapes

Java characters, strings, and identifiers (e.g., variable, method, and class names) are composed of 16-bit Unicode characters. This makes Java programs relatively easy to internationalize for non-English-speaking users. It also makes the language easier to work with for non-English-speaking programmers—a Thai programmer could use the Thai alphabet for class and method names in her Java code.

If two-byte characters seem confusing or intimidating to you, fear not. The Unicode character set is compatible with ASCII, and the first 256 characters (0x0000 to 0x00FF) are identical to the ISO 8859-1 (Latin-1) characters 0x00 to 0xFF. Furthermore, the Java language design and the Java String API make the character representation entirely transparent to you. If you are using only Latin-1 characters, there is no way that you can even distinguish a Java 16-bit character from the 8-bit characters you are familiar with.
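As a small illustration of this compatibility, the sketch below converts characters to their numeric codes. The class and method names (Latin1Demo, codeOf) are made up for this example.

```java
class Latin1Demo {
    // Returns the numeric code of a char. For ASCII and Latin-1
    // characters this is the same number as the familiar 8-bit code.
    static int codeOf(char c) {
        return c;   // widening conversion from char to int
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));        // ASCII capital A
        System.out.println(codeOf('\u00e9'));   // Latin-1 e with acute accent
    }
}
```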

Most platforms cannot display all 38,885 currently defined Unicode characters, so Java programs may be written (and Java output may appear) with special Unicode escape sequences. Anywhere within a Java program (not only within character and string literals), a Unicode character may be represented with the Unicode escape sequence \uxxxx, where xxxx is a sequence of four hexadecimal digits.
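To see that an escape really is accepted outside of literals, consider the following sketch, in which one escape appears in a character literal and another in the middle of a method name. The class, field, and method names are hypothetical.

```java
class UnicodeEscapeDemo {
    // A char literal written as an escape: hex 0041 is 'A'.
    static final char LETTER_A = '\u0041';

    // The escape for 'a' (hex 0061) appears inside an identifier,
    // so this declares a method named "value".
    static int v\u0061lue() {
        return 42;
    }

    // Calls that same method by its plain spelling.
    static int viaPlainName() {
        return value();
    }
}
```

Writing identifiers this way is legal but hard to read; it is mainly useful when a tool or platform cannot represent the character directly.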

Java also supports all of the standard C character escape sequences, such as \n, \t, and \xxx (where xxx is three octal digits). Note, however, that Java does not support line continuation with \ at the end of a line. Long strings must either be specified on a single long line, or they must be created from shorter strings using the string concatenation (+) operator. (Note that the concatenation of two constant strings is done at compile time rather than at runtime, so using the + operator in this way is not inefficient.)
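A brief sketch of both points follows: a long string assembled from shorter literals, and a string that uses octal and C-style escapes. The class and constant names are invented for the example.

```java
class EscapeAndConcatDemo {
    // A long constant built from shorter literals. Both operands are
    // constants, so the compiler folds them into a single string.
    static final String MESSAGE =
        "Long strings cannot be continued with a backslash; " +
        "join shorter literals with + instead.";

    // Octal escapes: \101 is 'A' and \102 is 'B' (octal 101 = decimal 65).
    // The tab and newline escapes work as they do in C.
    static final String OCTAL = "\101\t\102\n";

    public static void main(String[] args) {
        System.out.println(MESSAGE);
        System.out.print(OCTAL);
    }
}
```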

There are two important differences between Unicode escapes and C-style escape characters. First, as we’ve noted, Unicode escapes can appear anywhere within a Java program, while the other escape characters can appear only in character and string constants.

The second, and more subtle, difference is that Unicode \u escape sequences are processed before the other escape characters, and thus the two types of escape sequences can have very different semantics. A Unicode escape is simply an alternative way to represent a character that may not be displayable on certain (non-Unicode) systems. Some of the character escapes, however, represent special characters in a way that prevents the usual interpretation of those characters by the compiler. The following examples make this difference clear. Note that \u0022 and \u005c are the Unicode escapes for the double-quote character and the backslash character.

   // \" represents a " character, and prevents the normal
   // interpretation of that character by the compiler.
   // This is a string consisting of a double-quote character.
   String quote = "\"";

   // We can't represent the same string with a single Unicode escape.
   // \u0022 has exactly the same meaning to the compiler as ".
   // The string below turns into """: an empty string followed
   // by an unterminated string, which yields a compilation error.
   String quote = "\u0022";

   // Here we represent both characters of an \" escape as
   // Unicode escapes. This turns into "\"", and is the same
   // string as in our first example.
   String quote = "\u005c\u0022";
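The two forms that compile can be checked against each other. The following is a minimal sketch with hypothetical class and constant names; the form that fails to compile is left out so that the class builds.

```java
class QuoteEscapeDemo {
    // C-style escape: backslash plus quote yields one double-quote character.
    static final String ESCAPED = "\"";

    // Both characters written as Unicode escapes (005c is the backslash,
    // 0022 is the quote). After the early translation step the compiler
    // sees exactly the same literal as above.
    static final String UNICODE = "\u005c\u0022";

    // Inside a char literal a double quote needs no protection,
    // so a lone Unicode escape is fine here.
    static final char QUOTE_CHAR = '\u0022';

    public static void main(String[] args) {
        System.out.println(ESCAPED.equals(UNICODE));
        System.out.println(UNICODE.length());
    }
}
```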

14.2.6. Primitive Data Types

Java adds byte and boolean primitive types to the standard set of C types. In addition, it strictly defines the size and signedness of its types. In C, an int may be 16, 32, or 64 bits, and a char may act signed or unsigned depending on the platform. Not so in Java. In C, an uninitialized local variable usually has garbage as its value. In Java, fields and array elements have guaranteed default values, and the compiler rejects code that reads a local variable before it has been assigned a value. Table 14.2 lists Java’s primitive data types. The subsections below provide details about these types.
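The default values can be observed directly. This sketch uses instance fields, because the compiler does not allow a local variable to be read before it is assigned; the class and field names are hypothetical.

```java
class DefaultValuesDemo {
    // None of these fields has an initializer, so each one
    // receives the default value for its type.
    boolean flag;
    char letter;
    int count;
    long total;
    double ratio;

    public static void main(String[] args) {
        DefaultValuesDemo d = new DefaultValuesDemo();
        System.out.println(d.flag);
        System.out.println((int) d.letter);   // print the code, not the character
        System.out.println(d.count);
        System.out.println(d.total);
        System.out.println(d.ratio);
    }
}
```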

TABLE 14.2. Java primitive data types.

Type     Contains                 Default  Size     Min Value                  Max Value
boolean  true or false            false    1 bit    N.A.                       N.A.
char     Unicode character        \u0000   16 bits  \u0000                     \uFFFF
byte     Signed integer           0        8 bits   -128                       127
short    Signed integer           0        16 bits  -32768                     32767
int      Signed integer           0        32 bits  -2147483648                2147483647
long     Signed integer           0        64 bits  -9223372036854775808       9223372036854775807
float    IEEE 754 floating-point  0.0      32 bits  ±1.40239846E-45            ±3.40282347E+38
double   IEEE 754 floating-point  0.0      64 bits  ±4.94065645841246544E-324  ±1.79769313486231570E+308
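The limits in Table 14.2 are also available as constants on the standard wrapper classes in java.lang. The following sketch prints a few of them and shows one consequence of the fixed sizes: integer arithmetic wraps around at the boundary. The class and method names are made up for the example.

```java
class PrimitiveRangesDemo {
    // Adding 1 to the largest int does not raise an error; the result
    // wraps around to the smallest int because the type is exactly 32 bits.
    static int nextAfterMax() {
        return Integer.MAX_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);
        System.out.println(Short.MIN_VALUE + " .. " + Short.MAX_VALUE);
        System.out.println(Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println(Long.MIN_VALUE + " .. " + Long.MAX_VALUE);
        System.out.println(Float.MIN_VALUE + " .. " + Float.MAX_VALUE);
        System.out.println(Double.MIN_VALUE + " .. " + Double.MAX_VALUE);
        System.out.println(nextAfterMax() == Integer.MIN_VALUE);
    }
}
```

For float and double, MIN_VALUE is the smallest positive nonzero magnitude, not the most negative number.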

14.2.6.1. The boolean Type

boolean values are not integers, may not be treated as integers, and may never be cast to or from any other type. To perform C-style conversions between a boolean value b and an int i, use the following code:

   b = (i != 0);   // integer-to-boolean: non-0 -> true; 0 -> false;
   i = (b)?1:0;    // boolean-to-integer: true -> 1; false -> 0;
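If these conversions are needed in several places, they can be wrapped in small helper methods. This is only a sketch; the class and method names are hypothetical and not part of any Java API.

```java
class BooleanConversionDemo {
    // Integer-to-boolean: any nonzero value becomes true, zero becomes false.
    static boolean toBoolean(int i) {
        return i != 0;
    }

    // Boolean-to-integer: true becomes 1, false becomes 0.
    static int toInt(boolean b) {
        return b ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(toBoolean(-5));
        System.out.println(toInt(true));
    }
}
```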

