Brought to you by EarthWeb


Positive and negative zero

The smallest positive value that you can represent in Java is java.lang.Double.MIN_VALUE, 4.94065645841246544e-324. A result whose absolute value is too small to round to this value underflows to zero. However, the sign of the result is retained, so a tiny negative result becomes negative zero rather than positive zero. The normal 0.0 you type in source code is positive zero. You also get negative zero when you multiply a negative number by zero. For example:

     double x = -1.0 * 0.0;

In direct comparisons, negative zero and positive zero appear to be equal. However, some other operations will produce different results depending on whether positive zero or negative zero is used. For example, 1.0 divided by positive zero is positive infinity, but 1.0 divided by negative zero is negative infinity.
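The behavior described above can be checked with a short program. This is only a sketch: the class name SignedZeroDemo and the helper isNegativeZero are made up for illustration and are not part of the Java class library.

```java
class SignedZeroDemo {
    // Hypothetical helper: == cannot tell the two zeros apart,
    // so divide by the value and look at the sign of the infinity.
    static boolean isNegativeZero(double d) {
        return d == 0.0 && 1.0 / d == Double.NEGATIVE_INFINITY;
    }

    public static void main(String[] args) {
        double positiveZero = 0.0;
        double negativeZero = -1.0 * 0.0;

        System.out.println(positiveZero == negativeZero); // true
        System.out.println(1.0 / positiveZero);           // Infinity
        System.out.println(1.0 / negativeZero);           // -Infinity

        // A negative result too small to represent underflows to negative zero.
        double underflow = -Double.MIN_VALUE / 4.0;
        System.out.println(isNegativeZero(underflow));    // true
    }
}
```

Dividing by the value is used here because a direct comparison with == reports the two zeros as equal.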

The zero literal you type into source code as 0.0 or 0.0F is always positive zero. Negative zero arises only as the result of a calculation; even -0.0 in source code is the unary minus operator applied to the positive zero literal.

Positive zero is, as you would expect, the float or double value whose bits are all zero. In other words, float positive zero is 0000000000000000-0000000000000000 in binary, or 00000000 in hexadecimal. Negative zero is the same, except that the sign bit is one. Thus, float negative zero is 1000000000000000-0000000000000000 in binary, or 80000000 in hexadecimal. Double positive zero is 0000000000000000 in hexadecimal, and double negative zero is 8000000000000000.
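You can inspect these bit patterns yourself with Float.floatToIntBits() and Double.doubleToLongBits(). The following is a minimal sketch; the class name ZeroBitsDemo and its two helper methods are invented for this example, and String.format() assumes Java 5 or later.

```java
class ZeroBitsDemo {
    // Hypothetical helper: the 32 bits of a float as eight hex digits.
    static String floatHex(float f) {
        return String.format("%08x", Float.floatToIntBits(f));
    }

    // Hypothetical helper: the 64 bits of a double as sixteen hex digits.
    static String doubleHex(double d) {
        return String.format("%016x", Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        System.out.println(floatHex(0.0f));   // 00000000
        System.out.println(floatHex(-0.0f));  // 80000000
        System.out.println(doubleHex(0.0));   // 0000000000000000
        System.out.println(doubleHex(-0.0));  // 8000000000000000
    }
}
```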

Denormalized floating-point numbers

Numbers whose stored (biased) exponent field is zero but whose mantissa is not zero are denormalized. Denormalized numbers do not have an implied first bit with value one. All of the bits that a denormalized number has are present in the mantissa. For a float, the mantissa is read as a fraction and multiplied by 2^-126, the same scale used by the smallest normalized float, rather than the 2^-127 that subtracting the bias of 127 from a zero exponent field would suggest. For a double the scale is 2^-1022. So the differences between normalized and denormalized floating-point numbers are the missing implied first bit and this fixed exponent.
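To make the decoding concrete, here is a sketch that rebuilds a denormalized float from its raw bits. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits), in which a 23-bit fraction scaled by 2^-126 is the same as the mantissa taken as an integer and scaled by 2^-149. The class and method names are hypothetical.

```java
class DenormalDemo {
    // Hypothetical helper: decodes float bits whose exponent field is zero.
    static double decodeDenormal(int bits) {
        int exponentField = (bits >>> 23) & 0xFF;
        if (exponentField != 0) {
            throw new IllegalArgumentException("not a denormalized float");
        }
        int mantissa = bits & 0x007FFFFF;            // low 23 bits, no implied one
        double sign = (bits >>> 31) == 0 ? 1.0 : -1.0;
        // 0.f * 2^-126 equals mantissa * 2^-149; the hex literal is exactly 2^-149.
        return sign * mantissa * 0x1.0p-149;
    }

    public static void main(String[] args) {
        // The smallest positive float has only the last mantissa bit set.
        System.out.println(decodeDenormal(0x00000001) == Float.MIN_VALUE); // true
        // The largest denormalized float has every mantissa bit set.
        int largest = 0x007FFFFF;
        System.out.println(decodeDenormal(largest) == Float.intBitsToFloat(largest)); // true
    }
}
```

The result is computed as a double because every denormalized float fits exactly in a double, so the comparisons with Float.intBitsToFloat() are exact.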

Unlike Inf, NaN, and positive and negative zero, all of which can appear in one form or another in Java source code or output, denormalized numbers don’t look any different from regular floating point numbers. However, being able to recognize and decode them will become important when you learn how to disassemble Java byte code in Chapters 4 and 5.

CHAR

The char data type in Java is considered to be a number, but it’s a funny one. Most obviously, when you try to print a char, you don’t get a number. Rather you get a character like “a” or “#”. Secondly, char literals don’t look like numbers in source code. You normally enter a char like this:

     char c = 'r';

You can, however, use integer literals to assign values to char variables. The following statement does exactly the same thing as the previous one:

     char c = 114;

You don’t often see Java source code that initializes chars with integer literals, because most programmers don’t walk around with the entire ASCII chart in their head. The meaning of the first statement is much more obvious than the meaning of the second, but they produce identical byte code.

Chars are two bytes wide; they take up the same space as a short. However, chars are not shorts. Shorts are signed and chars are unsigned. The first bit in a char is the 32,768 place, not a sign bit. Thus, while 1000000000000001 interpreted as a short is -32,767, 1000000000000001 interpreted as a char is 32,769. Chars range from 0 to 65,535.
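A quick way to see the difference is to cast the same 16 bits to both types. In this sketch the class name CharVersusShortDemo and its helpers are made up for illustration; 0x8001 is the bit pattern 1000000000000001 written in hexadecimal.

```java
class CharVersusShortDemo {
    // Hypothetical helpers: read the low 16 bits as a signed short
    // or as an unsigned char, then widen the result back to int.
    static int asShort(int bits) {
        return (short) bits;
    }

    static int asChar(int bits) {
        return (char) bits;
    }

    public static void main(String[] args) {
        int bits = 0x8001;                  // 1000000000000001 in binary
        System.out.println(asShort(bits));  // -32767
        System.out.println(asChar(bits));   // 32769

        // A char literal and its integer value are the same number.
        char c = 'r';
        System.out.println(c == 114);       // true
        System.out.println((int) c);        // 114
    }
}
```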

The Java compiler has to work a little magic to handle this. The line

     char c = 114;

compiles without problem. So does the line

     char d = 45000;

Both 114 and 45000 are within the range of a char. However, the following two lines produce compile-time error messages, telling you an explicit cast is needed to convert an int to a char:

     char e = -123;
     char f = 65536;
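If you add the explicit cast the compiler asks for, both lines compile, but the value is cut down to its low 16 bits. The sketch below shows what ends up in the char; the class name CharCastDemo and the helper castToChar are hypothetical.

```java
class CharCastDemo {
    // Hypothetical helper: narrows an int to a char, then widens it
    // back to int so the stored value can be printed as a number.
    static int castToChar(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        char e = (char) -123;   // compiles with the cast
        char f = (char) 65536;  // compiles with the cast

        System.out.println((int) e);            // 65413, which is 65536 - 123
        System.out.println((int) f);            // 0, because only the low 16 bits survive
        System.out.println(castToChar(45000));  // 45000, already in range
    }
}
```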

Java characters are understood to be part of the Unicode character set. The Unicode character set has, at the time of this writing, 38,885 characters, each two bytes wide. Unicode scripts include alphabets used in Europe, Africa, the Middle East, India, and many other parts of Asia, as well as the unified Han set of East Asian ideographs and the complete ideographs for Korean Hangul. Some scripts are not yet supported or are only partially supported, primarily because these scripts are not yet well understood.

Unsupported scripts include Braille, Cherokee, Cree, Ethiopic, Khmer (a.k.a. Cambodian), Maldivian (a.k.a. Dihevi), Mongolian, Moso (a.k.a. Naxi), Pahawh Hmong, Rong (a.k.a. Lepcha), Sinhalese, Tagalog, Tai Lu, Tai Mau, Tifinagh, Yi (a.k.a. Lolo), and Yoruba. Cherokee, Ethiopic, Braille, and possibly Khmer are likely to be added in the near future. Some of these languages can be written with other scripts that Unicode does support. For example, Mongolian is commonly written using the Cyrillic alphabet, and Hmong can be written in ASCII.

Furthermore, Unicode does not support many archaic alphabets, including Ahom, Akkadian Cuneiform, Aramaic, Babylonian Cuneiform, Balinese, Balti, Batak, Brahmi, Buginese, Chola, Cypro-Minoan, Egyptian hieroglyphics, Etruscan, Glagolitic, Hittite, Javanese (a particularly galling omission), Kaithi, Kawi, Khamti, Kharoshthi, Kirat (Limbu), Lahnda, Linear B, Mandaic, Mangyan, Manipuri (Meithei), Meroitic (Kush), Modi, Numidian, Ogham, Pahlavi (Avestan), Phags-pa, Pyu, Old Persian Cuneiform, Phoenician, Northern Runic, Satavahana, Siddham, South Arabian, Sumerian Cuneiform, Syriac, Tagbanuwa, Tircul, and Ugaritic Cuneiform. Runic and Ogham are likely to be added in the near future. Some of the rest of these languages, such as Linear B, are still areas of active research among linguists. Of the remainder, few (if any) are likely to be added to Unicode in the foreseeable future, even those that are fairly well understood.

Theoretically, Unicode can be expanded to cover up to 65,536 different characters. This is not quite enough to handle every character from all the world’s alphabets, primarily because of the large number of characters in the pictographic alphabets used for Chinese, Japanese, and historical Vietnamese. The Chinese alphabet alone has more than 80,000 different characters. However, by combining similar characters in these alphabets so that some chars represent different words in different languages, all of the alphabets and the most commonly used pictographs can be squeezed into two bytes.


Copyright (c) 1996-1999 EarthWeb Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited. Read EarthWeb's privacy statement.