
Java was first designed by Big-Endian engineers at Sun Microsystems. It was also designed for the Internet, where almost all protocols specify Big-Endian byte order. Therefore, it should come as no surprise that Java uses Big-Endian format for all multi-byte data types, both in byte code and in its data streams. Virtual machines on Little-Endian systems, like the x86, have to translate the Big-Endian data in Java byte code into their native Little-Endian format before executing it.


Secret:  You need to worry about byte order only when you’re exchanging data with a Little-Endian source or destination. The readShort(), readInt(), readLong(), readFloat(), and readDouble() methods of java.io.DataInputStream all assume the data is Big-Endian. (readByte() reads a single byte, so byte order does not affect it.) Similarly, the writeShort(), writeInt(), writeLong(), writeFloat(), and writeDouble() methods of java.io.DataOutputStream write Big-Endian data. To read Little-Endian data in Java, you have to read each byte separately and then reconstruct the int or long from the bytes that make it up. To write Little-Endian data, you have to break the ints or longs apart into bytes and then write the bytes separately, lowest-order byte first. There are several ways to accomplish this, but the most efficient use the bit-level operators discussed later in this chapter. I revisit this topic there.
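In the meantime, here is a minimal sketch of the idea, assuming a 4-byte Little-Endian int. The class name LittleEndianSketch and its two methods are hypothetical names chosen for illustration; they are not part of java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
class LittleEndianSketch {

    // Reassembles an int from four bytes stored least significant byte first.
    static int intFromLittleEndian(byte[] b) {
        return (b[0] & 0xFF)
             | (b[1] & 0xFF) << 8
             | (b[2] & 0xFF) << 16
             | (b[3] & 0xFF) << 24;
    }

    // Breaks an int apart into four bytes, least significant byte first.
    static byte[] intToLittleEndian(int value) {
        return new byte[] {
            (byte) value,
            (byte) (value >>> 8),
            (byte) (value >>> 16),
            (byte) (value >>> 24)
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = intToLittleEndian(0x12345678); // bytes 78 56 34 12
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        byte[] raw = new byte[4];
        in.readFully(raw); // read the four bytes without interpreting them
        System.out.println(Integer.toHexString(intFromLittleEndian(raw))); // prints 12345678
    }
}
```

The mask with 0xFF matters because Java bytes are signed; without it, a byte such as 0x80 would sign-extend and corrupt the higher-order bits of the result.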

Unsigned integers

Many traditional programming languages, notably C, allow the use of unsigned quantities. An unsigned number uses its high-order bit for data, so it can count twice as high as a number that has to reserve one bit for the sign. However, it can only count non-negative numbers. Recall that the largest signed byte is 01111111, which is 127 in decimal. As a signed byte, 11111111 is not 255 but rather -1. However, when 11111111 is read as an unsigned quantity, the first 1 bit is interpreted as 128, not as a sign bit. Thus, as an unsigned quantity, 11111111 is indeed 255. On the other hand, there’s no way to express negative numbers as unsigned numbers.
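You can see both readings of the same eight bits in Java itself. The following is a small sketch; the class UnsignedByteSketch and its toUnsigned() method are hypothetical names used only for this illustration.

```java
// Hypothetical demonstration class, for illustration only.
class UnsignedByteSketch {

    // Reinterprets the eight bits of a signed byte as a value from 0 to 255.
    static int toUnsigned(byte b) {
        return b & 0xFF; // masking discards the sign extension
    }

    public static void main(String[] args) {
        byte allOnes = (byte) 0xFF;              // bit pattern 11111111
        System.out.println(allOnes);             // prints -1
        System.out.println(toUnsigned(allOnes)); // prints 255
    }
}
```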

All Java integer data types except char are signed. However, it’s not unlikely that you’ll run across data from programs written in other languages that do have unsigned integers. java.io.DataInputStream has two methods that read unsigned quantities. readUnsignedByte() reads a single byte off the stream and returns an int between 0 and 255. An int is returned instead of a byte because a byte can go only as high as 127, whereas an unsigned byte can go as high as 255. Similarly, readUnsignedShort() reads two bytes from the input stream and returns an int between 0 and 65,535.
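As a quick illustration, the sketch below reads one unsigned byte and one unsigned short from an in-memory stream. The class UnsignedReadSketch and its readUnsigned() method are hypothetical; only readUnsignedByte() and readUnsignedShort() come from java.io.DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demonstration class, for illustration only.
class UnsignedReadSketch {

    // Reads one unsigned byte, then one unsigned (Big-Endian) short.
    static int[] readUnsigned(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int b = in.readUnsignedByte();  // one byte, 0 to 255
            int s = in.readUnsignedShort(); // two bytes, 0 to 65,535
            return new int[] {b, s};
        } catch (IOException e) {
            // Not expected for well-formed in-memory data; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        int[] values = readUnsigned(data);
        System.out.println(values[0]); // prints 255
        System.out.println(values[1]); // prints 65534
    }
}
```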

There is no similar readUnsignedInt() method. If you want to, it’s easy enough to write one yourself. You’ll need to read four bytes and return a long between 0 and 4,294,967,295. Again, the most efficient way to do this uses bit-level operators, so we’ll defer the details until the end of this chapter.
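For the impatient, here is one possible sketch, assuming the four bytes arrive in Big-Endian order. The class UnsignedIntSketch and the wrapper unsignedIntFrom() are hypothetical names; the readUnsignedInt() shown here is my own illustration, not a method of java.io.DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
class UnsignedIntSketch {

    // Reads four Big-Endian bytes and returns a long from 0 to 4,294,967,295.
    static long readUnsignedInt(DataInputStream in) throws IOException {
        long b0 = in.readUnsignedByte(); // most significant byte
        long b1 = in.readUnsignedByte();
        long b2 = in.readUnsignedByte();
        long b3 = in.readUnsignedByte(); // least significant byte
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    // Convenience wrapper for in-memory data, so callers need not catch IOException.
    static long unsignedIntFrom(byte[] fourBytes) {
        try {
            return readUnsignedInt(new DataInputStream(new ByteArrayInputStream(fourBytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] allOnes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(unsignedIntFrom(allOnes)); // prints 4294967295
    }
}
```

Holding each byte in a long before shifting keeps the top bit from being mistaken for a sign. A shorter variant under the same assumption is in.readInt() & 0xFFFFFFFFL.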

An unsigned long — that is, an 8-byte unsigned integer — is relatively uncommon in practice. No primitive Java data type is large enough to handle unsigned longs. You can, however, use the java.math.BigInteger class instead.
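One way to do this, sketched below, is to hand the eight raw bytes to the BigInteger constructor that takes a sign and a Big-Endian magnitude. The class UnsignedLongSketch and its unsignedLong() method are hypothetical names used for illustration, and the sketch assumes the bytes are in Big-Endian order.

```java
import java.math.BigInteger;

// Hypothetical helper class, for illustration only.
class UnsignedLongSketch {

    // Interprets eight Big-Endian bytes as an unsigned 64-bit integer.
    static BigInteger unsignedLong(byte[] eightBytes) {
        // A signum of 1 tells BigInteger to treat the bytes as a positive magnitude.
        return new BigInteger(1, eightBytes);
    }

    public static void main(String[] args) {
        byte[] allOnes = new byte[8];
        java.util.Arrays.fill(allOnes, (byte) 0xFF);
        System.out.println(unsignedLong(allOnes)); // prints 18446744073709551615
    }
}
```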

Integer widths

You’ve probably heard a lot of hype about 32-bit computing and 32-bit clean code. You’ll be hearing more about 64-bit platforms in the near future, if you haven’t already. What’s being referred to is, very roughly, the preferred size of an integer on a given computer architecture and the number of bits that can be transferred from main memory to the CPU in one clock cycle. Generally, all else being equal, the more bits the processor handles at once, the faster the computer will run. However, you need to rewrite (or at least recompile) the software to accommodate the proper bit width before you can see the performance gain.

Much legacy code is written in languages like C that do not guarantee the width of an integer. The same C program may use 32-bit ints on a SPARC, 16-bit ints on a Mac, and 64-bit ints on a DEC Alpha. Although these all have Java equivalents (int, short, and long, respectively), you have to know which one you’re dealing with before you write the code to handle it! Trying to read 16-bit ints with Java’s readInt() method, which always consumes four bytes, is a sure path to failure.

There’s no guaranteed way to look at a file in the absence of outside information and tell solely from the contents of the file whether it was written using 16-bit integers or 32-bit integers. Similarly, you can’t tell whether it uses Big-Endian or Little-Endian data. In an ideal world, you’d have access to a specification that describes the data format used. If you don’t, perhaps you have access to the source code that was used to write the file. If not, you’ll have to do some testing. Try to read the file as 16-bit ints. Do the results make sense? What if you read it as 32-bit ints? Do those results make sense? If you seem to have an excessive number of zeroes appearing in your data, especially if they tend to alternate with non-zero values, that may indicate that you are reading the data using too short an integer. For example, if the data file is full of numbers mostly between 10 and 1000, then if it’s written with 32-bit ints, the high two bytes of each int will be zero, and reading the file as 16-bit ints will produce a zero alongside every real value.
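This kind of test is easy to automate. The sketch below counts how many zeroes turn up when the same bytes are read as 16-bit and as 32-bit Big-Endian integers. It uses java.nio.ByteBuffer rather than a stream purely to keep the example short, and the class WidthProbeSketch and its methods are hypothetical names. Treat the counts as a hint, not as proof of the file’s format.

```java
import java.nio.ByteBuffer;

// Hypothetical probing class, for illustration only.
class WidthProbeSketch {

    // Counts zero values when the data is read as Big-Endian 16-bit integers.
    static int zeroShorts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // Big-Endian by default
        int zeros = 0;
        while (buf.remaining() >= 2) {
            if (buf.getShort() == 0) zeros++;
        }
        return zeros;
    }

    // Counts zero values when the data is read as Big-Endian 32-bit integers.
    static int zeroInts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int zeros = 0;
        while (buf.remaining() >= 4) {
            if (buf.getInt() == 0) zeros++;
        }
        return zeros;
    }

    public static void main(String[] args) {
        // Three small values written as 32-bit ints: 10, 500, 1000.
        byte[] data = ByteBuffer.allocate(12).putInt(10).putInt(500).putInt(1000).array();
        System.out.println(zeroShorts(data)); // prints 3: a zero precedes every value
        System.out.println(zeroInts(data));   // prints 0
    }
}
```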


Copyright (c) 1996-1999 EarthWeb Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited.