

I made an implicit assumption in the preceding section: that the leftmost byte of a multi-byte number is the most significant one. Of course, spatial concepts like left and right really don’t apply to computer memories. In this context, left means lower in memory, and right means higher. Lower and higher are also spatial terms, though: by lower, I mean “has a smaller address,” and by higher, I mean “has a bigger address.” Thus, if the bytes in a computer with n bytes of memory are numbered from byte 0 to byte n-1, then byte 0 is the lowest, or leftmost, byte and byte n-1 is the highest, or rightmost, byte.

We associate left with lower addresses in memory because computer programs start executing the instruction at a lower address and then proceed through the instructions to a higher address. In other words, first the instruction in byte 0 is executed, and then the instruction at byte 1, and then the instruction at byte 2, and so on.


Note:  This is a little over-simplified. Not all bytes contain instructions; not all instructions are one byte long (though Java bytecode opcodes are); and some instructions jump backward or forward in memory. However, none of this changes the point I’m making here about associating lower addresses in memory with left and higher addresses with right.

When people who speak English write sequences of numbers they automatically put 0 on the left as shown here:

     0   1   2   3   4   5   6   7   8   9   10   11

Because English is a left-to-right language and most of the people who developed the first computers spoke English, the spatial concept of left came to be implicitly associated with lower addresses in memory. If the first digital computers had been invented in Arabic- or Hebrew-speaking cultures, which use right-to-left scripts, we’d probably speak of byte 0 as the rightmost byte.

Consider the number 6401. This is shorthand for six thousands, four hundreds, zero tens, and one one. The leftmost digit, 6, is the most important. It tells you to within a thousand how big the number is. Subsequent digits improve on the precision, but don’t change the big picture. In jargon, it’s said that 6 is the most significant digit. Similarly, the rightmost digit, 1, is the least significant digit.
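This place-value breakdown is easy to verify with a line of arithmetic. The class name here is my own; this is just an illustration, not code from this chapter:

```java
public class PlaceValue {
    public static void main(String[] args) {
        // 6401 = six thousands + four hundreds + zero tens + one one
        int n = 6 * 1000 + 4 * 100 + 0 * 10 + 1 * 1;
        System.out.println(n); // prints 6401
    }
}
```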

The most significant digits are read first. Therefore, this is a Big-Endian number system. The big end of the number (the thousands) comes before the little end (the ones) of the number. This assumption seems perfectly reasonable unless and until you encounter a script in which numbers are written differently.

A number system in which 6401 meant 6 ones, 4 tens, 0 hundreds, and 1 thousand would be called Little-Endian, because the least significant digits come first. There’s no reason why 6401 couldn’t mean that; it’s just not the way European scripts count. There’s no mathematical reason for Big-Endian numbers. It’s purely a convention enforced by centuries of common practice, no more right or wrong than the grammatical convention that adjectives tend to come before the nouns they modify. In English and many other languages, adjectives come first. In Latin and many other languages, the nouns come first. Neither is right or wrong. They’re just different.

Bringing this discussion back to the level of computers, recall that a Java int is made out of four bytes, or eight hexadecimal digits. For example, decimal 6401 is 0x00001901. Java follows a Big-Endian scheme: the most significant byte comes first, followed by the second most significant byte, followed by the third most significant byte, followed by the least significant byte.

Macs and most UNIX machines, including Sun’s, also use a Big-Endian architecture, where the byte with the highest place value in a number is in the leftmost (lowest addressed) byte in the number. However, computer architectures based on the Intel X86 and VAX do things exactly the opposite way. Those machines are Little-Endian; the least significant byte in a number comes first. On an X86 system, the bytes of 0x00001901 are laid out in memory, from lowest address to highest, as 01 19 00 00.
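The two layouts are easy to compare on a modern JVM with java.nio.ByteBuffer (a newer API than the stream classes this book covers; this is a sketch, and the class name is my own):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Layouts {
    public static void main(String[] args) {
        int n = 6401; // 0x00001901 as a 4-byte int

        // Big-Endian: most significant byte at the lowest index
        byte[] big = ByteBuffer.allocate(4)
                               .order(ByteOrder.BIG_ENDIAN)
                               .putInt(n).array();

        // Little-Endian: least significant byte at the lowest index
        byte[] little = ByteBuffer.allocate(4)
                                  .order(ByteOrder.LITTLE_ENDIAN)
                                  .putInt(n).array();

        for (byte b : big)    System.out.printf("%02X ", b & 0xFF);
        System.out.println(); // prints 00 00 19 01
        for (byte b : little) System.out.printf("%02X ", b & 0xFF);
        System.out.println(); // prints 01 19 00 00
    }
}
```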

Now let’s suppose we have to store the 4-byte integer 1,870,475,384 in this memory. Any computer architecture would use four contiguous bytes. First, the integer is converted into its hexadecimal form, 6F7D3078; each 2-digit pair is exactly one byte. Working from the lowest address up, the first byte (0x6F) goes to address A, the second (0x7D) to address A+1, the third (0x30) to A+2, and the fourth (0x78) to A+3. Figure 2-2 shows this arrangement.


Figure 2-2  The number 0x6F7D3078 stored at address A in memory in Big-Endian order.

This is a classic Big-Endian ordering of bytes. However, not all architectures do it like this. In particular, X86 and VAX architectures use a Little-Endian ordering. They put the most significant byte at address A+3, the second most significant byte at address A+2, the third most significant byte at address A+1, and the least significant byte at address A. Figure 2-3 shows this arrangement.
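Both orderings can be produced by hand with shifts and casts. This sketch (the class name is my own) builds the two byte arrays for the same number used in Figures 2-2 and 2-3:

```java
public class EndianBytes {
    public static void main(String[] args) {
        int n = 0x6F7D3078; // 1,870,475,384

        // Big-Endian: most significant byte at the lowest address (index 0)
        byte[] big = { (byte)(n >>> 24), (byte)(n >>> 16),
                       (byte)(n >>> 8),  (byte) n };

        // Little-Endian: least significant byte at the lowest address
        byte[] little = { (byte) n,         (byte)(n >>> 8),
                          (byte)(n >>> 16), (byte)(n >>> 24) };

        for (byte b : big)    System.out.printf("%02X ", b & 0xFF);
        System.out.println(); // prints 6F 7D 30 78
        for (byte b : little) System.out.printf("%02X ", b & 0xFF);
        System.out.println(); // prints 78 30 7D 6F
    }
}
```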


Figure 2-3  The number 0x6F7D3078 stored at address A in Little-Endian order.

As long as you’re working on only one computer system, you don’t need to worry about this. All the routines are designed to work with the native data format. However, as soon as you start trying to transfer data between systems, you need to worry about converting between byte orders. Otherwise, the integer you write to a file in Big-Endian format on your Sun as 0x6F7D3078 (1,870,475,384) will be read in Little-Endian format as 0x78307D6F (2,016,443,759) on your PC — not the same thing at all.


Note:  Some older computer systems used neither a Big-Endian nor a Little-Endian byte order. DEC’s PDP-11 wrote 4-byte integers in this order: second-least-significant byte, least-significant byte, most-significant byte, and second-most-significant byte. Other computers did even stranger things. Fortunately, these architectures have all died out, and we’re now left to deal only with the confusion between Little-Endian and Big-Endian.

