Java Developer Connection(SM)
Technical Tips

Tech Tips
August 29, 2000

WELCOME to the Java Developer Connection(SM) (JDC) Tech Tips, August 29, 2000. This issue is about bytecode. Programmers coding in the Java[TM] programming language rarely view the compiled output of their programs. This is unfortunate, because the output, Java bytecode, can provide valuable insight when debugging or troubleshooting performance problems. Moreover, the JDK makes viewing bytecode easy. This tip shows you how to view and interpret Java bytecode. It covers three topics: getting started with javap, how bytecode protects you from memory bugs, and analyzing bytecode to improve your code.

This issue of the JDC Tech Tips is written by Stuart Halloway, a Java specialist at DevelopMentor (http://www.develop.com/java).

These tips were developed using Java[TM] 2 SDK, Standard Edition, v 1.3.


GETTING STARTED WITH JAVAP

Most Java programmers know that their programs are not typically compiled into native machine code. Instead, the programs are compiled into an intermediate bytecode format that is executed by the Java[TM] Virtual Machine*. However, relatively few programmers have ever seen bytecode because their tools do not encourage them to look. Most Java debugging tools do not allow step-by-step execution of bytecode; they either show source code lines or nothing.

Fortunately, the JDK[TM] provides javap, a command-line tool that makes it easy to view bytecode. Let's see an example:

 
public class ByteCodeDemo {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}

After you compile this class, you could open the .class file in a hex editor and translate the bytecodes by referring to the virtual machine specification. Fortunately, there is an easier way: javap is a command-line disassembler that converts the bytecodes into human-readable mnemonics. To get a bytecode listing, pass the '-c' flag to javap as follows:

javap -c ByteCodeDemo

You should see output similar to this:

public class ByteCodeDemo extends java.lang.Object {
    public ByteCodeDemo();
    public static void main(java.lang.String[]);
}
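Method ByteCodeDemo()
   0 aload_0
   1 invokespecial #1 <Method java.lang.Object()>
   4 return
Method void main(java.lang.String[])
   0 getstatic #2 <Field java.io.PrintStream out>
   3 ldc #3 <String "Hello world">
   5 invokevirtual #4 <Method void println(java.lang.String)>
   8 return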

From just this short listing, you can learn a lot about bytecode. Begin with the first instruction in the main method:

   0 getstatic #2 <Field java.io.PrintStream out>

The initial integer is the offset of the instruction within the method, so the first instruction is at offset 0. The mnemonic for the instruction follows the offset. In this example, the 'getstatic' instruction pushes the value of a static field onto a data structure called the operand stack, where later instructions can use it. Following the mnemonic is the field to be pushed, in this case "#2 <Field java.io.PrintStream out>".

If you examined the bytecode directly, you would see that the field information is not embedded in the instruction. Instead, like all constants used by a Java class, the field information is stored in a shared table called the constant pool. Storing field information in the constant pool reduces the size of the bytecode instructions, because each instruction has to store only an integer index into the constant pool instead of the entire constant. In this example, the field information is at index #2 in the constant pool. The order of items in the constant pool is compiler dependent, so you might see a number other than '#2'.
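
To make the constant pool a little more concrete, the following is a minimal sketch that reads the fixed-size header at the start of a class file. It assumes the standard class file layout (a 4-byte magic number, two 2-byte version fields, then the 2-byte constant pool count) and a current JDK. The class name ClassFileHeader and its fields are illustrative, not part of the JDK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative helper (not a JDK class): decodes the header that
// precedes the constant pool entries in a class file.
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassFileHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        magic = buf.getInt();                          // 0xCAFEBABE in a valid class file
        minorVersion = buf.getShort() & 0xFFFF;
        majorVersion = buf.getShort() & 0xFFFF;
        // Index 0 is reserved, so valid indexes such as #2
        // run from 1 to constantPoolCount - 1.
        constantPoolCount = buf.getShort() & 0xFFFF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileHeader SomeClass.class");
            return;
        }
        ClassFileHeader h = new ClassFileHeader(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("magic=0x" + Integer.toHexString(h.magic)
                + " version=" + h.majorVersion + "." + h.minorVersion
                + " constant_pool_count=" + h.constantPoolCount);
    }
}
```

Running it against ByteCodeDemo.class shows how many constant pool slots the compiler allocated for even a one-line program.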

After analyzing the first instruction, it's easy to guess the meaning of the other instructions. The 'ldc' (load constant) instruction pushes the constant "Hello world" onto the operand stack. The 'invokevirtual' instruction invokes the println method, which pops its two arguments from the operand stack. Don't forget that an instance method such as println has two arguments: the obvious string argument, plus the implicit 'this' reference.
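
The way these three instructions use the operand stack can be imitated in ordinary Java code. The sketch below is only a model: it uses a Deque as a stand-in for the operand stack, and the class name OperandStackSketch and the method runMain are hypothetical. A real virtual machine does not work on Java collections.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the operand stack; for illustration only.
class OperandStackSketch {
    // Imitates the bytecode of main; 'out' stands in for System.out.
    static void runMain(PrintStream out) {
        Deque<Object> stack = new ArrayDeque<>();

        stack.push(out);            // getstatic: push the value of the static field
        stack.push("Hello world");  // ldc: push the String constant

        // invokevirtual: the argument is on top, the receiver ('this') below it
        String argument = (String) stack.pop();
        PrintStream receiver = (PrintStream) stack.pop();
        receiver.println(argument);
        // println returns void, so nothing is pushed back and the stack is empty
    }

    public static void main(String[] args) {
        runMain(System.out);
    }
}
```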


HOW BYTECODE PROTECTS YOU FROM MEMORY BUGS

The Java programming language is frequently touted as a "secure" language for internet software. Given that the code looks so much like C++ on the surface, where does this security come from? It turns out that an important aspect of security is the prevention of memory-related bugs. Computer criminals exploit memory bugs to sneak malicious code into otherwise safe programs. Java bytecode is a first line of defense against this sort of attack, as the following example demonstrates:

    public float add(float f, int n) {
        return f + n;
    }

If you add this method to the previous example, recompile it, and run javap, you should see bytecode similar to this:

Method float add(float, int)
   0 fload_1
   1 iload_2
   2 i2f
   3 fadd
   4 freturn

At the beginning of a Java method, the virtual machine places method parameters in a data structure called the local variable table. As its name suggests, the local variable table also contains any local variables that you declare. In this example, the method begins with three local variable table entries, one for each of the three arguments to the add method. Slot 0 holds the this reference, while slots 1 and 2 hold the float and int arguments, respectively.

To manipulate the variables, the method must first load (push) them onto the operand stack. The first instruction, fload_1, pushes the float at slot 1 onto the operand stack. The second instruction, iload_2, pushes the int at slot 2 onto the operand stack. The i2f instruction then converts the int on top of the stack to a float, so that fadd can add two floats and freturn can return the float result. The interesting thing about these instructions is the 'i' and 'f' prefixes, which illustrate that Java bytecode instructions are strongly typed. If the type of an argument does not match the type of the bytecode, the VM will reject the bytecode as unsafe. Better still, the bytecodes are designed so that these type-safety checks need only be performed once, at class load time.
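
The effect of the int-to-float conversion in this method is visible from plain Java code. The sketch below is an illustration under two assumptions: the class name TypedAddDemo is made up, and the method is declared static so that it can be called without an instance (a static method has no this reference, so its arguments would occupy slots 0 and 1 instead of 1 and 2).

```java
// Illustrative class; only the body of add comes from the tip.
class TypedAddDemo {
    static float add(float f, int n) {
        // The int operand is converted to float before the addition.
        return f + n;
    }

    public static void main(String[] args) {
        System.out.println(add(1.5f, 2)); // prints 3.5

        // The conversion can round: 16777217 (2^24 + 1) has no exact
        // float representation and becomes 16777216.0f.
        System.out.println(add(0.0f, 16777217) == 16777216.0f); // prints true
    }
}
```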

How does this type-safety enhance security? If an attacker could trick the virtual machine into treating an int as a float, or vice versa, it would be easy to corrupt calculations in a predictable way. If these calculations involved bank balances, the security implications would be obvious. More dangerous still would be tricking the VM into treating an int as an Object reference. In most scenarios, this would crash the VM, but an attacker needs to find only one loophole. And don't forget that the attacker doesn't have to search by hand--it would be pretty easy to write a program that generated billions of permutations of bad bytecodes, trying to find the lucky one that compromised the VM.
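
To see why treating an int as a float would corrupt a calculation, compare a proper conversion with a raw reinterpretation of the same 32 bits. The sketch below uses the library method Float.intBitsToFloat as a safe stand-in for what mistyped bytecode would do if the VM accepted it. The class and method names are hypothetical.

```java
// Illustrative comparison; not an actual attack on the VM.
class TypeConfusionDemo {
    // The legitimate conversion: the numeric value is preserved.
    static float convert(int n) {
        return (float) n;
    }

    // The same 32 bits read as if they had been a float all along.
    static float reinterpret(int n) {
        return Float.intBitsToFloat(n);
    }

    public static void main(String[] args) {
        int balance = 1000;
        System.out.println("converted:     " + convert(balance));
        // The bit pattern of the int 1000 denotes a tiny number near zero
        // when read as a float, not 1000.
        System.out.println("reinterpreted: " + reinterpret(balance));
    }
}
```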

Another case where bytecode safeguards memory is array manipulation. The 'aastore' and 'aaload' bytecodes, which store and load elements of object arrays, always check array bounds. They throw an ArrayIndexOutOfBoundsException if the index falls outside the array.

Perhaps the most important checks of all apply to the branching instructions, for example, the bytecodes that begin with 'if'. In bytecode, branching instructions can only branch to another instruction within the same method. The only way to transfer control outside a method is to return, throw an exception, or execute one of the 'invoke' instructions. Not only does this close the door on many attacks, it also prevents nasty bugs caused by dangling references or stack corruption. If you have ever had a system debugger open your program to a random location in code, you're familiar with these bugs.
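
The array bounds check is easy to observe from Java code. In the sketch below, the assignment to an element of an Object array is the kind of statement that compiles to an 'aastore' instruction. The class name BoundsCheckDemo and the method storeIsRejected are made up for the illustration.

```java
// Illustrative class showing the VM's bounds check on an array store.
class BoundsCheckDemo {
    // Returns true if the VM refuses to store at the given index.
    static boolean storeIsRejected(Object[] array, int index) {
        try {
            array[index] = "x"; // store into an object array
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;        // the store never touched memory outside the array
        }
    }

    public static void main(String[] args) {
        Object[] array = new Object[3];
        System.out.println(storeIsRejected(array, 2)); // prints false: last valid index
        System.out.println(storeIsRejected(array, 3)); // prints true: one past the end
    }
}
```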

The critical point to remember about all of these checks is that they are made by the virtual machine at the bytecode level, not just by the compiler. A compiler for a language such as C++ might prevent some of the memory errors discussed above, but its protection applies only at the source code level. Operating systems will happily load and execute any machine code, whether the code was generated by a careful C++ compiler or a malicious attacker. In short, C++ is object-oriented only at the source code level, whereas Java's object-oriented features extend down to the compiled code.


ANALYZING BYTECODE TO IMPROVE YOUR CODE

The memory and security protections of Java bytecode are there for you whether you notice them or not, so why bother looking at the bytecode? In many cases, knowing how the compiler translates your code into bytecode can help you write more efficient code, and can sometimes even prevent insidious bugs. Consider the following example:

    //return the concatenation str1+str2
    String concat1(String str1, String str2) {
        return str1 + str2;
    }
 
    //append str2 to str1
    void concat2(StringBuffer str1, String str2) {
        str1.append(str2);
    }

Try to guess how many method calls each method requires to execute. Now compile the methods and run javap. You should see output like this:

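Method java.lang.String concat1(java.lang.String, java.lang.String)
   0 new #5 <Class java.lang.StringBuffer>
   3 dup
   4 invokespecial #6 <Method java.lang.StringBuffer()>
   7 aload_1
   8 invokevirtual #7 <Method java.lang.StringBuffer append(java.lang.String)>
  11 aload_2
  12 invokevirtual #7 <Method java.lang.StringBuffer append(java.lang.String)>
  15 invokevirtual #8 <Method java.lang.String toString()>
  18 areturn

Method void concat2(java.lang.StringBuffer, java.lang.String)
   0 aload_1
   1 aload_2
   2 invokevirtual #7 <Method java.lang.StringBuffer append(java.lang.String)>
   5 pop
   6 return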

The concat1 method performs five expensive operations: one new (an object allocation), one invokespecial (the constructor call), and three invokevirtual calls (two appends and one toString). That is quite a bit more work than the concat2 method, which makes only a single invokevirtual call. Most Java programmers have been warned that because Strings are immutable it is more efficient to use StringBuffers for concatenation. Using javap to analyze this makes the point in dramatic fashion. If you are unsure whether two language constructs are equivalent in performance, you should use javap to analyze the bytecode. Beware of the just-in-time (JIT) compiler, though. Because the JIT compiler recompiles the bytecodes into native machine code, it can apply additional optimizations that your javap analysis does not reveal. Unless you have the source code for your virtual machine, you need to supplement your bytecode analysis with performance benchmarks.
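
A benchmark to go with the bytecode analysis can be very simple. The following is a rough sketch, not a rigorous measurement: it assumes a current JDK (System.nanoTime is not available in the 1.3 SDK), the class name ConcatBenchmark and the loop count are arbitrary, and a single timed run like this is easily distorted by JIT warm-up and garbage collection.

```java
// Rough timing sketch; results vary with the VM, the JIT compiler, and the machine.
class ConcatBenchmark {
    // Repeated String concatenation: every iteration builds a new String.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "x";
        }
        return s;
    }

    // Repeated append on one StringBuffer, converted to a String once at the end.
    static String withBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append("x");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20000; // arbitrary size for the illustration

        long start = System.nanoTime();
        withPlus(n);
        long plusNanos = System.nanoTime() - start;

        start = System.nanoTime();
        withBuffer(n);
        long bufferNanos = System.nanoTime() - start;

        System.out.println("String +     : " + plusNanos / 1000 + " microseconds");
        System.out.println("StringBuffer : " + bufferNanos / 1000 + " microseconds");
    }
}
```

Running the measurement several times, and in both orders, gives a better picture than any single run.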

A final example illustrates how examining bytecode can help prevent bugs in your application. Create two classes as follows. Make sure they are in separate files.

public class ChangeALot {
    public static final boolean debug=false;
    public static boolean log=false;
}

public class EternallyConstant {
    public static void main(String [] args) {
        System.out.println("EternallyConstant beginning execution"); 
        if (ChangeALot.debug)
            System.out.println("Debug mode is on");
        if (ChangeALot.log)
            System.out.println("Logging mode is on");
    }
}

If you run the class EternallyConstant you should get the message:

    EternallyConstant beginning execution

Now try editing the ChangeALot file, modifying the debug and log variables to both be true. Recompile only the ChangeALot file. Run EternallyConstant again, and you will see the following output:

    EternallyConstant beginning execution
    Logging mode is on

What happened to the debugging mode? Even though you set debug to true, the message "Debug mode is on" didn't appear. The answer is in the bytecode. Run javap on the EternallyConstant class, and you will see this:

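Method void main(java.lang.String[])
   0 getstatic #2 <Field java.io.PrintStream out>
   3 ldc #3 <String "EternallyConstant beginning execution">
   5 invokevirtual #4 <Method void println(java.lang.String)>
   8 getstatic #5 <Field boolean log>
  11 ifeq 22
  14 getstatic #2 <Field java.io.PrintStream out>
  17 ldc #6 <String "Logging mode is on">
  19 invokevirtual #4 <Method void println(java.lang.String)>
  22 return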

Surprise! While there is an 'ifeq' check on the log field, the code does not check the debug field at all. Because the debug field was marked final and initialized with a constant, the compiler knew that its value could never change at runtime. It therefore evaluated the first 'if' statement at compile time and, because debug was false, removed the statement entirely. This is a very useful optimization, because it lets you embed debugging code in your application and pay no runtime penalty when the switch is set to false. Unfortunately, the optimization can lead to major confusion when you recompile. If you change a final field, you have to remember to recompile every other class that references the field, because the 'reference' might have been optimized away. Java development environments do not always detect this subtle dependency, which can lead to very odd bugs. So the old C++ adage still holds for the Java environment: "When in doubt, rebuild all."
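
If rebuilding everything is not practical, one possible way to sidestep the problem is to keep the switch from being a compile-time constant in the first place. The sketch below is a suggestion, not something the tip itself prescribes: the class name SaferFlags and its members are hypothetical, and Boolean.parseBoolean assumes a JDK newer than the 1.3 SDK. The price is that the compiler can no longer remove the guarded code.

```java
// Hypothetical alternative to ChangeALot; all names are illustrative.
class SaferFlags {
    // A compile-time constant: the compiler copies the value into every
    // class that uses it, which is what caused the stale result above.
    static final boolean DEBUG = false;

    // Still final, but the initializer is a method call, so this is not a
    // compile-time constant and other classes read the field at run time.
    static final boolean LOG = Boolean.parseBoolean("false");

    // An accessor also forces callers to ask this class at run time.
    static boolean isDebug() {
        return DEBUG;
    }

    public static void main(String[] args) {
        if (isDebug()) {
            System.out.println("Debug mode is on");
        }
        if (LOG) {
            System.out.println("Logging mode is on");
        }
        System.out.println("Flags checked at run time");
    }
}
```

Running javap -c on a class that calls SaferFlags.isDebug() is a quick way to check that the call is still present in the caller's bytecode.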

Knowing a little bytecode is a valuable aid to any programmer coding in the Java programming language. The javap tool makes it easy to view bytecodes. Occasionally checking your code with javap can be invaluable in improving performance and catching particularly elusive bugs. There is substantially more complexity to bytecode and the VM than this tip can cover. To learn more, read Inside the Java Virtual Machine by Bill Venners.

* As used in this document, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.


— Note —

The names on the JDC(SM) mailing list are used for internal Sun Microsystems[TM] purposes only. To remove your name from the list, see Subscribe/Unsubscribe below.

— Feedback —

Comments? Send your feedback on the JDC Tech Tips to: jdc-webmaster

— Subscribe/Unsubscribe —

The JDC Tech Tips are sent to you because you elected to subscribe. To unsubscribe from this and any other JDC Email, select "Subscribe to free JDC newsletters" on the JDC front page. This displays the Subscriptions page, where you can change the current selections.

You need to be a JDC member to subscribe to the Tech Tips. To become a JDC member, go to:

http://java.sun.com/jdc/

To subscribe to the Tech Tips and other JDC Email, select "Subscribe to free JDC newsletters" on the JDC front page.

— Archives —

You'll find the JDC Tech Tips archives at:

http://developer.java.sun.com/developer/TechTips/index.html

— Copyright —

Copyright 2000 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This Document is protected by copyright. For more information, see:

http://developer.java.sun.com/developer/copyright.html


[ This page was updated: 21-Sep-2000 ]