WELCOME to the Java Developer Connection[SM]
(JDC) Tech Tips, August 29, 2000. This issue is about bytecode. Programmers
coding in the Java[TM] programming language rarely view the
compiled output of their programs. This is unfortunate, because
the output, Java bytecode, can provide valuable insight when
debugging or troubleshooting performance problems. Moreover,
the JDK makes viewing bytecode easy. This issue shows you how
to view and interpret Java bytecode. It presents the following
topics related to bytecode:

  * Getting Started With javap
  * How Bytecode Protects You From Memory Bugs
  * Analyzing Bytecode to Improve Your Code

This issue of the JDC Tech Tips is written by Stuart Halloway,
a Java specialist at DevelopMentor
(http://www.develop.com/java).
These tips were developed using Java[TM] 2 SDK,
Standard Edition, v 1.3.
GETTING STARTED WITH JAVAP
Most Java programmers know that their programs are not typically
compiled into native machine code. Instead, the programs are
compiled into an intermediate bytecode format that is executed by
the Java[TM] Virtual Machine*. However, relatively few
programmers have ever seen bytecode because their tools do not
encourage them to look. Most Java debugging tools do not allow
step-by-step execution of bytecode; they either show source code
lines or nothing.
The JDK[TM] provides javap, a command-line tool that makes it easy
to view bytecode. Let's see an example:
public class ByteCodeDemo {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
After you compile this class, you could open the .class file in a
hex editor and translate the bytecodes by referring to the virtual
machine specification. Fortunately, there is an easier way: javap
is a command-line disassembler that converts the bytecodes into
human-readable mnemonics. You can get a bytecode listing by passing
the '-c' flag to javap as follows:
javap -c ByteCodeDemo
You should see output similar to this:
public class ByteCodeDemo extends java.lang.Object {
    public ByteCodeDemo();
    public static void main(java.lang.String[]);
}

Method ByteCodeDemo()
   0 aload_0
   1 invokespecial #1
   4 return

Method void main(java.lang.String[])
   0 getstatic #2
   3 ldc #3
   5 invokevirtual #4
   8 return
From just this short listing, you can learn a lot about bytecode.
Begin with the first instruction in the main method:

   0 getstatic #2

The initial integer is the offset of the instruction, in bytes,
from the start of the method's code, so the first instruction
begins at offset 0. The mnemonic for the instruction follows the
offset. In this example, the 'getstatic' instruction pushes the
value of a static field onto a data structure called the operand
stack, where later instructions can use it. Following the
getstatic mnemonic is the operand that identifies the field to
push. In this case, the field is
"#2 <Field java.io.PrintStream out>", that is, System.out.
If you examined the bytecode directly, you would see that the
field information is not embedded in the instruction. Instead,
like all constants used by a Java class, the field information is
stored in the class's constant pool, a table shared by all the
code in the class. Storing constants in the pool keeps the
bytecode instructions small, because an instruction only has to
store an integer index into the constant pool instead of the
entire constant. In this example, the field information is at
location #2 in the constant pool. The order of items in the
constant pool is compiler dependent, so you might see a number
other than #2.
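The offsets in the main listing follow from instruction sizes: getstatic and invokevirtual each take a 1-byte opcode plus a 2-byte constant-pool index, ldc takes a 1-byte opcode plus a 1-byte index, and return is a lone opcode. The toy disassembler below (a hypothetical MiniJavap class, not part of the JDK) decodes only those four opcodes. The byte array in its main method is an assumption: it was rebuilt by hand from the listing, using the constant-pool indices #2, #3, and #4 shown there, rather than read from a real .class file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy disassembler: handles only getstatic, ldc, invokevirtual, return.
class MiniJavap {

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0; // byte offset of the current instruction
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF; // Java bytes are signed; mask to 0..255
            switch (opcode) {
                case 0xB2: // getstatic: opcode + 2-byte constant-pool index
                case 0xB6: // invokevirtual: opcode + 2-byte constant-pool index
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    String name = (opcode == 0xB2) ? "getstatic" : "invokevirtual";
                    lines.add(pc + " " + name + " #" + index);
                    pc += 3;
                    break;
                case 0x12: // ldc: opcode + 1-byte constant-pool index
                    lines.add(pc + " ldc #" + (code[pc + 1] & 0xFF));
                    pc += 2;
                    break;
                case 0xB1: // return: opcode only
                    lines.add(pc + " return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported opcode " + opcode + " at " + pc);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed raw bytes of ByteCodeDemo.main, rebuilt from the listing
        byte[] mainCode = {
            (byte) 0xB2, 0x00, 0x02, // getstatic #2
            0x12, 0x03,              // ldc #3
            (byte) 0xB6, 0x00, 0x04, // invokevirtual #4
            (byte) 0xB1              // return
        };
        for (String line : disassemble(mainCode)) {
            System.out.println(line);
        }
    }
}
```

Running it prints the same four offset/mnemonic/index lines as the javap listing for main, which shows why the offsets run 0, 3, 5, 8 rather than 0, 1, 2, 3.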
After analyzing the first instruction, it's easy to guess the
meaning of the other instructions. The 'ldc' (load constant)
instruction pushes the constant "Hello world" onto the operand
stack. The 'invokevirtual' instruction invokes the println method,
which pops its two arguments from the operand stack. Don't forget
that an instance method such as println has two arguments: the
obvious string argument, plus the implicit 'this' reference, which
here is the PrintStream that getstatic pushed.
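To make the stack traffic concrete, here is a minimal sketch that models the four instructions of main with a java.util.ArrayDeque standing in for the operand stack. The class OperandStackModel and its runMain method are hypothetical; this illustrates the push/pop order only and is not how a real virtual machine is implemented. Note that invokevirtual pops the string argument first, because it was pushed last, and the receiver second.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of ByteCodeDemo.main; not a real JVM.
class OperandStackModel {

    // Models main's four instructions; returns the stack depth left at 'return'.
    static int runMain(PrintStream systemOut) {
        Deque<Object> stack = new ArrayDeque<>();

        // 0 getstatic #2: push the value of the static field System.out
        // (passed in by the caller so the output can be captured)
        stack.push(systemOut);

        // 3 ldc #3: push the String constant from the constant pool
        stack.push("Hello world");

        // 5 invokevirtual #4: pop the argument (pushed last), then the receiver
        String argument = (String) stack.pop();
        PrintStream receiver = (PrintStream) stack.pop();
        receiver.println(argument);

        // 8 return: in this model nothing is left on the stack
        return stack.size();
    }

    public static void main(String[] args) {
        runMain(System.out); // prints: Hello world
    }
}
```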
HOW BYTECODE PROTECTS YOU FROM MEMORY BUGS
The Java programming language is frequently touted as a "secure"
language for internet software. Given that the code looks so
much like C++ on the surface, where does this security come
from? It turns out that an important aspect of security is the
prevention of memory-related bugs. Computer criminals exploit
memory bugs to sneak malicious code into otherwise safe programs.
Java bytecode is a first line of defense against this sort of
attack, as the following example demonstrates:
public float add(float f, int n) {
    return f + n;
}
If you add this method to the ByteCodeDemo class, recompile it, and
run javap -c, you should see bytecode similar to this:
Method float add(float, int)
   0 fload_1
   1 iload_2
   2 i2f
   3 fadd
   4 freturn
At the beginning of a Java method, the virtual machine places
method parameters in a data structure called the local variable
table. As its name suggests, the local variable table also
contains any local variables that you declare. In this example,
the method begins with three entries in the local variable table,
one for each of the add method's three arguments, counting the
implicit this reference. Slot 0 holds the this reference, while
slots 1 and 2 hold the float and int arguments, respectively.
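The following minimal sketch walks the five instructions of add by hand, using an Object array for the local variable table and an ArrayDeque for the operand stack. The class name AddModel is hypothetical, and passing null for slot 0 is an assumption made only because the model never uses the this reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical hand-written model of the bytecode for add(float, int).
class AddModel {

    static float add(Object self, float f, int n) {
        // Local variable table: slot 0 = this, slot 1 = f, slot 2 = n
        Object[] locals = { self, f, n };
        Deque<Object> stack = new ArrayDeque<>();

        // 0 fload_1: push the float in slot 1
        stack.push((Float) locals[1]);
        // 1 iload_2: push the int in slot 2
        stack.push((Integer) locals[2]);
        // 2 i2f: pop the int, push it converted to a float
        int i = (Integer) stack.pop();
        stack.push((float) i);
        // 3 fadd: pop two floats, push their sum
        float right = (Float) stack.pop();
        float left = (Float) stack.pop();
        stack.push(left + right);
        // 4 freturn: pop the float result and return it
        return (Float) stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(add(null, 1.5f, 2)); // prints 3.5
    }
}
```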
In order to actually manipulate the variables, they must be loaded
(pushed) onto the operand stack. The first instruction, fload_1,
pushes the float at slot 1 onto the operand stack. The second
instruction, iload_2, pushes the int at slot 2 onto the operand
stack. Next, i2f pops the int and pushes it back converted to a
float, fadd pops the two floats and pushes their sum, and freturn
returns that sum to the caller. The interesting thing about these
instructions is their 'i' and 'f' prefixes, which illustrate that
Java bytecode instructions are strongly typed. If the type of a
value does not match the type that an instruction expects, the VM
rejects the bytecode as unsafe. Better still, the bytecodes are
designed so that these type-safety checks need only be performed
once, at class load time, rather than every time an instruction
executes.
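The idea of checking types once, before any instruction runs, can be sketched with a toy checker that tracks only the type of each stack entry, never its value. The ToyVerifier class below is hypothetical and vastly simpler than the JVM's real verifier: it understands only the opcodes used by add, and it encodes types as the characters 'A' (reference), 'F' (float), and 'I' (int).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical toy type checker; the real JVM verifier is far more thorough.
// Types: 'A' = reference, 'F' = float, 'I' = int.
class ToyVerifier {

    // Returns true if 'code' type-checks against the local slot types.
    static boolean verify(char[] locals, List<String> code, char returnType) {
        Deque<Character> stack = new ArrayDeque<>();
        for (String insn : code) {
            if (insn.startsWith("fload_") || insn.startsWith("iload_")) {
                char wanted = insn.charAt(0) == 'f' ? 'F' : 'I';
                int slot = insn.charAt(insn.length() - 1) - '0';
                if (slot < 0 || slot >= locals.length || locals[slot] != wanted) {
                    return false; // slot does not hold the type the prefix demands
                }
                stack.push(wanted);
            } else if (insn.equals("i2f")) {
                if (!popExpecting(stack, 'I')) return false;
                stack.push('F');
            } else if (insn.equals("fadd")) {
                if (!popExpecting(stack, 'F') || !popExpecting(stack, 'F')) return false;
                stack.push('F');
            } else if (insn.equals("freturn")) {
                return returnType == 'F' && popExpecting(stack, 'F');
            } else {
                return false; // opcode outside this toy subset
            }
        }
        return false; // code ran off the end without returning
    }

    // Pops the top entry and reports whether it had the expected type.
    private static boolean popExpecting(Deque<Character> stack, char expected) {
        return !stack.isEmpty() && stack.pop() == expected;
    }

    public static void main(String[] args) {
        char[] locals = { 'A', 'F', 'I' }; // this, float f, int n
        List<String> good = Arrays.asList("fload_1", "iload_2", "i2f", "fadd", "freturn");
        List<String> bad = Arrays.asList("fload_2", "iload_2", "i2f", "fadd", "freturn");
        System.out.println(verify(locals, good, 'F')); // true
        System.out.println(verify(locals, bad, 'F'));  // false: slot 2 holds an int
    }
}
```

Because the checker never looks at actual values, one pass over the code is enough; after that, the instructions can run without rechecking types.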
How does this type-safety enhance security? If an attacker could
trick the virtual machine into treating an int as a float, or vice
versa, it would be easy to corrupt calculations in a predictable
way. If these calculations involved bank balances, the security
implications would be obvious. More dangerous still would be
tricking the VM into treating an int as an Object reference. In
most scenarios, this would crash the VM, but an attacker needs to
find only one loophole. And don't forget that the attacker doesn't
have to search by hand: it would be easy to write a program that
generated billions of permutations of bad bytecodes, trying to
find the lucky one that compromised the VM.
Another case where bytecode safeguards memory is array
manipulation. Array instructions such as 'aaload' and 'aastore',
which load and store object references in arrays, always check
array bounds. They throw an ArrayIndexOutOfBoundsException if the
index is negative or not less than the array's length. Perhaps the
most important checks of all apply to the branching instructions,
such as the bytecodes whose names begin with 'if'. In bytecode,
branching instructions can only branch to another instruction
within the same method. The only way to transfer control outside a
method is to return, throw an exception, or execute one of the
'invoke' instructions. Not only does this close the door on many
attacks, it also prevents nasty bugs caused by dangling references
or stack corruption. If you have ever had a system debugger open
your program to a random location in code, you're familiar with
these bugs.
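You can watch the bounds check fire from ordinary source code. In the sketch below (the class and method names are hypothetical), javac compiles the read of a String[] element to aaload and the write to aastore, and the VM throws ArrayIndexOutOfBoundsException instead of touching memory outside the array.

```java
// Hypothetical demo of the runtime array bounds check.
class BoundsCheckDemo {

    // Reads arr[i] (an aaload); returns null instead if the bounds check fails.
    static String readOrNull(String[] arr, int i) {
        try {
            return arr[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            return null;
        }
    }

    // Writes arr[i] (an aastore); returns false if the bounds check fails.
    static boolean tryWrite(String[] arr, int i, String value) {
        try {
            arr[i] = value;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] words = { "alpha", "beta" };
        System.out.println(readOrNull(words, 1));     // beta
        System.out.println(readOrNull(words, 2));     // null: index 2 is past the end
        System.out.println(tryWrite(words, -1, "x")); // false: negative index
    }
}
```

The exception is caught here only to make the check visible; in real code you would normally test the index before using it.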
The critical point to remember about all of these checks is that
they are made by the virtual machine at the bytecode level, not
just by the compiler. A compiler for a language such as C++ might
prevent some of the memory errors discussed above, but its
protection applies only at the source code level. Operating
systems will happily load and execute any machine code, whether
the code was generated by a careful C++ compiler or a malicious
attacker. In short, C++ is object-oriented only at the source code
level, while Java's object-oriented features extend down to the
compiled code.
ANALYZING BYTECODE TO IMPROVE YOUR CODE
The memory and security protections of Java bytecode are there for
you whether you notice them or not, so why bother looking at the
bytecode? In many cases, knowing how the compiler translates your
code into bytecode can help you write more efficient code, and can
sometimes even prevent insidious bugs. Consider the
following example:
// return the concatenation str1 + str2
String concat1(String str1, String str2) {
    return str1 + str2;
}

// append str2 to str1
void concat2(StringBuffer str1, String str2) {
    str1.append(str2);
}
Try to guess how many method calls each method requires to
execute. Now compile the methods and run javap -c. You should see
output like this:
Method java.lang.String concat1(java.lang.String, java.lang.String)
   0 new #5
   3 dup
   4 invokespecial #6
   7 aload_1
   8 invokevirtual #7
  11 aload_2
  12 invokevirtual #7
  15 invokevirtual #8
  18 areturn

Method void concat2(java.lang.StringBuffer, java.lang.String)
   0 aload_1
   1 aload_2
   2 invokevirtual #7
   5 pop
   6 return
The concat1 method allocates a new StringBuffer (new) and then
makes four method calls: an invokespecial for the StringBuffer
constructor and three invokevirtuals (two calls to append and one
to toString). That is quite a bit more work than the concat2
method, which makes only a single invokevirtual call to append.
(The pop instruction discards the StringBuffer reference that
append returns, because concat2 ignores it.) Most Java programmers
have been warned that because Strings are immutable, it is more
efficient to use StringBuffers for concatenation. Using javap to
analyze this makes the point in dramatic fashion. If you are
unsure whether two language constructs are equivalent in
performance, you should use javap to analyze the bytecode. Beware
of the just-in-time (JIT) compiler, though. Because the JIT
compiler recompiles the bytecodes into native machine code, it can
apply additional optimizations that your javap analysis does not
reveal. Unless you have the source code for your virtual machine,
you need to supplement your bytecode analysis with performance
benchmarks.
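Read back into source form, the concat1 listing corresponds roughly to the chained calls below. The names ConcatExpanded and concat1Expanded are hypothetical, the method is static only for convenience, and this is a hand translation of the listing rather than actual compiler output. Newer compilers translate + differently (later javac versions use StringBuilder, and JDK 9 and later use an invokedynamic-based strategy), so run javap on your own compiler's output rather than relying on this translation.

```java
// Hypothetical hand translation of the concat1 bytecode listing.
class ConcatExpanded {

    static String concat1Expanded(String str1, String str2) {
        return new StringBuffer() // new #5, dup, invokespecial #6 (constructor)
                .append(str1)     // aload_1, invokevirtual #7 (append)
                .append(str2)     // aload_2, invokevirtual #7 (append)
                .toString();      // invokevirtual #8 (toString), then areturn
    }

    public static void main(String[] args) {
        System.out.println(concat1Expanded("Hello, ", "world")); // Hello, world
    }
}
```

Chaining the calls keeps each intermediate StringBuffer reference on the operand stack, which is why the listing contains no local-variable stores between the append calls.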
A final example illustrates how examining bytecode can help
prevent bugs in your application. Create two classes as
follows. Make sure they are in separate files.
public class ChangeALot {
    public static final boolean debug = false;
    public static boolean log = false;
}

public class EternallyConstant {
    public static void main(String[] args) {
        System.out.println("EternallyConstant beginning execution");
        if (ChangeALot.debug)
            System.out.println("Debug mode is on");
        if (ChangeALot.log)
            System.out.println("Logging mode is on");
    }
}
If you run the class EternallyConstant, you should get the message:

   EternallyConstant beginning execution

Now try editing the ChangeALot file, changing both the debug and
log fields to true. Recompile only the ChangeALot file. Run
EternallyConstant again, and you will see the following output:

   EternallyConstant beginning execution
   Logging mode is on
What happened to the debugging mode? Even though you set debug to
true, the message "Debug mode is on" didn't appear. The answer is
in the bytecode. Run javap -c on the EternallyConstant class, and
you will see this:
Method void main(java.lang.String[])
   0 getstatic #2
   3 ldc #3
   5 invokevirtual #4
   8 getstatic #5
  11 ifeq 22
  14 getstatic #2
  17 ldc #6
  19 invokevirtual #4
  22 return
Surprise! The 'ifeq 22' instruction checks the log field: it jumps
to the return at offset 22 when log is false (zero), skipping the
println. But the code does not check the debug field at all.
Because the debug field is static, final, and initialized with a
constant expression, the compiler knew that it could never change
at runtime. Therefore, it substituted the value false and removed
the 'if' statement entirely. This is a very useful optimization
indeed, because it allows you to embed debugging code in your
application and pay no runtime penalty when the switch is set to
false. Unfortunately, this optimization can lead to major
confusion when classes are compiled separately. If you change a
final field, you have to remember to recompile any other class
that might reference the field, because the reference might have
been optimized away. Java development environments do not always
detect this subtle dependency, something that can lead to very
odd bugs. So the old C++ adage still holds in the Java
environment: "When in doubt, rebuild all."
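If you want a final switch that is not inlined, one option is to initialize it with an expression that is not a compile-time constant, such as a method call. The class below is a hypothetical variant of ChangeALot, and the system property name "changealot.debug" is an assumption for illustration. Because Boolean.getBoolean(...) is a method call, debug is no longer a constant, so javac compiles each use into a getstatic and a changed value takes effect without recompiling the classes that read it. The trade-off is that javac can no longer remove the debug code, so you give up the free dead-code elimination described above (a JIT compiler may still optimize the check).

```java
// Hypothetical variant of ChangeALot whose debug flag is NOT a compile-time constant.
class ChangeALotSafe {
    // A method call is not a constant expression, so javac does not inline
    // this value into other classes; each use compiles to a getstatic.
    // The property name "changealot.debug" is an assumption for illustration.
    public static final boolean debug = Boolean.getBoolean("changealot.debug");
    public static boolean log = false;

    public static void main(String[] args) {
        System.out.println("debug = " + debug);
    }
}
```

With this version, running java -Dchangealot.debug=true ChangeALotSafe prints debug = true, and no class has to be recompiled to flip the switch.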
Knowing a little bytecode is valuable to any programmer coding in
the Java programming language. The javap tool makes it easy to view
bytecodes, and occasionally checking your code with javap can be
invaluable in improving performance and catching particularly
elusive bugs. There is substantially more complexity to bytecode and
the VM than this tip can cover. To learn more, read Inside the Java
Virtual Machine by Bill Venners.
* As used in this document, the terms "Java virtual machine"
or "JVM" mean a virtual machine for the Java platform.
Archives
You'll find the JDC Tech Tips archives at:
http://developer.java.sun.com/developer/TechTips/index.html
Copyright
Copyright 2000 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.
This Document is protected by copyright. For more information, see:
http://developer.java.sun.com/developer/copyright.html