This issue presents tips, techniques, and sample code for the following topics:
Performance -- using Object to represent disparate types.
This tip is a little tricky, but it recently came up in an actual application,
and illustrates how Java language features are used to efficiently represent a
large data structure.
The application is one where a very large tree structure, consuming
millions of bytes, is built up. Some of the nodes in the tree reference
child nodes (non-terminals), while others are leaf nodes (terminals) and
have no children, but contain String information. The application involves
parsing a large Java program and representing it internally via a tree.
One simple approach to this problem is to define a Node class such as the
following:
public class Node {
private int type;
private Node child[];
private String info;
}
If the node is a leaf node, then info is used. Otherwise, child refers to
the children of the node, and child.length to the number of children.
This approach works pretty well, but uses a lot of memory. Only one of
child and info are used at any one time, meaning that the other field is
wasted. child is an array, with attendant overhead, for example, in
storing the dimensions of the array for subscript checking. For certain
large inputs, the parser program runs out of memory.
The first refinement of this approach is to collapse child and info:
public class Node {
private int type;
private Object info;
}
In this scheme, info can refer to either a String, for a leaf node, or to a
child node array. Object is the root of the Java class hierarchy, so that
for example, the following:
class A {}
implicitly means:
class A extends Object {}
An instance of a subclass of Object, such as String, can be assigned to an
Object reference. An array of Nodes can likewise be assigned to an Object.
The instanceof operator can be used to determine the actual type of an
Object reference.
In the parser application, using Object to represent both data types is not
good enough because it still takes up too much memory. So a further change
has been implemented. After doing some research, it was found that the
child array consisted of a single Node element about 95 percent of the
time. So it's possible to represent one-child cases directly using an
Object reference to the child node, rather than a reference to a one-long
array of child nodes.
This representation is complicated, and it's useful to define a method for
encapsulating the abstraction as in the following example:
public class Node {
private int type;
private Object info;
// constructors, other methods here ...
// gets the i-th child reference
public Node getChild(int i)
{
if (info instanceof String)
return null;
else if (info instanceof Node && i == 0)
return (Node)info;
else
return ((Node[])info)[i];
}
}
getChild returns the i-th child, or null for leaf nodes. If there is
exactly one child, then info is of type Node, referencing that child. If
there is more than one child, info is of type Node[], and a cast to Node[]
is done, followed by a retrieval and return of the child reference.
In the parser application, this change is enough to tip the scales, so that
the application would not run out of memory. The internal representation
in this example is tricky, but it can be hidden via methods such as
getChild. In general, it's wise to avoid tricky coding, but useful to know
how to do it when the need arises.
The example also illustrates the utility of using one Object reference to
represent several different data types. In C/C++ similar techniques would
use void* pointers or unions.
Please see
Tech Tips: January 20, 1998
for a followup on this topic.
Reflection.
An important Java language feature is reflection, a feature also known as
"introspection." Reflection is the ability to query a Java
class about its properties, and to operate on methods and fields by name
for a given object instance.
You can use reflection to set object fields or invoke particular object
methods by name. For example, given an object instance "obj,"
and a method name "f5" specified at program execution time, the
method can be invoked on the instance.
To see how this works, look at this simple example:
import java.lang.reflect.*;
public class DumpMethods {
public static void main(String args[])
{
try {
Class c = Class.forName(args[0]);
Method m[] = c.getDeclaredMethods();
for (int i = 0; i < m.length; i++)
System.out.println(m[i].toString());
}
catch (Throwable e) {
System.err.println(e);
}
}
}
For an invocation of:
java DumpMethods java.util.Stack
the output is:
public java.lang.Object
java.util.Stack.push(java.lang.Object)
public synchronized
java.lang.Object java.util.Stack.pop()
public synchronized
java.lang.Object java.util.Stack.peek()
public boolean java.util.Stack.empty()
public synchronized int
java.util.Stack.search(java.lang.Object)
That is, the method names of class java.util.Stack are listed, along with
their fully qualified return types and parameter types.
This program loads the specified class using class.forName, and then calls
getDeclaredMethods to retrieve the list of methods defined in the class.
The class java.lang.reflect.Method represents a single class method.
The reflection feature may not seem like much at first, but it's impossible
to do in other languages such as C, C++, or Fortran. The names and
properties of functions in these other languages are gone by the time the
program is executed. In technical terms, these other languages have
"earlybinding," whereas the Java language has "late
binding."
|