This issue presents tips, techniques, and sample code for the
following topics:

* Converting Pathnames to URLs
* Using Vector in the Collection Framework
* Reading/Writing Unicode Using I/O Stream Encodings
Converting Pathnames to URLs
A new feature of the Java(TM) 2 Platform
is the File.toURL method, which is used to convert a pathname
specification to a URL
(Uniform Resource Locator, used on the Web).
A simple example that illustrates this method is:
import java.io.*;
import java.net.*;

public class url {
    public static void main(String args[])
    {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }

        File f = new File(args[0]);

        try {
            URL u = f.toURL();
            System.out.println(u);
        }
        catch (MalformedURLException e) {
            System.err.println(e);
        }
    }
}
For input of:

    $ java url paper.txt

with the current directory t:\tmp, the output is:

    file:/T:/tmp/paper.txt

and this URL can be given to Netscape or Microsoft web browsers to
view the local file.
Such a method is useful in applications that have to treat local
pathnames and web-based resources in a uniform way.
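For example, once a local pathname has been converted to a URL, the
file can be read through the same URL.openStream interface that is
used for web-based resources. The following sketch illustrates the
idea; the file name "paper.txt" and the web address are just
placeholders:

import java.io.*;
import java.net.*;

public class readurl {
    // copy the contents of the resource named by a URL to standard
    // output; the same code handles local files and web pages
    public static void dump(URL u) throws IOException
    {
        InputStream in = u.openStream();
        int c;
        while ((c = in.read()) != -1)
            System.out.write(c);
        System.out.flush();
        in.close();
    }

    public static void main(String args[])
    {
        try {
            // a local file, converted to a URL via File.toURL
            dump(new File("paper.txt").toURL());

            // a web resource, read through exactly the same method
            dump(new URL("http://java.sun.com/"));
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}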
Using Vector in the Collection Framework
Collections are a new feature of the Java 2 Platform, and are described in
detail in various articles available on the Java Developer Connection.
Collections are used to organize and operate on groups of data elements.
For example, ArrayList is a replacement for Vector,
and HashMap is similar to Hashtable.
The old classes such as Vector are still available, but the new
ones are preferred. So an obvious question is how to convert between old and
new. You might, say, have a Vector object in an application, and
you want to call a method that takes an ArrayList argument. One
way of doing such a conversion is illustrated by the following example:
import java.util.*;

public class convert {
    public static void process(ArrayList al)
    {
        for (int i = 0; i < al.size(); i++)
            System.out.println(al.get(i));
    }

    public static void main(String args[])
    {
        Vector vec = new Vector();

        vec.addElement("123");
        vec.addElement(new Integer(456));
        vec.addElement(new Double(789));

        process(new ArrayList(vec));
    }
}
A Vector is created, and several elements are added to it. Then the
process method is called, and it is passed an ArrayList object
created via a constructor that takes a Vector argument. More
precisely, the ArrayList constructor used here takes a Collection
interface argument; Vector has been retrofitted to implement the
Collection interface, and so an ArrayList can be created from a
Vector via this constructor.
There are a number of other conversion mechanisms available in the
collection framework, for hooking together old and new code.
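Conversion also works in the other direction. The following sketch
(assuming the Java 2 versions of these classes) turns an ArrayList
back into a Vector, and uses Collections.enumeration to obtain an
old-style Enumeration view of a new collection for code that still
expects one:

import java.util.*;

public class convert2 {
    public static void main(String args[])
    {
        ArrayList al = new ArrayList();
        al.add("123");
        al.add(new Integer(456));

        // Vector also has a constructor that accepts a Collection,
        // so an ArrayList can be turned back into a Vector
        Vector vec = new Vector(al);
        System.out.println(vec);

        // Collections.enumeration provides an Enumeration over a
        // new collection, for use with legacy code
        Enumeration e = Collections.enumeration(al);
        while (e.hasMoreElements())
            System.out.println(e.nextElement());
    }
}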
Reading/Writing Unicode Using I/O Stream Encodings
The Java programming language uses two-byte Unicode characters, while
one-byte characters are common in other languages such as C (which uses
ASCII). An obvious question that comes up is therefore: how are Java
characters stored in disk files, and how can the Java language make use
of the huge quantity of data out there that is encoded in ASCII?
When the JDK(TM) software, such as version
1.0.2, first became available, this problem hadn't been solved. For example,
DataInputStream.readLine is a method for reading lines of input,
but it fails to properly convert bytes to characters, and is now deprecated.
You won't necessarily notice this failure until you start to use the
Unicode character set more fully.
This problem has been solved by means of the Reader and
Writer I/O classes. These sit on top of a byte stream (such as
FileInputStream) and apply an encoding to convert bytes to
characters or characters to bytes.
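For example, the deprecated DataInputStream.readLine mentioned above
can be replaced by wrapping a byte stream in an InputStreamReader and
a BufferedReader, so that the byte-to-character conversion is handled
properly. A minimal sketch (the file name is just a placeholder)
looks like this:

import java.io.*;

public class readlines {
    public static void main(String args[])
    {
        try {
            FileInputStream fis =
                new FileInputStream("paper.txt");

            // the InputStreamReader converts bytes to characters;
            // the BufferedReader supplies the readLine method
            BufferedReader br =
                new BufferedReader(new InputStreamReader(fis));

            String line;
            while ((line = br.readLine()) != null)
                System.out.println(line);

            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}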
There's an encoding that is applied by default, and you can determine
its name via a small program:
public class encode {
    public static void main(String args[])
    {
        String p = System.getProperty("file.encoding");
        System.out.println(p);
    }
}
On my machine, running Java 2 software, this prints out Cp1252,
which is a code for:
Windows Western Europe / Latin-1
A table of encodings can be found at:
http://java.sun.com/products/jdk/1.1/intl/html/intlspec.doc7.html
If you want to directly specify encodings, one way of doing so is
illustrated by the following program, which writes all the lower case
letters of the Unicode alphabet to a file. Some of these characters
have a non-zero high byte (that is, they are greater in value than
'\u00ff'), and preserving both bytes of the character is therefore
important. The encoding used is one called UTF-8, which has the property
of representing ASCII text as itself (one byte), and other characters as
two or three bytes.
import java.io.*;

public class enc1 {
    public static void main(String args[])
    {
        try {
            FileOutputStream fos =
                new FileOutputStream("out");
            OutputStreamWriter osw =
                new OutputStreamWriter(fos, "UTF8");

            for (int c = '\u0000'; c <= '\uffff'; c++) {
                if (!Character.isLowerCase((char)c))
                    continue;
                osw.write(c);
            }

            osw.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
The following program reverses the process:
import java.io.*;

public class enc2 {
    public static void main(String args[])
    {
        try {
            FileInputStream fis =
                new FileInputStream("out");
            InputStreamReader isr =
                new InputStreamReader(fis, "UTF8");

            for (int c = '\u0000'; c <= '\uffff'; c++) {
                if (!Character.isLowerCase((char)c))
                    continue;
                int ch = isr.read();
                if (c != ch)
                    System.err.println("error");
            }

            isr.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
InputStreamReader and OutputStreamWriter are the classes in which
byte streams are converted to character streams and vice versa.
Character encoding is an important issue if you are writing
applications that operate in an international context.
The JDC Tech Tips are written by Glen McCluskey.