Java Developer Connection(SM)
Technical Tips

Tech Tips
February 16, 1999

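This issue presents tips, techniques, and sample code for the following topics:

Converting Pathnames to URLs
Using Vector in the Collection Framework
Reading/Writing Unicode Using I/O Stream Encodings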

Converting Pathnames to URLs

A new feature of the Java™ 2 Platform is the File.toURL method, which converts a pathname to a URL (Uniform Resource Locator, the form of address used on the Web).

A simple example that illustrates this method is:

  
import java.io.*;
import java.net.*;
  
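public class url {
  public static void main(String args[])
  {
    // expect exactly one argument: the pathname to convert
    if (args.length != 1) {
      System.err.println("missing filename");
      System.exit(1);
    }
    File f = new File(args[0]);
    try {
      // build a "file:" URL from the absolute path;
      // a directory that exists gets a trailing slash
      URL u = f.toURL();
      System.out.println(u);
    }
    catch (MalformedURLException e) {
      System.err.println(e);
    }
  }
}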
For example, with t:\tmp as the current directory, the input:
  
$ java url paper.txt
produces the output:
  
file:/T:/tmp/paper.txt
This URL can be given to Netscape or Microsoft web browsers to view the local file.
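One caveat applies to later JDK releases: File.toURL copies characters such as spaces into the URL without escaping them, which can yield an invalid URL, and the method was deprecated in Java SE 6. The recommended replacement is File.toURI (added in version 1.4) followed by URI.toURL. The sketch below assumes a JDK of 1.4 or later and uses a hypothetical file name, my paper.txt, which does not need to exist; it prints both forms side by side.

```java
import java.io.File;
import java.net.MalformedURLException;

public class UrlEscape {
    // File.toURL is deprecated in current JDKs; silence the warning
    // for this side-by-side comparison.
    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical pathname containing a space; the file need not exist.
        File f = new File("my paper.txt");

        // toURL copies the space into the URL unescaped.
        System.out.println("toURL:         " + f.toURL());

        // toURI percent-encodes the space before converting to a URL.
        System.out.println("toURI().toURL: " + f.toURI().toURL());
    }
}
```

The first line keeps the raw space, while the second shows it encoded as %20.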

Such a method is useful in applications that have to treat local pathnames and web-based resources in a uniform way.
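As one illustration of that uniformity, once a pathname has been converted, the same URL.openStream call can read a local file or a remote resource. The sketch below is not part of the original tip and rests on several assumptions: it requires Java 7 or later for try-with-resources, writes a temporary file to stand in for real data, uses toURI().toURL() rather than toURL, and introduces a hypothetical helper named firstLine.

```java
import java.io.*;
import java.net.URL;

public class ReadViaUrl {
    // Hypothetical helper: reads the first line from any URL,
    // whether it names a local file or a web resource.
    static String firstLine(URL u) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(u.openStream(), "UTF-8"))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway local file to stand in for real data.
        File f = File.createTempFile("urldemo", ".txt");
        f.deleteOnExit();
        try (Writer w = new OutputStreamWriter(new FileOutputStream(f), "UTF-8")) {
            w.write("hello from a local file\n");
        }

        // Convert the pathname to a URL, then read it like any other URL.
        URL u = f.toURI().toURL();
        System.out.println(u.getProtocol() + ": " + firstLine(u));
    }
}
```

Passing an http: URL to firstLine would work the same way, which is the benefit of treating pathnames as URLs.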


Using Vector in the Collection Framework

Collections are a new feature of the Java 2 Platform, and are described in detail in various articles available on the Java Developer Connection. The collections framework provides classes and interfaces for organizing and operating on groups of data elements, including replacements for some older classes: for example, ArrayList is a replacement for Vector, and HashMap is similar to Hashtable.

The old classes such as Vector are still available, but the new ones are preferred. An obvious question, then, is how to convert between old and new. For example, you might have a Vector object in an application and want to call a method that takes an ArrayList argument. One way of doing such a conversion is illustrated by the following example:

  
import java.util.*;
  
public class convert {
  // takes the new-style ArrayList type
  public static void process(ArrayList al)
  {
    for (int i = 0; i < al.size(); i++)
      System.out.println(al.get(i));
  }
  
  public static void main(String args[])
  {
    // build an old-style Vector holding mixed element types
    Vector vec = new Vector();
  
    vec.addElement("123");
    vec.addElement(new Integer(456));
    vec.addElement(new Double(789));
  
    // copy the Vector into a new ArrayList and pass that
    process(new ArrayList(vec));
  }
}
A Vector is created and several elements are added to it. The process method is then called and passed an ArrayList, created with a constructor that takes the Vector as its argument. More precisely, ArrayList has a constructor that takes any object implementing the Collection interface, and Vector has been retrofitted to implement that interface (through List), so an ArrayList can be built from a Vector with this constructor. Running the program prints 123, 456, and 789.0, one per line.
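Because the constructor copies the elements, the new ArrayList is independent of the original Vector. The sketch below, which uses generics and the diamond operator (Java 7 or later) rather than the raw types of the original, shows that adding to the copy leaves the Vector unchanged. It also shows a design alternative that is not part of the original tip: a process method declared to take a List rather than an ArrayList accepts a Vector directly, with no copy at all.

```java
import java.util.*;

public class CopyDemo {
    // Declaring the parameter as List (not ArrayList) lets a Vector be passed directly.
    static void process(List<?> items) {
        for (Object o : items)
            System.out.println("  " + o);
    }

    public static void main(String[] args) {
        Vector<String> vec = new Vector<>();
        vec.addElement("a");
        vec.addElement("b");

        // ArrayList(Collection) copies the elements; the two lists are independent afterwards.
        ArrayList<String> copy = new ArrayList<>(vec);
        copy.add("c");
        System.out.println("vector: " + vec);   // vector: [a, b]
        System.out.println("copy:   " + copy);  // copy:   [a, b, c]

        // No copy is needed when the method accepts an interface type.
        process(vec);
    }
}
```

Accepting the interface type keeps callers free to pass either the old or the new class; copying is only needed when a method insists on a concrete class such as ArrayList.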

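The collection framework provides a number of other conversion mechanisms for hooking together old and new code. For example, Vector and Hashtable have constructors that take a Collection and a Map respectively, and Collections.enumeration returns an old-style Enumeration over any collection.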
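A sketch of a few of these bridges follows, using generics (Java 5) and the diamond operator (Java 7). Collections.list, which copies an Enumeration into an ArrayList, was added in version 1.4, so it was not available in the Java 2 Platform release this tip describes.

```java
import java.util.*;

public class OldNewBridges {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("x", "y", "z"));

        // New -> old: Vector has a constructor that takes any Collection.
        Vector<String> vec = new Vector<>(list);
        System.out.println("vector: " + vec);

        // New -> old: wrap any Collection in an old-style Enumeration.
        Enumeration<String> en = Collections.enumeration(list);
        StringBuilder sb = new StringBuilder();
        while (en.hasMoreElements())
            sb.append(en.nextElement());
        System.out.println("enumeration: " + sb);

        // Old -> new: Collections.list (Java 1.4+) copies an Enumeration into an ArrayList.
        ArrayList<String> back = Collections.list(vec.elements());
        System.out.println("back to list: " + back);

        // Maps: HashMap (like Hashtable) has a constructor that takes any Map.
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("one", 1);
        Map<String, Integer> map = new HashMap<>(table);
        System.out.println("map one: " + map.get("one"));
    }
}
```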


Reading/Writing Unicode Using I/O Stream Encodings

The Java programming language uses two-byte Unicode characters, while one-byte characters are common in other languages such as C, where text is typically ASCII. This raises two obvious questions: how are Java characters stored in disk files, and how can Java programs make use of the huge quantity of existing data encoded in ASCII?

In the first releases of the JDK™ software, such as version 1.0.2, this problem hadn't been solved. DataInputStream.readLine, for instance, reads lines of input but does not properly convert bytes to characters, and is now deprecated. You won't necessarily notice this failure until you start to use more of the Unicode character set.
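The failure is easy to reproduce. The sketch below assumes UTF-8 input: it reads the bytes of the four-character string "café" once with the deprecated DataInputStream.readLine, which turns each byte into one char, and once through an InputStreamReader that decodes UTF-8. The class name ReadLineDemo is illustrative.

```java
import java.io.*;

public class ReadLineDemo {
    @SuppressWarnings("deprecation")  // DataInputStream.readLine is deprecated
    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9";              // 4 chars; the last is e-acute
        byte[] utf8 = original.getBytes("UTF8");      // 5 bytes: e-acute needs two

        // Old way: readLine turns each byte into one char, ignoring the encoding.
        DataInputStream dis =
            new DataInputStream(new ByteArrayInputStream(utf8));
        String naive = dis.readLine();

        // Reader way: InputStreamReader decodes the bytes as UTF-8.
        BufferedReader br = new BufferedReader(
            new InputStreamReader(new ByteArrayInputStream(utf8), "UTF8"));
        String decoded = br.readLine();

        System.out.println("bytes:          " + utf8.length);
        System.out.println("naive length:   " + naive.length());
        System.out.println("decoded length: " + decoded.length());
        System.out.println("decoded ok:     " + decoded.equals(original));
    }
}
```

The naive result has one character per byte, so the two bytes that encode é come back as two unrelated characters.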

JDK 1.1 solved this problem with the Reader and Writer I/O classes. These sit on top of a byte stream (such as FileInputStream) and apply an encoding, converting bytes to characters on input or characters to bytes on output.
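To see the encoding step in isolation, the sketch below layers an OutputStreamWriter over an in-memory ByteArrayOutputStream and counts the bytes produced for three sample characters under two encodings. It assumes Java 7 or later (try-with-resources) and uses the standard encoding names "UTF-8" and "UTF-16BE"; the helper encodedLength is hypothetical.

```java
import java.io.*;

public class EncodingSizes {
    // Hypothetical helper: encode a string through an OutputStreamWriter
    // layered on a byte stream, and report how many bytes came out.
    static int encodedLength(String s, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, encoding)) {
            w.write(s);
        } // close() flushes any characters still buffered in the encoder
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // ASCII 'A', e-acute (U+00E9), and a CJK ideograph (U+4E2D)
        char[] samples = { 'A', '\u00e9', '\u4e2d' };
        for (char c : samples) {
            String s = String.valueOf(c);
            System.out.printf("U+%04X: UTF-8 %d, UTF-16BE %d%n",
                (int) c, encodedLength(s, "UTF-8"), encodedLength(s, "UTF-16BE"));
        }
    }
}
```

UTF-8 grows from one to three bytes as the character value rises, while UTF-16BE stores every character in this range in exactly two bytes.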

There's an encoding that is applied by default, and you can determine its name via a small program:

  
public class encode {
  public static void main(String args[])
  {
    String p = System.getProperty("file.encoding");
    System.out.println(p);
  }
}
On my machine, running Java 2 software, this prints out Cp1252, which is a code for:
  
Windows Western Europe / Latin-1
A table of encodings can be found at:
http://java.sun.com/products/jdk/1.1/intl/html/intlspec.doc7.html
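The file.encoding property is only part of the picture. The sketch below, assuming Java 5 or later for Charset.defaultCharset, also prints the name that an InputStreamReader reports through getEncoding when no encoding is specified; that method returns the JDK's historical name, for example UTF8 rather than UTF-8. The output is platform-dependent, and on JDK 18 and later the default charset is UTF-8 on all platforms unless overridden, so expect something other than Cp1252 there.

```java
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class DefaultEncoding {
    public static void main(String[] args) {
        // The system property consulted by the program above.
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));

        // The charset the JDK uses when none is specified (Java 5+).
        System.out.println("defaultCharset(): " + Charset.defaultCharset());

        // The (historical) name a reader reports when built without an encoding.
        // The reader is not closed, because that would close System.in.
        InputStreamReader r = new InputStreamReader(System.in);
        System.out.println("reader encoding:  " + r.getEncoding());
    }
}
```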

If you want to specify an encoding directly, one way of doing so is illustrated by the following program, which writes to a file every lowercase character in the range \u0000 to \uffff (that is, every char value for which Character.isLowerCase returns true). Some of these characters have a non-zero high byte (that is, they are greater in value than '\u00ff'), so preserving both bytes of each character is important. The encoding used is UTF-8 (named "UTF8" in the JDK), which represents ASCII text as itself (one byte per character) and other characters as two or three bytes.

  
import java.io.*;
  
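public class enc1 {
  public static void main(String args[])
  {
    try {
      // layer a UTF-8 encoder over the raw byte stream
      FileOutputStream fos =
          new FileOutputStream("out");
      OutputStreamWriter osw =
          new OutputStreamWriter(fos, "UTF8");
      // write every char value that Character.isLowerCase accepts
      for (int c = '\u0000'; c <= '\uffff'; c++) {
        if (!Character.isLowerCase((char)c))
          continue;
        osw.write(c);   // write(int) writes a single character
      }
      osw.close();      // flushes the encoder and closes the file
    }
    catch (IOException e) {
      System.err.println(e);
    }
  }
}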
The following program reverses the process, reading the file back and checking each character:
  
import java.io.*;
  
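public class enc2 {
  public static void main(String args[])
  {
    try {
      // layer a UTF-8 decoder over the raw byte stream
      FileInputStream fis =
          new FileInputStream("out");
      InputStreamReader isr =
          new InputStreamReader(fis, "UTF8");
      // expect the same sequence of chars that enc1 wrote
      for (int c = '\u0000'; c <= '\uffff'; c++) {
        if (!Character.isLowerCase((char)c))
          continue;
        int ch = isr.read();
        if (c != ch)
          System.err.println("error: expected " + c + ", got " + ch);
      }
      // anything left over means the file held extra data
      if (isr.read() != -1)
        System.err.println("error: extra data at end of file");
      isr.close();
    }
    catch (IOException e) {
      System.err.println(e);
    }
  }
}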
InputStreamReader and OutputStreamWriter are the classes where byte streams are converted to character streams and vice versa.
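The two programs above write to and read from a file. The sketch below combines both steps in memory instead of using the file out, closing the streams with try-with-resources and naming the encoding with StandardCharsets.UTF_8 rather than the string "UTF8"; both features assume Java 7 or later, and the class name RoundTrip is illustrative. It also checks that no characters remain after the expected ones. The number of lowercase characters depends on the Unicode version the JDK implements, so only the mismatch count is fixed.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    public static void main(String[] args) throws IOException {
        // Encode every lowercase char in U+0000..U+FFFF into memory.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int written = 0, wide = 0;
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            for (int c = 0; c <= 0xFFFF; c++) {
                if (!Character.isLowerCase((char) c))
                    continue;
                w.write(c);
                written++;
                if (c > 0xFF)
                    wide++;   // the high byte must survive the round trip
            }
        }

        // Decode and compare, then check that nothing is left over.
        int mismatches = 0;
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.UTF_8)) {
            for (int c = 0; c <= 0xFFFF; c++) {
                if (!Character.isLowerCase((char) c))
                    continue;
                if (r.read() != c)
                    mismatches++;
            }
            if (r.read() != -1)
                mismatches++;   // unexpected trailing data
        }
        System.out.println("chars written: " + written + " (above U+00FF: " + wide + ")");
        System.out.println("mismatches: " + mismatches);
    }
}
```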

This issue matters whenever you write applications that must work in an international context.


The JDC Tech Tips are written by Glen McCluskey.


[ This page was updated: 21-Sep-2000 ]
Sun Microsystems, Inc.
Copyright © 1995-2000 Sun Microsystems, Inc.
All Rights Reserved.