February 16, 1999
This issue presents tips, techniques, and sample code for the following topics:
Converting Pathnames to URLs
A new feature of the Java 2 Platform is a method that converts a local pathname to a URL. A simple example, a program named url that takes a pathname argument, illustrates this method.
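A minimal sketch of such a program, assuming the method being described is java.io.File.toURL. The sketch goes through File.toURI().toURL(), the route recommended by later releases of the platform, which gives the same file: form for ordinary pathnames. The class name url is chosen only to match the command line used in this tip, and the convert helper is illustrative.

```java
import java.io.File;
import java.net.MalformedURLException;

// Illustrative sketch; the class and method names are assumptions.
class url {
    // Convert a relative or absolute pathname to a file: URL string.
    static String convert(String pathname) {
        File f = new File(pathname);
        try {
            // toURI() resolves the name against the current directory;
            // toURL() then produces the java.net.URL form.
            return f.toURI().toURL().toString();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("usage: java url pathname");
            return;
        }
        System.out.println(convert(args[0]));
    }
}
```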
For input of:
    $ java url paper.txt
(current directory is t:\tmp) output is:
    file:/T:/tmp/paper.txt
and this URL can be specified to view the local file in Netscape or Microsoft web browsers. Such a method is useful in applications that have to treat local pathnames and web-based resources in a uniform way.
Using
import java.util.*;
public class convert {
    public static void process(ArrayList al)
    {
        for (int i = 0; i < al.size(); i++)
            System.out.println(al.get(i));
    }
    public static void main(String args[])
    {
        Vector vec = new Vector();
        vec.addElement("123");
        vec.addElement(new Integer(456));
        vec.addElement(new Double(789));
        process(new ArrayList(vec));
    }
}
A Vector is created, and several elements are added to it. Then the process method is called with an ArrayList object, one created via a constructor that takes a Vector argument. More precisely, ArrayList has a constructor that takes a Collection argument, and Vector has been retrofitted to implement the Collection interface (through List), so an ArrayList can be created from a Vector with this constructor.
There are a number of other conversion mechanisms available in the collection framework, for hooking together old and new code.
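A few of these mechanisms can be sketched as follows. This is an illustration under assumptions rather than a complete survey: the class Bridges and its methods joinOld and fromArray are hypothetical names, with joinOld standing in for old code that accepts only an Enumeration. The library calls used are Arrays.asList, Collections.enumeration, the Vector constructor that takes a Collection, and Collection.toArray.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

class Bridges {
    // Stand-in for old code that understands only Enumeration.
    static String joinOld(Enumeration<String> e) {
        StringBuffer sb = new StringBuffer();
        while (e.hasMoreElements()) {
            sb.append(e.nextElement());
            if (e.hasMoreElements())
                sb.append(',');
        }
        return sb.toString();
    }

    // Array -> List view -> Enumeration for the old code.
    static String fromArray(String[] items) {
        List<String> list = Arrays.asList(items);
        return joinOld(Collections.enumeration(list));
    }

    public static void main(String args[]) {
        String[] items = { "a", "b", "c" };
        System.out.println(fromArray(items));

        // New collection -> old Vector, via the Collection constructor.
        Vector<String> vec = new Vector<String>(Arrays.asList(items));
        System.out.println(vec.size());

        // Collection -> array, for code that expects arrays.
        String[] copy = vec.toArray(new String[0]);
        System.out.println(copy.length);
    }
}
```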
The Java programming language uses two-byte Unicode characters, while one-byte characters are common in other languages such as C (which uses ASCII). An obvious question that comes up is therefore: how are Java characters stored in disk files, and how can the Java language make use of the huge quantity of data out there that is encoded in ASCII?
When early versions of the JDK software, such as 1.0.2, first became available, this problem hadn't been solved. For example, DataInputStream.readLine is a method for reading lines of input, but it fails to properly convert bytes to characters, and is now deprecated. You won't necessarily notice this failure until you start to use the Unicode character set more fully.
This problem has been solved by means of the Reader and Writer I/O classes. These sit on top of a byte stream (such as FileInputStream) and apply an encoding to convert bytes to characters, or characters to bytes.
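As a rough illustration of this layering, the sketch below reads a line of text through a BufferedReader placed over an InputStreamReader, which in turn sits on a byte stream. The class name LineDecode and the method firstLine are hypothetical, and a ByteArrayInputStream stands in for a file so that the example is self-contained. The same six bytes are decoded under two encodings to show that the choice of encoding determines which characters come out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

class LineDecode {
    // Read the first line of a byte sequence, decoding with the named encoding.
    static String firstLine(byte[] bytes, String encoding) {
        try {
            BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), encoding));
            String line = br.readLine();
            br.close();
            return line;
        } catch (IOException e) {
            throw new RuntimeException(e.toString());
        }
    }

    public static void main(String args[]) {
        // "caf" followed by e-acute encoded in UTF-8 (0xC3 0xA9) and a newline.
        byte[] data = { 0x63, 0x61, 0x66, (byte)0xC3, (byte)0xA9, 0x0A };

        // Decoded as UTF-8, the two high bytes form one character.
        System.out.println(firstLine(data, "UTF8").length());

        // Decoded as Latin-1, each byte becomes its own character.
        System.out.println(firstLine(data, "ISO8859_1").length());
    }
}
```

The first call yields a four-character line and the second a five-character line, since Latin-1 maps every byte to exactly one character.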
There's an encoding that is applied by default, and you can determine its name via a small program:
public class encode {
    public static void main(String args[])
    {
        String p = System.getProperty("file.encoding");
        System.out.println(p);
    }
}
On my machine, running Java 2 software, this prints out Cp1252, which is a code for:
Windows Western Europe / Latin-1
A table of encodings can be found at:
http://java.sun.com/products/jdk/1.1/intl/html/intlspec.doc7.html
If you want to specify encodings directly, one way of doing so is illustrated by the following program, which writes all the lowercase letters of the Unicode character set to a file. Some of these characters have a non-zero high byte (that is, they are greater in value than '\u00ff'), and preserving both bytes of the character is therefore important. The encoding used is UTF-8 (named "UTF8" in the program), which has the property of representing ASCII text as itself (one byte per character), and other characters as two or three bytes.
import java.io.*;
public class enc1 {
    public static void main(String args[])
    {
        try {
            FileOutputStream fos =
                new FileOutputStream("out");
            OutputStreamWriter osw =
                new OutputStreamWriter(fos, "UTF8");
            for (int c = '\u0000'; c <= '\uffff'; c++) {
                if (!Character.isLowerCase((char)c))
                    continue;
                osw.write(c);
            }
            osw.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
This program reverses the process:
import java.io.*;
public class enc2 {
    public static void main(String args[])
    {
        try {
            FileInputStream fis =
                new FileInputStream("out");
            InputStreamReader isr =
                new InputStreamReader(fis, "UTF8");
            for (int c = '\u0000'; c <= '\uffff'; c++) {
                if (!Character.isLowerCase((char)c))
                    continue;
                int ch = isr.read();
                if (c != ch)
                    System.err.println("error");
            }
            isr.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
InputStreamReader and OutputStreamWriter are the
classes where byte streams are converted to character streams and vice versa.
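The one-, two-, and three-byte property of UTF-8 can be checked directly with String.getBytes. The following is a small sketch; the class name Utf8Length and the method utf8Length are hypothetical, and the method is meant only for single characters that are not surrogates.

```java
import java.io.UnsupportedEncodingException;

class Utf8Length {
    // Number of bytes UTF-8 uses for one (non-surrogate) character.
    static int utf8Length(char c) {
        try {
            return String.valueOf(c).getBytes("UTF8").length;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e.toString());
        }
    }

    public static void main(String args[]) {
        System.out.println(utf8Length('a'));       // ASCII letter
        System.out.println(utf8Length('\u00e9'));  // Latin small letter e with acute
        System.out.println(utf8Length('\u20ac'));  // euro sign
    }
}
```

Run as written, this prints 1, 2, and 3 on successive lines.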
This issue is quite an important one if you are concerned with writing applications that operate in an international context.
The JDC Tech Tips are written by Glen McCluskey.