WELCOME to the Java Developer Connection(SM) Tech Tips,
Vol. 2 No. 8. This issue covers Collators and BigDecimal.

Collators
If you've used Java much at all, you've probably had occasion to compare
strings using the String.compareTo method. This method does
lexicographical comparison, which is a fancy way of saying that the
numerical values of corresponding Unicode characters in the strings are
compared. For example, the letter "a" has a numeric value of 0x61, "b"
0x62, and so on.
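As a minimal sketch of what lexicographical comparison means in practice
(the class name LexCompareDemo and the helper sign are illustrative only),
the following program compares a few strings with String.compareTo.
Because raw character values are compared, every upper case ASCII letter
sorts before every lower case one.

```java
class LexCompareDemo {
    // Reduce a compareTo result to -1, 0, or 1 for easier reading.
    static int sign(int n) {
        return Integer.signum(n);
    }

    public static void main(String[] args) {
        // 'a' is 0x61 and 'b' is 0x62, so "a" sorts before "b".
        System.out.println(sign("a".compareTo("b")));   // -1
        // 'Z' is 0x5A, which is less than 'a' (0x61).
        System.out.println(sign("Z".compareTo("a")));   // -1
        // Case matters: 'a' (0x61) is greater than 'A' (0x41).
        System.out.println(sign("a".compareTo("A")));   // 1
    }
}
```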
Such comparisons are useful but not always adequate, particularly in an
internationalization context. Suppose, for example, that you want lower
and upper case characters to compare as identical, or you want accents on
letters to be ignored. The collator classes in java.text serve this
purpose: they let you build locale-sensitive string comparison methods.
To see how collators work, consider an example:
import java.text.*;
import java.util.*;

public class collate {
    public static void main(String args[])
    {
        Collator coll = Collator.getInstance(Locale.US);

        // compare returns an int: 0 if the strings are equal at the
        // current strength, nonzero otherwise

        coll.setStrength(Collator.TERTIARY);
        System.out.println(coll.compare("a","A"));      // nonzero: case matters
        coll.setStrength(Collator.SECONDARY);
        System.out.println(coll.compare("a","A"));      // 0: case is ignored

        coll.setStrength(Collator.SECONDARY);
        System.out.println(coll.compare("a","\u00e0")); // nonzero: accent matters
        coll.setStrength(Collator.PRIMARY);
        System.out.println(coll.compare("a","\u00e0")); // 0: accent is ignored

        coll.setStrength(Collator.IDENTICAL);
        System.out.println(coll.compare("a","b"));      // negative: "a" sorts first

        CollationKey key1 = coll.getCollationKey("abc");
        CollationKey key2 = coll.getCollationKey("def");
        System.out.println(key1.compareTo(key2));       // negative: "abc" sorts first
    }
}
The first statement in main:
Collator coll = Collator.getInstance(Locale.US);
retrieves a new collator, according to the locale settings applicable to
the United States.
Then a series of string comparisons is done, in each case setting a
strength before performing the comparison. A strength specifies what
level of difference is considered significant in the comparison. Four
strengths are defined, from least to most discriminating: PRIMARY,
SECONDARY, TERTIARY, and IDENTICAL. The meaning of each strength depends
on the specific locale.
For example, in the US locale, upper versus lower case is considered a
TERTIARY difference, less important than a SECONDARY difference. If the
strength is set to TERTIARY, then case is significant. An example of a
SECONDARY difference is accents on letters. The Unicode letter "\u00e0",
defined to be:
00E0;LATIN SMALL LETTER A WITH GRAVE
is considered different from "a" when comparing using a SECONDARY
strength setting, but identical when using a PRIMARY one. These rules
may of course be different for some other locale.
A final point about this example concerns efficiency. If you compare the
same strings repeatedly using a collator, for example while sorting a
list, it may be more efficient to use CollationKey objects instead of
Collator.compare. A CollationKey represents a string as a sequence of
bits computed once from the collator's rules, so two keys can be compared
without reapplying those rules.
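One way this could look when sorting is sketched below. The class name
KeySortDemo, the method sortWithKeys, and the word list are made up for
illustration. Each string is converted to a CollationKey once, the keys
are sorted, and the original strings are read back from the sorted keys.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class KeySortDemo {
    // Sort words by the locale's collation rules, building each key only once.
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator coll = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = coll.getCollationKey(words[i]);
        }
        // CollationKey implements Comparable, so the keys sort directly.
        Arrays.sort(keys);
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = { "banana", "Zebra", "apple", "Cherry" };
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));
        // prints [apple, banana, Cherry, Zebra]
    }
}
```

Note that String.compareTo would place "Cherry" and "Zebra" ahead of
"apple", because upper case letters have smaller character values. Keys
are only comparable with other keys produced by the same collator.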
If you are developing applications that operate in an international
context, then this whole area is one that needs to be considered.

BigDecimal
The java.math package contains two classes, BigInteger and BigDecimal.
BigInteger represents arbitrary-precision integers, with arithmetic
operations such as addition and division supported, along with comparison
and hashing methods. A BigDecimal consists of an arbitrary-precision
integer along with a scale, where the scale is the number of digits to
the right of the decimal point.
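A short sketch may make the two representations concrete. The class name
ScaleDemo and the sample values are chosen only for illustration. The
program computes a BigInteger too large for a long, then prints the
integer and the scale that together make up a BigDecimal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class ScaleDemo {
    public static void main(String[] args) {
        // 2 to the 100th power is far too large for a long.
        BigInteger big = BigInteger.valueOf(2).pow(100);
        System.out.println(big);                 // 1267650600228229401496703205376

        // 123.45 is stored as the integer 12345 with a scale of 2.
        BigDecimal d = new BigDecimal("123.45");
        System.out.println(d.unscaledValue());   // 12345
        System.out.println(d.scale());           // 2
    }
}
```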
These classes can be used in applications requiring high-precision
numbers. Financial applications sometimes require such precision, as do
some kinds of numerical programming problems. An example of one of
these is computing numerical constants to a high degree of precision.
The mathematical constant "e" can be defined as the sum of the
infinite series:
1/0! + 1/1! + 1/2! + 1/3! + ...
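To get a feel for how quickly this series converges, here is a small
sketch using ordinary double arithmetic. The class name SeriesDemo and the
method partialSum are hypothetical. It is limited to the roughly 15 to 17
significant digits a double can hold, which is why an arbitrary-precision
type is needed to go further.

```java
class SeriesDemo {
    // Sum the first n terms of 1/0! + 1/1! + 1/2! + ...
    static double partialSum(int n) {
        double sum = 0.0;
        double term = 1.0;          // starts at 1/0!
        for (int k = 0; k < n; k++) {
            sum += term;
            term /= (k + 1);        // next term is 1/(k+1)!
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(partialSum(5));
        System.out.println(partialSum(10));
        System.out.println(partialSum(20));
        System.out.println(Math.E);  // library value, for comparison
    }
}
```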
A program that uses BigDecimal to compute this constant to 40 places is:
import java.math.*;

public class bige {
    public static void main(String args[])
    {
        BigDecimal one = new BigDecimal("1");
        BigDecimal curfact = new BigDecimal("1");
        BigDecimal factmul = new BigDecimal("1");
        BigDecimal curval = new BigDecimal("0");
        String curout = "";

        // number of desired decimal places
        final int NP = 40;

        for (;;) {
            // divide 1 by the current factorial
            BigDecimal x = one.divide(curfact, NP + 1,
                BigDecimal.ROUND_HALF_EVEN);

            // add the result to the accumulated value
            curval = curval.add(x);

            // move to the next factorial value
            curfact = curfact.multiply(factmul);
            factmul = factmul.add(one);

            // check convergence of the current value
            String s =
                curval.toString().substring(0, NP + 2);
            if (s.equals(curout)) {
                System.out.println(s);
                break;
            }
            curout = s;
        }
    }
}
During the calculation, an extra digit is carried to reduce the effect of
rounding on the last digit displayed. The rounding mode itself is
ROUND_HALF_EVEN, which means round toward the nearest digit or, if the
two neighboring digits are equidistant, round toward the even one. For
example, division using this rounding mode, assuming one decimal place,
comes out like this:
755 / 100 = 7.6 (7.5 and 7.6 are equidistant, round toward 6)
745 / 100 = 7.4 (7.4 and 7.5 are equidistant, round toward 4)
Applied repeatedly over a sequence of calculations, this rounding mode
statistically minimizes cumulative error.
The output of the program is:
2.7182818284590452353602874713526624977572
which is a correct value for "e" to 40 places.
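The ROUND_HALF_EVEN examples given earlier can be checked with a few
lines of code. This sketch assumes a Java release that provides the
java.math.RoundingMode enum (Java 5 and later), whose HALF_EVEN constant
corresponds to BigDecimal.ROUND_HALF_EVEN. The class name HalfEvenDemo and
the helper divideHalfEven are illustrative only.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class HalfEvenDemo {
    // Divide num by den, keeping 'scale' decimal places with half-even rounding.
    static BigDecimal divideHalfEven(String num, String den, int scale) {
        return new BigDecimal(num)
            .divide(new BigDecimal(den), scale, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        // 7.55 is equidistant from 7.5 and 7.6; the even digit is 6.
        System.out.println(divideHalfEven("755", "100", 1));   // 7.6
        // 7.45 is equidistant from 7.4 and 7.5; the even digit is 4.
        System.out.println(divideHalfEven("745", "100", 1));   // 7.4
        // 7.46 is not a tie, so it rounds to the nearest digit.
        System.out.println(divideHalfEven("746", "100", 1));   // 7.5
    }
}
```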
The JDC Tech Tips are written by Glen McCluskey.