This class is a concrete subclass of Collator suitable for string
collation in a wide variety of languages. An instance of this class is
normally returned by the getInstance method of Collator with rules
predefined for the requested locale. However, an instance of this class
can be created manually with any desired rules.
Rules take the form of a String with the following syntax:
- Modifier: '@'
- Relation: '<' | ';' | ',' | '=' : <text>
- Reset: '&' : <text>
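As a minimal sketch of creating an instance manually, the following builds a collator from a short rule string and compares two strings with it. It assumes the standard java.text API; the class name BasicRuleExample and the helper build are hypothetical names used only for illustration.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class BasicRuleExample {
    // Hypothetical helper: builds a collator from a rule string and
    // converts the checked ParseException into an unchecked one.
    public static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Three primary relations: a, then b, then c.
        RuleBasedCollator collator = build("< a < b < c");
        // A negative result means the first argument sorts first.
        System.out.println(collator.compare("a", "b") < 0);
        // A positive result means the first argument sorts last.
        System.out.println(collator.compare("c", "b") > 0);
    }
}
```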
The modifier character indicates that accents sort backward, as is the
case with French. The modifier applies to all rules after the modifier
but before the next primary sequence. If placed at the end of the
sequence, it applies to all unknown accented characters.
The relational operators specify how the text
argument relates to the previous term. The relation characters have
the following meanings:
- '<' - The text argument is greater than the prior term at the primary
difference level.
- ';' - The text argument is greater than the prior term at the secondary
difference level.
- ',' - The text argument is greater than the prior term at the tertiary
difference level.
- '=' - The text argument is equal to the prior term.
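The effect of the difference levels can be sketched by comparing the same pair of strings at different strengths. This example assumes the standard java.text API and its Collator.PRIMARY and Collator.TERTIARY constants; the class name StrengthExample and the helper compareAt are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class StrengthExample {
    // Hypothetical helper: compares x and y under the given rules at the
    // given comparison strength.
    public static int compareAt(String rules, int strength, String x, String y) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            collator.setStrength(strength);
            return collator.compare(x, y);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // 'A' differs from 'a' only at the tertiary level;
        // 'b' differs from both at the primary level.
        String rules = "< a , A < b , B";
        // Tertiary strength sees the case difference.
        System.out.println(compareAt(rules, Collator.TERTIARY, "a", "A"));
        // Primary strength ignores tertiary differences.
        System.out.println(compareAt(rules, Collator.PRIMARY, "a", "A"));
        // The primary difference between 'A' and 'b' is visible at either strength.
        System.out.println(compareAt(rules, Collator.PRIMARY, "A", "b"));
    }
}
```

Lowering the strength makes the collator treat finer differences as equal, which is one way to get case-insensitive comparison from rules that still record case.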
As for the text argument itself, this is any sequence of Unicode
characters not in the following ranges: 0x0009-0x000D, 0x0020-0x002F,
0x003A-0x0040, 0x005B-0x0060, and 0x007B-0x007E. If these characters are
desired, they must be enclosed in single quotes. If any whitespace is
encountered, it is ignored. (For example, "a b" is equal to "ab").
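The quoting and whitespace rules can be illustrated with a rule string that places a punctuation character between two letters. This is a sketch assuming the standard java.text API; the class name QuotedTextExample and the helper build are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class QuotedTextExample {
    // Hypothetical helper: builds a collator, rethrowing parse failures unchecked.
    public static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // '&' is the reset operator, so it must be quoted to be used as text.
        RuleBasedCollator spaced = build("< a < '&' < b");
        // The same rules with all whitespace removed.
        RuleBasedCollator compact = build("<a<'&'<b");

        // Both collators should place "&" between "a" and "b".
        System.out.println(spaced.compare("a", "&") < 0);
        System.out.println(spaced.compare("&", "b") < 0);
        System.out.println(compact.compare("a", "&") < 0);
        System.out.println(compact.compare("&", "b") < 0);
    }
}
```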
The reset operation inserts the following rule at the point where the
text argument to it exists in the previously declared rule string. This
makes it easy to add new rules to an existing string by simply including
them in a reset sequence at the end. Note that the text argument, or
at least the first character of it, must be present somewhere in the
previously declared rules in order to be inserted properly. If this
is not satisfied, a ParseException will be thrown.
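A reset can be sketched by appending a rule to an existing string and sorting with the result. The example assumes the standard java.text API and relies on Collator being usable as a Comparator; the class name ResetExample and the helper sortWith are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResetExample {
    // Hypothetical helper: returns a sorted copy of the input, ordered by
    // a collator built from the given rules.
    public static List<String> sortWith(String rules, List<String> input) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            List<String> sorted = new ArrayList<>(input);
            sorted.sort(collator);
            return sorted;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // The base rules order a before c; the reset "& a < b" inserts b
        // directly after a, so b lands between a and c.
        String rules = "< a < c" + " & a < b";
        System.out.println(sortWith(rules, Arrays.asList("c", "b", "a")));
    }
}
```

Because the reset is appended at the end, the base rule string can be left untouched, which is the convenience the paragraph above describes.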
This system of configuring RuleBasedCollator is needlessly
complex and the people at Taligent who developed it (along with the folks
at Sun who accepted it into the Java standard library) deserve a slow
and agonizing death.
Here are a few examples of rule strings:
"< a < b < c" - This string says that a is less than b, which is
less than c, with all differences being primary differences.
"< a,A < b,B < c,C" - This string says that 'A' is greater than 'a' with
a tertiary strength comparison. Both 'b' and 'B' are greater than 'a' and
'A' during a primary strength comparison. But 'B' is greater than 'b'
under a tertiary strength comparison.
"< a < c & a < b " - This sequence is identical in function to the
"< a < b < c" rule string above. The '&' reset symbol indicates that
the rule "< b" is to be inserted after the text argument "a" in the
previous rule string segment.
"< a < b & y < z" - This is an error. The character 'y' does not appear
anywhere in the previous rule string segment so the rule following the
reset rule cannot be inserted.
"< a & A @ < e & E < f& F" - This sequence is equivalent to the following
"< a & A < E & e < f & F".
For a description of the various comparison strength types, see the
documentation for the Collator class.
As an additional complication to this already overly complex rule scheme,
if any characters precede the first rule, these characters are considered
ignorable. They will be treated as if they did not exist during
comparisons. For example, "- < a < b ..." would make '-' an ignorable
character such that the strings "high-tech" and "hightech" would
be considered identical.
A ParseException will be thrown for any of the following conditions:
- Unquoted punctuation characters in a text argument.
- A relational or reset operator not followed by a text argument.
- A reset operator where the text argument is not present in
the previous rule string section.
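Since invalid rules surface as a checked ParseException, callers that accept rule strings from elsewhere may want to validate them first. The following is a sketch under the assumption that the standard java.text implementation rejects unquoted punctuation as described above; the class name RuleValidation and the helper isValidRules are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RuleValidation {
    // Hypothetical helper: reports whether a rule string can be parsed.
    public static boolean isValidRules(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            // getErrorOffset() gives the position the parser reported, if any.
            System.err.println("Rejected at offset " + e.getErrorOffset()
                    + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed rules.
        System.out.println(isValidRules("< a < b"));
        // '#' is punctuation and is not quoted here.
        System.out.println(isValidRules("< a < # < b"));
        // Quoting the same character should make the rules acceptable.
        System.out.println(isValidRules("< a < '#' < b"));
    }
}
```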
RuleBasedCollator.java -- Concrete Collator Class
Copyright (C) 1998, 1999, 2000, 2001, 2003, 2004, 2005 Free Software Foundation, Inc.
This file is part of GNU Classpath.
GNU Classpath is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU Classpath is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU Classpath; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA.
Linking this library statically or dynamically with other modules is
making a combined work based on this library. Thus, the terms and
conditions of the GNU General Public License cover the whole
combination.
As a special exception, the copyright holders of this library give you
permission to link this library with independent modules to produce an
executable, regardless of the license terms of these independent
modules, and to copy and distribute the resulting executable under
terms of your choice, provided that you also meet, for each linked
independent module, the terms and conditions of the license of that
module. An independent module is a module which is not derived from
or based on this library. If you modify this library, you may extend
this exception to your version of the library, but you are not
obligated to do so. If you do not wish to do so, delete this
exception statement from your version.