41 import java.text.Normalizer; |
42 import java.util.Vector; |
43 import java.util.Locale; |
44 |
45 /** |
46 * The <code>RuleBasedCollator</code> class is a concrete subclass of |
46 * The {@code RuleBasedCollator} class is a concrete subclass of |
47 * <code>Collator</code> that provides a simple, data-driven, table |
47 * {@code Collator} that provides a simple, data-driven, table |
48 * collator. With this class you can create a customized table-based |
49 * <code>Collator</code>. <code>RuleBasedCollator</code> maps |
49 * {@code Collator}. {@code RuleBasedCollator} maps |
50 * characters to sort keys. |
51 * |
52 * <p> |
53 * <code>RuleBasedCollator</code> has the following restrictions |
53 * {@code RuleBasedCollator} has the following restrictions |
54 * for efficiency (other subclasses may be used for more complex languages) : |
55 * <ol> |
56 * <li>If a special collation rule controlled by a <modifier> is |
57 specified it applies to the whole collator object. |
58 * <li>All non-mentioned characters are at the end of the |
73 * characters, excluding special characters (that is, common |
74 * whitespace characters [0009-000D, 0020] and rule syntax characters |
75 * [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those |
76 * characters are desired, you can put them in single quotes |
77 * (e.g. ampersand => '&'). Note that unquoted white space characters |
78 * are ignored; e.g. <code>b c</code> is treated as <code>bc</code>. |
78 * are ignored; e.g. {@code b c} is treated as {@code bc}. |
79 * <LI><strong>Modifier</strong>: There are currently two modifiers that |
80 * turn on special collation rules. |
81 * <UL> |
82 * <LI>'@' : Turns on backwards sorting of accents (secondary |
83 * differences), as in French. |
144 * "black-birds". In the samples for different languages, you see that most |
145 * accents are ignorable. |
146 * |
147 * <p><strong>Normalization and Accents</strong> |
148 * <p> |
149 * <code>RuleBasedCollator</code> automatically processes its rule table to |
149 * {@code RuleBasedCollator} automatically processes its rule table to |
150 * include both pre-composed and combining-character versions of |
151 * accented characters. Even if the provided rule string contains only |
152 * base characters and separate combining accent characters, the pre-composed |
153 * accented characters matching all canonical combinations of characters from |
154 * the rule string will be entered in the table. |
173 * (e.g. "a < ,b"). |
174 * <LI>A reset where the text-argument (or an initial substring of the |
175 * text-argument) is not already in the sequence. |
176 * (e.g. "a < b & e < f") |
177 * </UL> |
178 * If you produce one of these errors, a <code>RuleBasedCollator</code> throws |
178 * If you produce one of these errors, a {@code RuleBasedCollator} throws |
179 * a <code>ParseException</code>. |
179 * a {@code ParseException}. |
180 * |
181 * <p><strong>Examples</strong> |
182 * <p>Simple: "< a < b < c < d" |
183 * <p>Norwegian: "< a, A < b, B < c, C < d, D < e, E < f, F |
184 * < g, G < h, H < i, I < j, J < k, K < l, L |
189 * < \u00F8, \u00D8 |
190 * < \u00E5 = a\u030A, \u00C5 = A\u030A; |
191 * aa, AA" |
192 * |
193 * <p> |
194 * To create a <code>RuleBasedCollator</code> object with specialized |
194 * To create a {@code RuleBasedCollator} object with specialized |
195 * rules tailored to your needs, you construct the <code>RuleBasedCollator</code> |
195 * rules tailored to your needs, you construct the {@code RuleBasedCollator} |
196 * with the rules contained in a <code>String</code> object. For example: |
196 * with the rules contained in a {@code String} object. For example: |
197 * <blockquote> |
198 * <pre> |
199 * String simple = "< a< b< c< d"; |
200 * RuleBasedCollator mySimple = new RuleBasedCollator(simple); |
201 * </pre> |
216 * </blockquote> |
217 * |
218 * <p> |
219 * A new collation rules string can be created by concatenating rules |
220 * strings. For example, the rules returned by {@link #getRules()} could |
221 * be concatenated to combine multiple <code>RuleBasedCollator</code>s. |
221 * be concatenated to combine multiple {@code RuleBasedCollator}s. |
222 * |
223 * <p> |
224 * The following example demonstrates how to change the order of |
225 * non-spacing accents, |
226 * <blockquote> |
348 * Compares the character data stored in two different strings based on the |
349 * collation rules. Returns information about whether a string is less |
350 * than, greater than or equal to another string in a language. |
351 * This can be overridden in a subclass. |
352 * |
353 * @throws NullPointerException if <code>source</code> or <code>target</code> is null. |
353 * @throws NullPointerException if {@code source} or {@code target} is null. |
354 */ |
355 public synchronized int compare(String source, String target) |
356 { |
357 if (source == null || target == null) { |
358 throw new NullPointerException(); |