src/java.base/share/classes/java/text/RuleBasedCollator.java
changeset 58288 48e480e56aad
parent 58242 94bb65cb37d3
child 58679 9c3209ff7550
58287:a7f16447085e 58288:48e480e56aad
    41 import java.text.Normalizer;
    41 import java.text.Normalizer;
    42 import java.util.Vector;
    42 import java.util.Vector;
    43 import java.util.Locale;
    43 import java.util.Locale;
    44 
    44 
    45 /**
    45 /**
    46  * The <code>RuleBasedCollator</code> class is a concrete subclass of
    46  * The {@code RuleBasedCollator} class is a concrete subclass of
    47  * <code>Collator</code> that provides a simple, data-driven, table
    47  * {@code Collator} that provides a simple, data-driven, table
    48  * collator.  With this class you can create a customized table-based
    48  * collator.  With this class you can create a customized table-based
    49  * <code>Collator</code>.  <code>RuleBasedCollator</code> maps
    49  * {@code Collator}.  {@code RuleBasedCollator} maps
    50  * characters to sort keys.
    50  * characters to sort keys.
    51  *
    51  *
    52  * <p>
    52  * <p>
    53  * <code>RuleBasedCollator</code> has the following restrictions
    53  * {@code RuleBasedCollator} has the following restrictions
    54  * for efficiency (other subclasses may be used for more complex languages) :
    54  * for efficiency (other subclasses may be used for more complex languages) :
    55  * <ol>
    55  * <ol>
    56  * <li>If a special collation rule controlled by a &lt;modifier&gt; is
    56  * <li>If a special collation rule controlled by a &lt;modifier&gt; is
    57       specified it applies to the whole collator object.
    57       specified it applies to the whole collator object.
    58  * <li>All non-mentioned characters are at the end of the
    58  * <li>All non-mentioned characters are at the end of the
    73  *        characters, excluding special characters (that is, common
    73  *        characters, excluding special characters (that is, common
    74  *        whitespace characters [0009-000D, 0020] and rule syntax characters
    74  *        whitespace characters [0009-000D, 0020] and rule syntax characters
    75  *        [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those
    75  *        [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those
    76  *        characters are desired, you can put them in single quotes
    76  *        characters are desired, you can put them in single quotes
    77  *        (e.g. ampersand =&gt; '&amp;'). Note that unquoted white space characters
    77  *        (e.g. ampersand =&gt; '&amp;'). Note that unquoted white space characters
    78  *        are ignored; e.g. <code>b c</code> is treated as <code>bc</code>.
    78  *        are ignored; e.g. {@code b c} is treated as {@code bc}.
    79  *    <LI><strong>Modifier</strong>: There are currently two modifiers that
    79  *    <LI><strong>Modifier</strong>: There are currently two modifiers that
    80  *        turn on special collation rules.
    80  *        turn on special collation rules.
    81  *        <UL>
    81  *        <UL>
    82  *            <LI>'@' : Turns on backwards sorting of accents (secondary
    82  *            <LI>'@' : Turns on backwards sorting of accents (secondary
    83  *                      differences), as in French.
    83  *                      differences), as in French.
   144  * "black-birds". In the samples for different languages, you see that most
   144  * "black-birds". In the samples for different languages, you see that most
   145  * accents are ignorable.
   145  * accents are ignorable.
   146  *
   146  *
   147  * <p><strong>Normalization and Accents</strong>
   147  * <p><strong>Normalization and Accents</strong>
   148  * <p>
   148  * <p>
   149  * <code>RuleBasedCollator</code> automatically processes its rule table to
   149  * {@code RuleBasedCollator} automatically processes its rule table to
   150  * include both pre-composed and combining-character versions of
   150  * include both pre-composed and combining-character versions of
   151  * accented characters.  Even if the provided rule string contains only
   151  * accented characters.  Even if the provided rule string contains only
   152  * base characters and separate combining accent characters, the pre-composed
   152  * base characters and separate combining accent characters, the pre-composed
   153  * accented characters matching all canonical combinations of characters from
   153  * accented characters matching all canonical combinations of characters from
   154  * the rule string will be entered in the table.
   154  * the rule string will be entered in the table.
   173  *        (e.g. "a &lt; ,b").
   173  *        (e.g. "a &lt; ,b").
   174  *     <LI>A reset where the text-argument (or an initial substring of the
   174  *     <LI>A reset where the text-argument (or an initial substring of the
   175  *         text-argument) is not already in the sequence.
   175  *         text-argument) is not already in the sequence.
   176  *         (e.g. "a &lt; b &amp; e &lt; f")
   176  *         (e.g. "a &lt; b &amp; e &lt; f")
   177  * </UL>
   177  * </UL>
   178  * If you produce one of these errors, a <code>RuleBasedCollator</code> throws
   178  * If you produce one of these errors, a {@code RuleBasedCollator} throws
   179  * a <code>ParseException</code>.
   179  * a {@code ParseException}.
   180  *
   180  *
   181  * <p><strong>Examples</strong>
   181  * <p><strong>Examples</strong>
   182  * <p>Simple:     "&lt; a &lt; b &lt; c &lt; d"
   182  * <p>Simple:     "&lt; a &lt; b &lt; c &lt; d"
   183  * <p>Norwegian:  "&lt; a, A &lt; b, B &lt; c, C &lt; d, D &lt; e, E &lt; f, F
   183  * <p>Norwegian:  "&lt; a, A &lt; b, B &lt; c, C &lt; d, D &lt; e, E &lt; f, F
   184  *                 &lt; g, G &lt; h, H &lt; i, I &lt; j, J &lt; k, K &lt; l, L
   184  *                 &lt; g, G &lt; h, H &lt; i, I &lt; j, J &lt; k, K &lt; l, L
   189  *                 &lt; &#92;u00F8, &#92;u00D8
   189  *                 &lt; &#92;u00F8, &#92;u00D8
   190  *                 &lt; &#92;u00E5 = a&#92;u030A, &#92;u00C5 = A&#92;u030A;
   190  *                 &lt; &#92;u00E5 = a&#92;u030A, &#92;u00C5 = A&#92;u030A;
   191  *                      aa, AA"
   191  *                      aa, AA"
   192  *
   192  *
   193  * <p>
   193  * <p>
   194  * To create a <code>RuleBasedCollator</code> object with specialized
   194  * To create a {@code RuleBasedCollator} object with specialized
   195  * rules tailored to your needs, you construct the <code>RuleBasedCollator</code>
   195  * rules tailored to your needs, you construct the {@code RuleBasedCollator}
   196  * with the rules contained in a <code>String</code> object. For example:
   196  * with the rules contained in a {@code String} object. For example:
   197  * <blockquote>
   197  * <blockquote>
   198  * <pre>
   198  * <pre>
   199  * String simple = "&lt; a&lt; b&lt; c&lt; d";
   199  * String simple = "&lt; a&lt; b&lt; c&lt; d";
   200  * RuleBasedCollator mySimple = new RuleBasedCollator(simple);
   200  * RuleBasedCollator mySimple = new RuleBasedCollator(simple);
   201  * </pre>
   201  * </pre>
   216  * </blockquote>
   216  * </blockquote>
   217  *
   217  *
   218  * <p>
   218  * <p>
   219  * A new collation rules string can be created by concatenating rules
   219  * A new collation rules string can be created by concatenating rules
   220  * strings. For example, the rules returned by {@link #getRules()} could
   220  * strings. For example, the rules returned by {@link #getRules()} could
   221  * be concatenated to combine multiple <code>RuleBasedCollator</code>s.
   221  * be concatenated to combine multiple {@code RuleBasedCollator}s.
   222  *
   222  *
   223  * <p>
   223  * <p>
   224  * The following example demonstrates how to change the order of
   224  * The following example demonstrates how to change the order of
   225  * non-spacing accents,
   225  * non-spacing accents,
   226  * <blockquote>
   226  * <blockquote>
   348      * Compares the character data stored in two different strings based on the
   348      * Compares the character data stored in two different strings based on the
   349      * collation rules.  Returns information about whether a string is less
   349      * collation rules.  Returns information about whether a string is less
   350      * than, greater than or equal to another string in a language.
   350      * than, greater than or equal to another string in a language.
   351      * This can be overridden in a subclass.
   351      * This can be overridden in a subclass.
   352      *
   352      *
   353      * @throws    NullPointerException if <code>source</code> or <code>target</code> is null.
   353      * @throws    NullPointerException if {@code source} or {@code target} is null.
   354      */
   354      */
   355     public synchronized int compare(String source, String target)
   355     public synchronized int compare(String source, String target)
   356     {
   356     {
   357         if (source == null || target == null) {
   357         if (source == null || target == null) {
   358             throw new NullPointerException();
   358             throw new NullPointerException();