29 import java.util.Map; |
29 import java.util.Map; |
30 import java.util.HashMap; |
30 import java.util.HashMap; |
31 import java.util.Locale; |
31 import java.util.Locale; |
32 |
32 |
33 /** |
33 /** |
34 * The <code>Character</code> class wraps a value of the primitive |
34 * The {@code Character} class wraps a value of the primitive |
35 * type <code>char</code> in an object. An object of type |
35 * type {@code char} in an object. An object of type |
36 * <code>Character</code> contains a single field whose type is |
36 * {@code Character} contains a single field whose type is |
37 * <code>char</code>. |
37 * {@code char}. |
38 * <p> |
38 * <p> |
39 * In addition, this class provides several methods for determining |
39 * In addition, this class provides several methods for determining |
40 * a character's category (lowercase letter, digit, etc.) and for converting |
40 * a character's category (lowercase letter, digit, etc.) and for converting |
41 * characters from uppercase to lowercase and vice versa. |
41 * characters from uppercase to lowercase and vice versa. |
42 * <p> |
42 * <p> |
43 * Character information is based on the Unicode Standard, version 6.0.0. |
43 * Character information is based on the Unicode Standard, version 6.0.0. |
44 * <p> |
44 * <p> |
45 * The methods and data of class <code>Character</code> are defined by |
45 * The methods and data of class {@code Character} are defined by |
46 * the information in the <i>UnicodeData</i> file that is part of the |
46 * the information in the <i>UnicodeData</i> file that is part of the |
47 * Unicode Character Database maintained by the Unicode |
47 * Unicode Character Database maintained by the Unicode |
48 * Consortium. This file specifies various properties including name |
48 * Consortium. This file specifies various properties including name |
49 * and general category for every defined Unicode code point or |
49 * and general category for every defined Unicode code point or |
50 * character range. |
50 * character range. |
54 * <li><a href="http://www.unicode.org">http://www.unicode.org</a> |
54 * <li><a href="http://www.unicode.org">http://www.unicode.org</a> |
55 * </ul> |
55 * </ul> |
56 * |
56 * |
57 * <h4><a name="unicode">Unicode Character Representations</a></h4> |
57 * <h4><a name="unicode">Unicode Character Representations</a></h4> |
58 * |
58 * |
59 * <p>The <code>char</code> data type (and therefore the value that a |
59 * <p>The {@code char} data type (and therefore the value that a |
60 * <code>Character</code> object encapsulates) are based on the |
60 * {@code Character} object encapsulates) are based on the |
61 * original Unicode specification, which defined characters as |
61 * original Unicode specification, which defined characters as |
62 * fixed-width 16-bit entities. The Unicode standard has since been |
62 * fixed-width 16-bit entities. The Unicode standard has since been |
63 * changed to allow for characters whose representation requires more |
63 * changed to allow for characters whose representation requires more |
64 * than 16 bits. The range of legal <em>code point</em>s is now |
64 * than 16 bits. The range of legal <em>code point</em>s is now |
65 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>. |
65 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>. |
70 * |
70 * |
71 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF is |
71 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF is |
72 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>. |
72 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>. |
73 * <a name="supplementary">Characters</a> whose code points are greater |
73 * <a name="supplementary">Characters</a> whose code points are greater |
74 * than U+FFFF are called <em>supplementary character</em>s. The Java |
74 * than U+FFFF are called <em>supplementary character</em>s. The Java |
75 * platform uses the UTF-16 representation in <code>char</code> arrays and |
75 * platform uses the UTF-16 representation in {@code char} arrays and |
76 * in the <code>String</code> and <code>StringBuffer</code> classes. In |
76 * in the {@code String} and {@code StringBuffer} classes. In |
77 * this representation, supplementary characters are represented as a pair |
77 * this representation, supplementary characters are represented as a pair |
78 * of <code>char</code> values, the first from the <em>high-surrogates</em> |
78 * of {@code char} values, the first from the <em>high-surrogates</em> |
79 * range, (\uD800-\uDBFF), the second from the |
79 * range, (\uD800-\uDBFF), the second from the |
80 * <em>low-surrogates</em> range (\uDC00-\uDFFF). |
80 * <em>low-surrogates</em> range (\uDC00-\uDFFF). |
81 * |
81 * |
82 * <p>A <code>char</code> value, therefore, represents Basic |
82 * <p>A {@code char} value, therefore, represents Basic |
83 * Multilingual Plane (BMP) code points, including the surrogate |
83 * Multilingual Plane (BMP) code points, including the surrogate |
84 * code points, or code units of the UTF-16 encoding. An |
84 * code points, or code units of the UTF-16 encoding. An |
85 * <code>int</code> value represents all Unicode code points, |
85 * {@code int} value represents all Unicode code points, |
86 * including supplementary code points. The lower (least significant) |
86 * including supplementary code points. The lower (least significant) |
87 * 21 bits of <code>int</code> are used to represent Unicode code |
87 * 21 bits of {@code int} are used to represent Unicode code |
88 * points and the upper (most significant) 11 bits must be zero. |
88 * points and the upper (most significant) 11 bits must be zero. |
89 * Unless otherwise specified, the behavior with respect to |
89 * Unless otherwise specified, the behavior with respect to |
90 * supplementary characters and surrogate <code>char</code> values is |
90 * supplementary characters and surrogate {@code char} values is |
91 * as follows: |
91 * as follows: |
92 * |
92 * |
93 * <ul> |
93 * <ul> |
94 * <li>The methods that only accept a <code>char</code> value cannot support |
94 * <li>The methods that only accept a {@code char} value cannot support |
95 * supplementary characters. They treat <code>char</code> values from the |
95 * supplementary characters. They treat {@code char} values from the |
96 * surrogate ranges as undefined characters. For example, |
96 * surrogate ranges as undefined characters. For example, |
97 * <code>Character.isLetter('\uD840')</code> returns <code>false</code>, even though |
97 * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though |
98 * this specific value, if followed by any low-surrogate value in a string, |
98 * this specific value, if followed by any low-surrogate value in a string, |
99 * would represent a letter. |
99 * would represent a letter. |
100 * |
100 * |
101 * <li>The methods that accept an <code>int</code> value support all |
101 * <li>The methods that accept an {@code int} value support all |
102 * Unicode characters, including supplementary characters. For |
102 * Unicode characters, including supplementary characters. For |
103 * example, <code>Character.isLetter(0x2F81A)</code> returns |
103 * example, {@code Character.isLetter(0x2F81A)} returns |
104 * <code>true</code> because the code point value represents a letter |
104 * {@code true} because the code point value represents a letter |
105 * (a CJK ideograph). |
105 * (a CJK ideograph). |
106 * </ul> |
106 * </ul> |
107 * |
107 * |
108 * <p>In the Java SE API documentation, <em>Unicode code point</em> is |
108 * <p>In the Java SE API documentation, <em>Unicode code point</em> is |
109 * used for character values in the range between U+0000 and U+10FFFF, |
109 * used for character values in the range between U+0000 and U+10FFFF, |
110 * and <em>Unicode code unit</em> is used for 16-bit |
110 * and <em>Unicode code unit</em> is used for 16-bit |
111 * <code>char</code> values that are code units of the <em>UTF-16</em> |
111 * {@code char} values that are code units of the <em>UTF-16</em> |
112 * encoding. For more information on Unicode terminology, refer to the |
112 * encoding. For more information on Unicode terminology, refer to the |
113 * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>. |
113 * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>. |
114 * |
114 * |
115 * @author Lee Boynton |
115 * @author Lee Boynton |
116 * @author Guy Steele |
116 * @author Guy Steele |
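The class comment in this hunk distinguishes methods that take a `char` from those that take an `int` code point. The difference can be checked directly with the two calls the comment itself cites. This is a minimal sketch and not part of the change; the class name `SupplementaryDemo` and its helper methods are hypothetical.

```java
// Illustrative sketch only; SupplementaryDemo and its members are hypothetical names.
final class SupplementaryDemo {
    // The supplementary code point cited in the class comment (a CJK ideograph).
    static final int CJK_IDEOGRAPH = 0x2F81A;

    // The char overload sees only a lone high surrogate, which is not a letter.
    static boolean loneHighSurrogateIsLetter() {
        return Character.isLetter('\uD840');
    }

    // The int overload receives the whole code point and can classify it.
    static boolean codePointIsLetter() {
        return Character.isLetter(CJK_IDEOGRAPH);
    }

    public static void main(String[] args) {
        System.out.println(loneHighSurrogateIsLetter());
        System.out.println(codePointIsLetter());
        System.out.println(Character.isSupplementaryCodePoint(CJK_IDEOGRAPH));
    }
}
```

Code that must handle text outside the BMP would normally prefer the `int` overloads throughout.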
123 class Character implements java.io.Serializable, Comparable<Character> { |
123 class Character implements java.io.Serializable, Comparable<Character> { |
124 /** |
124 /** |
125 * The minimum radix available for conversion to and from strings. |
125 * The minimum radix available for conversion to and from strings. |
126 * The constant value of this field is the smallest value permitted |
126 * The constant value of this field is the smallest value permitted |
127 * for the radix argument in radix-conversion methods such as the |
127 * for the radix argument in radix-conversion methods such as the |
128 * <code>digit</code> method, the <code>forDigit</code> |
128 * {@code digit} method, the {@code forDigit} method, and the |
129 * method, and the <code>toString</code> method of class |
129 * {@code toString} method of class {@code Integer}. |
130 * <code>Integer</code>. |
|
131 * |
130 * |
132 * @see Character#digit(char, int) |
131 * @see Character#digit(char, int) |
133 * @see Character#forDigit(int, int) |
132 * @see Character#forDigit(int, int) |
134 * @see Integer#toString(int, int) |
133 * @see Integer#toString(int, int) |
135 * @see Integer#valueOf(String) |
134 * @see Integer#valueOf(String) |
138 |
137 |
139 /** |
138 /** |
140 * The maximum radix available for conversion to and from strings. |
139 * The maximum radix available for conversion to and from strings. |
141 * The constant value of this field is the largest value permitted |
140 * The constant value of this field is the largest value permitted |
142 * for the radix argument in radix-conversion methods such as the |
141 * for the radix argument in radix-conversion methods such as the |
143 * <code>digit</code> method, the <code>forDigit</code> |
142 * {@code digit} method, the {@code forDigit} method, and the |
144 * method, and the <code>toString</code> method of class |
143 * {@code toString} method of class {@code Integer}. |
145 * <code>Integer</code>. |
|
146 * |
144 * |
147 * @see Character#digit(char, int) |
145 * @see Character#digit(char, int) |
148 * @see Character#forDigit(int, int) |
146 * @see Character#forDigit(int, int) |
149 * @see Integer#toString(int, int) |
147 * @see Integer#toString(int, int) |
150 * @see Integer#valueOf(String) |
148 * @see Integer#valueOf(String) |
151 */ |
149 */ |
152 public static final int MAX_RADIX = 36; |
150 public static final int MAX_RADIX = 36; |
153 |
151 |
154 /** |
152 /** |
155 * The constant value of this field is the smallest value of type |
153 * The constant value of this field is the smallest value of type |
156 * <code>char</code>, <code>'\u0000'</code>. |
154 * {@code char}, {@code '\u005Cu0000'}. |
157 * |
155 * |
158 * @since 1.0.2 |
156 * @since 1.0.2 |
159 */ |
157 */ |
160 public static final char MIN_VALUE = '\u0000'; |
158 public static final char MIN_VALUE = '\u0000'; |
161 |
159 |
162 /** |
160 /** |
163 * The constant value of this field is the largest value of type |
161 * The constant value of this field is the largest value of type |
164 * <code>char</code>, <code>'\uFFFF'</code>. |
162 * {@code char}, {@code '\u005CuFFFF'}. |
165 * |
163 * |
166 * @since 1.0.2 |
164 * @since 1.0.2 |
167 */ |
165 */ |
168 public static final char MAX_VALUE = '\uFFFF'; |
166 public static final char MAX_VALUE = '\uFFFF'; |
169 |
167 |
170 /** |
168 /** |
171 * The <code>Class</code> instance representing the primitive type |
169 * The {@code Class} instance representing the primitive type |
172 * <code>char</code>. |
170 * {@code char}. |
173 * |
171 * |
174 * @since 1.1 |
172 * @since 1.1 |
175 */ |
173 */ |
176 @SuppressWarnings("unchecked") |
174 @SuppressWarnings("unchecked") |
177 public static final Class<Character> TYPE = Class.getPrimitiveClass("char"); |
175 public static final Class<Character> TYPE = Class.getPrimitiveClass("char"); |
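The `MIN_RADIX` and `MAX_RADIX` comments in this hunk name `digit`, `forDigit`, and `Integer.toString` as the radix-conversion methods bounded by these constants. The sketch below round-trips a digit value through those methods. It is an illustration only; `RadixDemo` and `roundTrip` are hypothetical names.

```java
// Illustrative sketch only; RadixDemo and roundTrip are hypothetical names.
final class RadixDemo {
    // Converts a digit value to its character and back in the given radix.
    static int roundTrip(int digitValue, int radix) {
        char c = Character.forDigit(digitValue, radix);
        return Character.digit(c, radix);
    }

    public static void main(String[] args) {
        // The largest digit in the largest radix survives the round trip.
        System.out.println(roundTrip(Character.MAX_RADIX - 1, Character.MAX_RADIX));
        // A radix above MAX_RADIX is rejected: digit returns -1.
        System.out.println(Character.digit('z', Character.MAX_RADIX + 1));
        // Integer.toString honours the same radix bounds.
        System.out.println(Integer.toString(255, 16));
    }
}
```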
493 |
491 |
494 /** |
492 /** |
495 * The minimum value of a |
493 * The minimum value of a |
496 * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit"> |
494 * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit"> |
497 * Unicode high-surrogate code unit</a> |
495 * Unicode high-surrogate code unit</a> |
498 * in the UTF-16 encoding, constant <code>'\uD800'</code>. |
496 * in the UTF-16 encoding, constant {@code '\u005CuD800'}. |
499 * A high-surrogate is also known as a <i>leading-surrogate</i>. |
497 * A high-surrogate is also known as a <i>leading-surrogate</i>. |
500 * |
498 * |
501 * @since 1.5 |
499 * @since 1.5 |
502 */ |
500 */ |
503 public static final char MIN_HIGH_SURROGATE = '\uD800'; |
501 public static final char MIN_HIGH_SURROGATE = '\uD800'; |
504 |
502 |
505 /** |
503 /** |
506 * The maximum value of a |
504 * The maximum value of a |
507 * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit"> |
505 * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit"> |
508 * Unicode high-surrogate code unit</a> |
506 * Unicode high-surrogate code unit</a> |
509 * in the UTF-16 encoding, constant <code>'\uDBFF'</code>. |
507 * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}. |
510 * A high-surrogate is also known as a <i>leading-surrogate</i>. |
508 * A high-surrogate is also known as a <i>leading-surrogate</i>. |
511 * |
509 * |
512 * @since 1.5 |
510 * @since 1.5 |
513 */ |
511 */ |
514 public static final char MAX_HIGH_SURROGATE = '\uDBFF'; |
512 public static final char MAX_HIGH_SURROGATE = '\uDBFF'; |
515 |
513 |
516 /** |
514 /** |
517 * The minimum value of a |
515 * The minimum value of a |
518 * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit"> |
516 * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit"> |
519 * Unicode low-surrogate code unit</a> |
517 * Unicode low-surrogate code unit</a> |
520 * in the UTF-16 encoding, constant <code>'\uDC00'</code>. |
518 * in the UTF-16 encoding, constant {@code '\u005CuDC00'}. |
521 * A low-surrogate is also known as a <i>trailing-surrogate</i>. |
519 * A low-surrogate is also known as a <i>trailing-surrogate</i>. |
522 * |
520 * |
523 * @since 1.5 |
521 * @since 1.5 |
524 */ |
522 */ |
525 public static final char MIN_LOW_SURROGATE = '\uDC00'; |
523 public static final char MIN_LOW_SURROGATE = '\uDC00'; |
526 |
524 |
527 /** |
525 /** |
528 * The maximum value of a |
526 * The maximum value of a |
529 * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit"> |
527 * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit"> |
530 * Unicode low-surrogate code unit</a> |
528 * Unicode low-surrogate code unit</a> |
531 * in the UTF-16 encoding, constant <code>'\uDFFF'</code>. |
529 * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}. |
532 * A low-surrogate is also known as a <i>trailing-surrogate</i>. |
530 * A low-surrogate is also known as a <i>trailing-surrogate</i>. |
533 * |
531 * |
534 * @since 1.5 |
532 * @since 1.5 |
535 */ |
533 */ |
536 public static final char MAX_LOW_SURROGATE = '\uDFFF'; |
534 public static final char MAX_LOW_SURROGATE = '\uDFFF'; |
537 |
535 |
538 /** |
536 /** |
539 * The minimum value of a Unicode surrogate code unit in the |
537 * The minimum value of a Unicode surrogate code unit in the |
540 * UTF-16 encoding, constant <code>'\uD800'</code>. |
538 * UTF-16 encoding, constant {@code '\u005CuD800'}. |
541 * |
539 * |
542 * @since 1.5 |
540 * @since 1.5 |
543 */ |
541 */ |
544 public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE; |
542 public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE; |
545 |
543 |
546 /** |
544 /** |
547 * The maximum value of a Unicode surrogate code unit in the |
545 * The maximum value of a Unicode surrogate code unit in the |
548 * UTF-16 encoding, constant <code>'\uDFFF'</code>. |
546 * UTF-16 encoding, constant {@code '\u005CuDFFF'}. |
549 * |
547 * |
550 * @since 1.5 |
548 * @since 1.5 |
551 */ |
549 */ |
552 public static final char MAX_SURROGATE = MAX_LOW_SURROGATE; |
550 public static final char MAX_SURROGATE = MAX_LOW_SURROGATE; |
553 |
551 |
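The surrogate constants in this hunk are the bases used when a supplementary code point is split into a UTF-16 pair. The sketch below applies the standard UTF-16 arithmetic and compares the result with `Character.toChars`. It is not code from the change; `SurrogateMath` and `split` are hypothetical names, and `Character.MIN_SUPPLEMENTARY_CODE_POINT` is an existing JDK constant that does not appear in this excerpt.

```java
import java.util.Arrays;

// Illustrative sketch only; SurrogateMath and split are hypothetical names.
final class SurrogateMath {
    // Splits a supplementary code point into {high, low} surrogates.
    static char[] split(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - Character.MIN_SUPPLEMENTARY_CODE_POINT;
        // The top 10 bits of the offset select the high surrogate, the low 10 bits the low one.
        char high = (char) (Character.MIN_HIGH_SURROGATE + (offset >>> 10));
        char low = (char) (Character.MIN_LOW_SURROGATE + (offset & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = split(0x2F81A);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);
        // The hand-rolled split should agree with the library conversion.
        System.out.println(Arrays.equals(pair, Character.toChars(0x2F81A)));
    }
}
```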
580 |
578 |
581 |
579 |
582 /** |
580 /** |
583 * Instances of this class represent particular subsets of the Unicode |
581 * Instances of this class represent particular subsets of the Unicode |
584 * character set. The only family of subsets defined in the |
582 * character set. The only family of subsets defined in the |
585 * <code>Character</code> class is {@link Character.UnicodeBlock}. |
583 * {@code Character} class is {@link Character.UnicodeBlock}. |
586 * Other portions of the Java API may define other subsets for their |
584 * Other portions of the Java API may define other subsets for their |
587 * own purposes. |
585 * own purposes. |
588 * |
586 * |
589 * @since 1.2 |
587 * @since 1.2 |
590 */ |
588 */ |
591 public static class Subset { |
589 public static class Subset { |
592 |
590 |
593 private String name; |
591 private String name; |
594 |
592 |
595 /** |
593 /** |
596 * Constructs a new <code>Subset</code> instance. |
594 * Constructs a new {@code Subset} instance. |
597 * |
595 * |
598 * @param name The name of this subset |
596 * @param name The name of this subset |
599 * @exception NullPointerException if name is <code>null</code> |
597 * @exception NullPointerException if name is {@code null} |
600 */ |
598 */ |
601 protected Subset(String name) { |
599 protected Subset(String name) { |
602 if (name == null) { |
600 if (name == null) { |
603 throw new NullPointerException("name"); |
601 throw new NullPointerException("name"); |
604 } |
602 } |
605 this.name = name; |
603 this.name = name; |
606 } |
604 } |
607 |
605 |
608 /** |
606 /** |
609 * Compares two <code>Subset</code> objects for equality. |
607 * Compares two {@code Subset} objects for equality. |
610 * This method returns <code>true</code> if and only if |
608 * This method returns {@code true} if and only if |
611 * <code>this</code> and the argument refer to the same |
609 * {@code this} and the argument refer to the same |
612 * object; since this method is <code>final</code>, this |
610 * object; since this method is {@code final}, this |
613 * guarantee holds for all subclasses. |
611 * guarantee holds for all subclasses. |
614 */ |
612 */ |
615 public final boolean equals(Object obj) { |
613 public final boolean equals(Object obj) { |
616 return (this == obj); |
614 return (this == obj); |
617 } |
615 } |
618 |
616 |
619 /** |
617 /** |
620 * Returns the standard hash code as defined by the |
618 * Returns the standard hash code as defined by the |
621 * <code>{@link Object#hashCode}</code> method. This method |
619 * {@link Object#hashCode} method. This method |
622 * is <code>final</code> in order to ensure that the |
620 * is {@code final} in order to ensure that the |
623 * <code>equals</code> and <code>hashCode</code> methods will |
621 * {@code equals} and {@code hashCode} methods will |
624 * be consistent in all subclasses. |
622 * be consistent in all subclasses. |
625 */ |
623 */ |
626 public final int hashCode() { |
624 public final int hashCode() { |
627 return super.hashCode(); |
625 return super.hashCode(); |
628 } |
626 } |
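The `Subset` comments in this hunk state that `equals` is identity-based and `final`, so no subclass can change it. A small subclass makes the consequence visible: two subsets with the same name are still unequal. This is a sketch for illustration; `NamedSubset` is a hypothetical subclass.

```java
// Illustrative sketch only; NamedSubset is a hypothetical subclass.
final class NamedSubset extends Character.Subset {
    NamedSubset(String name) {
        super(name); // the Subset constructor throws NullPointerException for a null name
    }

    public static void main(String[] args) {
        NamedSubset a = new NamedSubset("VOWELS");
        NamedSubset b = new NamedSubset("VOWELS");
        System.out.println(a.equals(a)); // same object
        System.out.println(a.equals(b)); // same name, different objects
    }
}
```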
2958 }; |
2956 }; |
2959 |
2957 |
2960 |
2958 |
2961 /** |
2959 /** |
2962 * Returns the object representing the Unicode block containing the |
2960 * Returns the object representing the Unicode block containing the |
2963 * given character, or <code>null</code> if the character is not a |
2961 * given character, or {@code null} if the character is not a |
2964 * member of a defined block. |
2962 * member of a defined block. |
2965 * |
2963 * |
2966 * <p><b>Note:</b> This method cannot handle |
2964 * <p><b>Note:</b> This method cannot handle |
2967 * <a href="Character.html#supplementary"> supplementary |
2965 * <a href="Character.html#supplementary"> supplementary |
2968 * characters</a>. To support all Unicode characters, including |
2966 * characters</a>. To support all Unicode characters, including |
2969 * supplementary characters, use the {@link #of(int)} method. |
2967 * supplementary characters, use the {@link #of(int)} method. |
2970 * |
2968 * |
2971 * @param c The character in question |
2969 * @param c The character in question |
2972 * @return The <code>UnicodeBlock</code> instance representing the |
2970 * @return The {@code UnicodeBlock} instance representing the |
2973 * Unicode block of which this character is a member, or |
2971 * Unicode block of which this character is a member, or |
2974 * <code>null</code> if the character is not a member of any |
2972 * {@code null} if the character is not a member of any |
2975 * Unicode block |
2973 * Unicode block |
2976 */ |
2974 */ |
2977 public static UnicodeBlock of(char c) { |
2975 public static UnicodeBlock of(char c) { |
2978 return of((int)c); |
2976 return of((int)c); |
2979 } |
2977 } |
2980 |
2978 |
2981 /** |
2979 /** |
2982 * Returns the object representing the Unicode block |
2980 * Returns the object representing the Unicode block |
2983 * containing the given character (Unicode code point), or |
2981 * containing the given character (Unicode code point), or |
2984 * <code>null</code> if the character is not a member of a |
2982 * {@code null} if the character is not a member of a |
2985 * defined block. |
2983 * defined block. |
2986 * |
2984 * |
2987 * @param codePoint the character (Unicode code point) in question. |
2985 * @param codePoint the character (Unicode code point) in question. |
2988 * @return The <code>UnicodeBlock</code> instance representing the |
2986 * @return The {@code UnicodeBlock} instance representing the |
2989 * Unicode block of which this character is a member, or |
2987 * Unicode block of which this character is a member, or |
2990 * <code>null</code> if the character is not a member of any |
2988 * {@code null} if the character is not a member of any |
2991 * Unicode block |
2989 * Unicode block |
2992 * @exception IllegalArgumentException if the specified |
2990 * @exception IllegalArgumentException if the specified |
2993 * <code>codePoint</code> is an invalid Unicode code point. |
2991 * {@code codePoint} is an invalid Unicode code point. |
2994 * @see Character#isValidCodePoint(int) |
2992 * @see Character#isValidCodePoint(int) |
2995 * @since 1.5 |
2993 * @since 1.5 |
2996 */ |
2994 */ |
2997 public static UnicodeBlock of(int codePoint) { |
2995 public static UnicodeBlock of(int codePoint) { |
2998 if (!isValidCodePoint(codePoint)) { |
2996 if (!isValidCodePoint(codePoint)) { |
3042 * string comparisons for block name validation. |
3040 * string comparisons for block name validation. |
3043 * <p> |
3041 * <p> |
3044 * If the Unicode Standard changes block names, both the previous and |
3042 * If the Unicode Standard changes block names, both the previous and |
3045 * current names will be accepted. |
3043 * current names will be accepted. |
3046 * |
3044 * |
3047 * @param blockName A <code>UnicodeBlock</code> name. |
3045 * @param blockName A {@code UnicodeBlock} name. |
3048 * @return The <code>UnicodeBlock</code> instance identified |
3046 * @return The {@code UnicodeBlock} instance identified |
3049 * by <code>blockName</code> |
3047 * by {@code blockName} |
3050 * @throws IllegalArgumentException if <code>blockName</code> is an |
3048 * @throws IllegalArgumentException if {@code blockName} is an |
3051 * invalid name |
3049 * invalid name |
3052 * @throws NullPointerException if <code>blockName</code> is null |
3050 * @throws NullPointerException if {@code blockName} is null |
3053 * @since 1.5 |
3051 * @since 1.5 |
3054 */ |
3052 */ |
3055 public static final UnicodeBlock forName(String blockName) { |
3053 public static final UnicodeBlock forName(String blockName) { |
3056 UnicodeBlock block = map.get(blockName.toUpperCase(Locale.US)); |
3054 UnicodeBlock block = map.get(blockName.toUpperCase(Locale.US)); |
3057 if (block == null) { |
3055 if (block == null) { |
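The `UnicodeBlock.of` and `forName` comments in these hunks describe lookup by code point and by case-insensitive name, each with an `IllegalArgumentException` for invalid input. The sketch below exercises both paths. It is an illustration under those documented rules; `BlockLookupDemo` and its helpers are hypothetical names.

```java
// Illustrative sketch only; BlockLookupDemo and its methods are hypothetical names.
final class BlockLookupDemo {
    // Looks up a block by name; the documentation states that case is ignored.
    static Character.UnicodeBlock byName(String name) {
        return Character.UnicodeBlock.forName(name);
    }

    // Returns true when of(int) rejects the value as an invalid code point.
    static boolean isRejected(int codePoint) {
        try {
            Character.UnicodeBlock.of(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('a'));
        System.out.println(Character.UnicodeBlock.of(0x2F81A));
        System.out.println(byName("basic latin") == Character.UnicodeBlock.BASIC_LATIN);
        System.out.println(isRejected(0x110000)); // one past U+10FFFF
    }
}
```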
4281 /** |
4279 /** |
4282 * Returns the enum constant representing the Unicode script to which |
4280 * Returns the enum constant representing the Unicode script to which |
4283 * the given character (Unicode code point) is assigned. |
4281 * the given character (Unicode code point) is assigned. |
4284 * |
4282 * |
4285 * @param codePoint the character (Unicode code point) in question. |
4283 * @param codePoint the character (Unicode code point) in question. |
4286 * @return The <code>UnicodeScript</code> constant representing the |
4284 * @return The {@code UnicodeScript} constant representing the |
4287 * Unicode script to which this character is assigned. |
4285 * Unicode script to which this character is assigned. |
4288 * |
4286 * |
4289 * @exception IllegalArgumentException if the specified |
4287 * @exception IllegalArgumentException if the specified |
4290 * <code>codePoint</code> is an invalid Unicode code point. |
4288 * {@code codePoint} is an invalid Unicode code point. |
4291 * @see Character#isValidCodePoint(int) |
4289 * @see Character#isValidCodePoint(int) |
4292 * |
4290 * |
4293 */ |
4291 */ |
4294 public static UnicodeScript of(int codePoint) { |
4292 public static UnicodeScript of(int codePoint) { |
4295 if (!isValidCodePoint(codePoint)) |
4293 if (!isValidCodePoint(codePoint)) |
4316 * Character case is ignored for all of the valid script names. |
4314 * Character case is ignored for all of the valid script names. |
4317 * The en_US locale's case mapping rules are used to provide |
4315 * The en_US locale's case mapping rules are used to provide |
4318 * case-insensitive string comparisons for script name validation. |
4316 * case-insensitive string comparisons for script name validation. |
4319 * <p> |
4317 * <p> |
4320 * |
4318 * |
4321 * @param scriptName A <code>UnicodeScript</code> name. |
4319 * @param scriptName A {@code UnicodeScript} name. |
4322 * @return The <code>UnicodeScript</code> constant identified |
4320 * @return The {@code UnicodeScript} constant identified |
4323 * by <code>scriptName</code> |
4321 * by {@code scriptName} |
4324 * @throws IllegalArgumentException if <code>scriptName</code> is an |
4322 * @throws IllegalArgumentException if {@code scriptName} is an |
4325 * invalid name |
4323 * invalid name |
4326 * @throws NullPointerException if <code>scriptName</code> is null |
4324 * @throws NullPointerException if {@code scriptName} is null |
4327 */ |
4325 */ |
4328 public static final UnicodeScript forName(String scriptName) { |
4326 public static final UnicodeScript forName(String scriptName) { |
4329 scriptName = scriptName.toUpperCase(Locale.ENGLISH); |
4327 scriptName = scriptName.toUpperCase(Locale.ENGLISH); |
4330 //.replace(' ', '_')); |
4328 //.replace(' ', '_')); |
4331 UnicodeScript sc = aliases.get(scriptName); |
4329 UnicodeScript sc = aliases.get(scriptName); |
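`UnicodeScript.of` and `UnicodeScript.forName` in these hunks mirror the block lookups: one maps a code point to its script, the other resolves a case-insensitive script name. A short sketch of both follows. It is not part of the change; `ScriptLookupDemo` and its helpers are hypothetical names.

```java
// Illustrative sketch only; ScriptLookupDemo and its methods are hypothetical names.
final class ScriptLookupDemo {
    // Resolves a script by name; the documentation states that case is ignored.
    static Character.UnicodeScript byName(String name) {
        return Character.UnicodeScript.forName(name);
    }

    // Returns true when of(int) rejects the value as an invalid code point.
    static boolean isRejected(int codePoint) {
        try {
            Character.UnicodeScript.of(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Character.UnicodeScript.of('A'));
        System.out.println(Character.UnicodeScript.of(0x2F81A));
        System.out.println(byName("latin") == Character.UnicodeScript.LATIN);
        System.out.println(isRejected(-1));
    }
}
```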
4410 return (int)value; |
4408 return (int)value; |
4411 } |
4409 } |
4412 |
4410 |
4413 /** |
4411 /** |
4414 * Compares this object against the specified object. |
4412 * Compares this object against the specified object. |
4415 * The result is <code>true</code> if and only if the argument is not |
4413 * The result is {@code true} if and only if the argument is not |
4416 * <code>null</code> and is a <code>Character</code> object that |
4414 * {@code null} and is a {@code Character} object that |
4417 * represents the same <code>char</code> value as this object. |
4415 * represents the same {@code char} value as this object. |
4418 * |
4416 * |
4419 * @param obj the object to compare with. |
4417 * @param obj the object to compare with. |
4420 * @return <code>true</code> if the objects are the same; |
4418 * @return {@code true} if the objects are the same; |
4421 * <code>false</code> otherwise. |
4419 * {@code false} otherwise. |
4422 */ |
4420 */ |
4423 public boolean equals(Object obj) { |
4421 public boolean equals(Object obj) { |
4424 if (obj instanceof Character) { |
4422 if (obj instanceof Character) { |
4425 return value == ((Character)obj).charValue(); |
4423 return value == ((Character)obj).charValue(); |
4426 } |
4424 } |
4427 return false; |
4425 return false; |
4428 } |
4426 } |
4429 |
4427 |
4430 /** |
4428 /** |
4431 * Returns a <code>String</code> object representing this |
4429 * Returns a {@code String} object representing this |
4432 * <code>Character</code>'s value. The result is a string of |
4430 * {@code Character}'s value. The result is a string of |
4433 * length 1 whose sole component is the primitive |
4431 * length 1 whose sole component is the primitive |
4434 * <code>char</code> value represented by this |
4432 * {@code char} value represented by this |
4435 * <code>Character</code> object. |
4433 * {@code Character} object. |
4436 * |
4434 * |
4437 * @return a string representation of this object. |
4435 * @return a string representation of this object. |
4438 */ |
4436 */ |
4439 public String toString() { |
4437 public String toString() { |
4440 char buf[] = {value}; |
4438 char buf[] = {value}; |
4441 return String.valueOf(buf); |
4439 return String.valueOf(buf); |
4442 } |
4440 } |
4443 |
4441 |
4444 /** |
4442 /** |
4445 * Returns a <code>String</code> object representing the |
4443 * Returns a {@code String} object representing the |
4446 * specified <code>char</code>. The result is a string of length |
4444 * specified {@code char}. The result is a string of length |
4447 * 1 consisting solely of the specified <code>char</code>. |
4445 * 1 consisting solely of the specified {@code char}. |
4448 * |
4446 * |
4449 * @param c the <code>char</code> to be converted |
4447 * @param c the {@code char} to be converted |
4450 * @return the string representation of the specified <code>char</code> |
4448 * @return the string representation of the specified {@code char} |
4451 * @since 1.4 |
4449 * @since 1.4 |
4452 */ |
4450 */ |
4453 public static String toString(char c) { |
4451 public static String toString(char c) { |
4454 return String.valueOf(c); |
4452 return String.valueOf(c); |
4455 } |
4453 } |
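The `equals` and `toString` comments in this hunk can be read as three concrete guarantees: equality depends only on the wrapped `char`, a non-`Character` argument never compares equal, and both `toString` forms yield a string of length 1. The sketch below checks each one. `BoxedCharDemo` and `sameValue` are hypothetical names used only for illustration.

```java
// Illustrative sketch only; BoxedCharDemo and sameValue are hypothetical names.
final class BoxedCharDemo {
    // Delegates to Character.equals so the argument can be any object.
    static boolean sameValue(Character c, Object other) {
        return c.equals(other);
    }

    public static void main(String[] args) {
        Character x = Character.valueOf('x');
        System.out.println(sameValue(x, Character.valueOf('x'))); // same char value
        System.out.println(sameValue(x, "x"));                    // a String is never equal
        System.out.println(sameValue(x, null));                   // null is never equal
        System.out.println(x.toString().equals(Character.toString('x')));
    }
}
```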
4591 * isHighSurrogate(high) && isLowSurrogate(low) |
4589 * isHighSurrogate(high) && isLowSurrogate(low) |
4592 * </pre></blockquote> |
4590 * </pre></blockquote> |
4593 * |
4591 * |
4594 * @param high the high-surrogate code value to be tested |
4592 * @param high the high-surrogate code value to be tested |
4595 * @param low the low-surrogate code value to be tested |
4593 * @param low the low-surrogate code value to be tested |
4596 * @return <code>true</code> if the specified high and |
4594 * @return {@code true} if the specified high and |
4597 * low-surrogate code values represent a valid surrogate pair; |
4595 * low-surrogate code values represent a valid surrogate pair; |
4598 * <code>false</code> otherwise. |
4596 * {@code false} otherwise. |
4599 * @since 1.5 |
4597 * @since 1.5 |
4600 */ |
4598 */ |
4601 public static boolean isSurrogatePair(char high, char low) { |
4599 public static boolean isSurrogatePair(char high, char low) { |
4602 return isHighSurrogate(high) && isLowSurrogate(low); |
4600 return isHighSurrogate(high) && isLowSurrogate(low); |
4603 } |
4601 } |
4604 |
4602 |
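`isSurrogatePair` above is order-sensitive: the first argument must be a high surrogate and the second a low surrogate. The sketch below shows a valid pair, the same two values swapped, and two ordinary characters. It is an illustration only; `SurrogatePairDemo` is a hypothetical class name.

```java
// Illustrative sketch only; SurrogatePairDemo is a hypothetical class name.
final class SurrogatePairDemo {
    static final char HIGH = '\uD840';
    static final char LOW = '\uDC00';

    public static void main(String[] args) {
        System.out.println(Character.isSurrogatePair(HIGH, LOW)); // high then low
        System.out.println(Character.isSurrogatePair(LOW, HIGH)); // wrong order
        System.out.println(Character.isSurrogatePair('a', 'b'));  // no surrogates at all
    }
}
```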
4605 /** |
4603 /** |
4606 * Determines the number of <code>char</code> values needed to |
4604 * Determines the number of {@code char} values needed to |
4607 * represent the specified character (Unicode code point). If the |
4605 * represent the specified character (Unicode code point). If the |
4608 * specified character is equal to or greater than 0x10000, then |
4606 * specified character is equal to or greater than 0x10000, then |
4609 * the method returns 2. Otherwise, the method returns 1. |
4607 * the method returns 2. Otherwise, the method returns 1. |
4610 * |
4608 * |
4611 * <p>This method doesn't validate the specified character to be a |
4609 * <p>This method doesn't validate the specified character to be a |
4644 - MIN_LOW_SURROGATE); |
4642 - MIN_LOW_SURROGATE); |
4645 } |
4643 } |
4646 |
4644 |
4647 /** |
4645 /** |
4648 * Returns the code point at the given index of the |
4646 * Returns the code point at the given index of the |
4649 * <code>CharSequence</code>. If the <code>char</code> value at |
4647 * {@code CharSequence}. If the {@code char} value at |
4650 * the given index in the <code>CharSequence</code> is in the |
4648 * the given index in the {@code CharSequence} is in the |
4651 * high-surrogate range, the following index is less than the |
4649 * high-surrogate range, the following index is less than the |
4652 * length of the <code>CharSequence</code>, and the |
4650 * length of the {@code CharSequence}, and the |
4653 * <code>char</code> value at the following index is in the |
4651 * {@code char} value at the following index is in the |
4654 * low-surrogate range, then the supplementary code point |
4652 * low-surrogate range, then the supplementary code point |
4655 * corresponding to this surrogate pair is returned. Otherwise, |
4653 * corresponding to this surrogate pair is returned. Otherwise, |
4656 * the <code>char</code> value at the given index is returned. |
4654 * the {@code char} value at the given index is returned. |
4657 * |
4655 * |
4658 * @param seq a sequence of <code>char</code> values (Unicode code |
4656 * @param seq a sequence of {@code char} values (Unicode code |
4659 * units) |
4657 * units) |
4660 * @param index the index to the <code>char</code> values (Unicode |
4658 * @param index the index to the {@code char} values (Unicode |
4661 * code units) in <code>seq</code> to be converted |
4659 * code units) in {@code seq} to be converted |
4662 * @return the Unicode code point at the given index |
4660 * @return the Unicode code point at the given index |
4663 * @exception NullPointerException if <code>seq</code> is null. |
4661 * @exception NullPointerException if {@code seq} is null. |
4664 * @exception IndexOutOfBoundsException if the value |
4662 * @exception IndexOutOfBoundsException if the value |
4665 * <code>index</code> is negative or not less than |
4663 * {@code index} is negative or not less than |
4666 * {@link CharSequence#length() seq.length()}. |
4664 * {@link CharSequence#length() seq.length()}. |
4667 * @since 1.5 |
4665 * @since 1.5 |
4668 */ |
4666 */ |
4669 public static int codePointAt(CharSequence seq, int index) { |
4667 public static int codePointAt(CharSequence seq, int index) { |
4670 char c1 = seq.charAt(index++); |
4668 char c1 = seq.charAt(index++); |
4679 return c1; |
4677 return c1; |
4680 } |
4678 } |
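One common use of `codePointAt(CharSequence, int)` is walking a sequence one code point at a time. The sketch below is only an illustration: the class `CodePointAtDemo` and the helper `codePoints` are hypothetical names, and the loop advances by `Character.charCount` so that a surrogate pair is consumed as a single step.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointAtDemo {
    // Hypothetical helper: collects every code point of seq in order.
    static List<Integer> codePoints(CharSequence seq) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0;
        while (i < seq.length()) {
            int cp = Character.codePointAt(seq, i);
            out.add(cp);
            // Advance by 1 for a BMP char or unpaired surrogate, 2 for a pair.
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (a surrogate pair), 'b': four chars, three code points.
        System.out.println(codePoints("a\uD83D\uDE00b"));
    }
}
```

An unpaired surrogate is returned as its own `char` value, so the loop still makes progress on malformed input.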
4681 |
4679 |
4682 /** |
4680 /** |
4683 * Returns the code point at the given index of the |
4681 * Returns the code point at the given index of the |
4684 * <code>char</code> array. If the <code>char</code> value at |
4682 * {@code char} array. If the {@code char} value at |
4685 * the given index in the <code>char</code> array is in the |
4683 * the given index in the {@code char} array is in the |
4686 * high-surrogate range, the following index is less than the |
4684 * high-surrogate range, the following index is less than the |
4687 * length of the <code>char</code> array, and the |
4685 * length of the {@code char} array, and the |
4688 * <code>char</code> value at the following index is in the |
4686 * {@code char} value at the following index is in the |
4689 * low-surrogate range, then the supplementary code point |
4687 * low-surrogate range, then the supplementary code point |
4690 * corresponding to this surrogate pair is returned. Otherwise, |
4688 * corresponding to this surrogate pair is returned. Otherwise, |
4691 * the <code>char</code> value at the given index is returned. |
4689 * the {@code char} value at the given index is returned. |
4692 * |
4690 * |
4693 * @param a the <code>char</code> array |
4691 * @param a the {@code char} array |
4694 * @param index the index to the <code>char</code> values (Unicode |
4692 * @param index the index to the {@code char} values (Unicode |
4695 * code units) in the <code>char</code> array to be converted |
4693 * code units) in the {@code char} array to be converted |
4696 * @return the Unicode code point at the given index |
4694 * @return the Unicode code point at the given index |
4697 * @exception NullPointerException if <code>a</code> is null. |
4695 * @exception NullPointerException if {@code a} is null. |
4698 * @exception IndexOutOfBoundsException if the value |
4696 * @exception IndexOutOfBoundsException if the value |
4699 * <code>index</code> is negative or not less than |
4697 * {@code index} is negative or not less than |
4700 * the length of the <code>char</code> array. |
4698 * the length of the {@code char} array. |
4701 * @since 1.5 |
4699 * @since 1.5 |
4702 */ |
4700 */ |
4703 public static int codePointAt(char[] a, int index) { |
4701 public static int codePointAt(char[] a, int index) { |
4704 return codePointAtImpl(a, index, a.length); |
4702 return codePointAtImpl(a, index, a.length); |
4705 } |
4703 } |
4706 |
4704 |
4707 /** |
4705 /** |
4708 * Returns the code point at the given index of the |
4706 * Returns the code point at the given index of the |
4709 * <code>char</code> array, where only array elements with |
4707 * {@code char} array, where only array elements with |
4710 * <code>index</code> less than <code>limit</code> can be used. If |
4708 * {@code index} less than {@code limit} can be used. If |
4711 * the <code>char</code> value at the given index in the |
4709 * the {@code char} value at the given index in the |
4712 * <code>char</code> array is in the high-surrogate range, the |
4710 * {@code char} array is in the high-surrogate range, the |
4713 * following index is less than the <code>limit</code>, and the |
4711 * following index is less than the {@code limit}, and the |
4714 * <code>char</code> value at the following index is in the |
4712 * {@code char} value at the following index is in the |
4715 * low-surrogate range, then the supplementary code point |
4713 * low-surrogate range, then the supplementary code point |
4716 * corresponding to this surrogate pair is returned. Otherwise, |
4714 * corresponding to this surrogate pair is returned. Otherwise, |
4717 * the <code>char</code> value at the given index is returned. |
4715 * the {@code char} value at the given index is returned. |
4718 * |
4716 * |
4719 * @param a the <code>char</code> array |
4717 * @param a the {@code char} array |
4720 * @param index the index to the <code>char</code> values (Unicode |
4718 * @param index the index to the {@code char} values (Unicode |
4721 * code units) in the <code>char</code> array to be converted |
4719 * code units) in the {@code char} array to be converted |
4722 * @param limit the index after the last array element that can be used in the |
4720 * @param limit the index after the last array element that |
4723 * <code>char</code> array |
4721 * can be used in the {@code char} array |
4724 * @return the Unicode code point at the given index |
4722 * @return the Unicode code point at the given index |
4725 * @exception NullPointerException if <code>a</code> is null. |
4723 * @exception NullPointerException if {@code a} is null. |
4726 * @exception IndexOutOfBoundsException if the <code>index</code> |
4724 * @exception IndexOutOfBoundsException if the {@code index} |
4727 * argument is negative or not less than the <code>limit</code> |
4725 * argument is negative or not less than the {@code limit} |
4728 * argument, or if the <code>limit</code> argument is negative or |
4726 * argument, or if the {@code limit} argument is negative or |
4729 * greater than the length of the <code>char</code> array. |
4727 * greater than the length of the {@code char} array. |
4730 * @since 1.5 |
4728 * @since 1.5 |
4731 */ |
4729 */ |
4732 public static int codePointAt(char[] a, int index, int limit) { |
4730 public static int codePointAt(char[] a, int index, int limit) { |
4733 if (index >= limit || limit < 0 || limit > a.length) { |
4731 if (index >= limit || limit < 0 || limit > a.length) { |
4734 throw new IndexOutOfBoundsException(); |
4732 throw new IndexOutOfBoundsException(); |
4750 return c1; |
4748 return c1; |
4751 } |
4749 } |
4752 |
4750 |
4753 /** |
4751 /** |
4754 * Returns the code point preceding the given index of the |
4752 * Returns the code point preceding the given index of the |
4755 * <code>CharSequence</code>. If the <code>char</code> value at |
4753 * {@code CharSequence}. If the {@code char} value at |
4756 * <code>(index - 1)</code> in the <code>CharSequence</code> is in |
4754 * {@code (index - 1)} in the {@code CharSequence} is in |
4757 * the low-surrogate range, <code>(index - 2)</code> is not |
4755 * the low-surrogate range, {@code (index - 2)} is not |
4758 * negative, and the <code>char</code> value at <code>(index - |
4756 * negative, and the {@code char} value at {@code (index - 2)} |
4759 * 2)</code> in the <code>CharSequence</code> is in the |
4757 * in the {@code CharSequence} is in the |
4760 * high-surrogate range, then the supplementary code point |
4758 * high-surrogate range, then the supplementary code point |
4761 * corresponding to this surrogate pair is returned. Otherwise, |
4759 * corresponding to this surrogate pair is returned. Otherwise, |
4762 * the <code>char</code> value at <code>(index - 1)</code> is |
4760 * the {@code char} value at {@code (index - 1)} is |
4763 * returned. |
4761 * returned. |
4764 * |
4762 * |
4765 * @param seq the <code>CharSequence</code> instance |
4763 * @param seq the {@code CharSequence} instance |
4766 * @param index the index following the code point that should be returned |
4764 * @param index the index following the code point that should be returned |
4767 * @return the Unicode code point value before the given index. |
4765 * @return the Unicode code point value before the given index. |
4768 * @exception NullPointerException if <code>seq</code> is null. |
4766 * @exception NullPointerException if {@code seq} is null. |
4769 * @exception IndexOutOfBoundsException if the <code>index</code> |
4767 * @exception IndexOutOfBoundsException if the {@code index} |
4770 * argument is less than 1 or greater than {@link |
4768 * argument is less than 1 or greater than {@link |
4771 * CharSequence#length() seq.length()}. |
4769 * CharSequence#length() seq.length()}. |
4772 * @since 1.5 |
4770 * @since 1.5 |
4773 */ |
4771 */ |
4774 public static int codePointBefore(CharSequence seq, int index) { |
4772 public static int codePointBefore(CharSequence seq, int index) { |
4784 return c2; |
4782 return c2; |
4785 } |
4783 } |
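`codePointBefore` supports the mirror-image traversal, from the end of a sequence toward its start. The following is a sketch under stated assumptions, not part of this class: `CodePointBeforeDemo` and `reverseByCodePoint` are hypothetical names, and the input is assumed to contain no unpaired surrogates (reversing those could accidentally form new pairs).

```java
class CodePointBeforeDemo {
    // Hypothetical helper: reverses seq code point by code point, so that
    // surrogate pairs stay in high-then-low order in the result.
    static String reverseByCodePoint(CharSequence seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        int i = seq.length();
        while (i > 0) {
            int cp = Character.codePointBefore(seq, i);
            sb.appendCodePoint(cp);
            // Step back over one or two chars, depending on the code point.
            i -= Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String reversed = reverseByCodePoint("a\uD83D\uDE00b");
        System.out.println(reversed.equals("b\uD83D\uDE00a"));
    }
}
```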
4786 |
4784 |
4787 /** |
4785 /** |
4788 * Returns the code point preceding the given index of the |
4786 * Returns the code point preceding the given index of the |
4789 * <code>char</code> array. If the <code>char</code> value at |
4787 * {@code char} array. If the {@code char} value at |
4790 * <code>(index - 1)</code> in the <code>char</code> array is in |
4788 * {@code (index - 1)} in the {@code char} array is in |
4791 * the low-surrogate range, <code>(index - 2)</code> is not |
4789 * the low-surrogate range, {@code (index - 2)} is not |
4792 * negative, and the <code>char</code> value at <code>(index - |
4790 * negative, and the {@code char} value at {@code (index - 2)} |
4793 * 2)</code> in the <code>char</code> array is in the |
4791 * in the {@code char} array is in the |
4794 * high-surrogate range, then the supplementary code point |
4792 * high-surrogate range, then the supplementary code point |
4795 * corresponding to this surrogate pair is returned. Otherwise, |
4793 * corresponding to this surrogate pair is returned. Otherwise, |
4796 * the <code>char</code> value at <code>(index - 1)</code> is |
4794 * the {@code char} value at {@code (index - 1)} is |
4797 * returned. |
4795 * returned. |
4798 * |
4796 * |
4799 * @param a the <code>char</code> array |
4797 * @param a the {@code char} array |
4800 * @param index the index following the code point that should be returned |
4798 * @param index the index following the code point that should be returned |
4801 * @return the Unicode code point value before the given index. |
4799 * @return the Unicode code point value before the given index. |
4802 * @exception NullPointerException if <code>a</code> is null. |
4800 * @exception NullPointerException if {@code a} is null. |
4803 * @exception IndexOutOfBoundsException if the <code>index</code> |
4801 * @exception IndexOutOfBoundsException if the {@code index} |
4804 * argument is less than 1 or greater than the length of the |
4802 * argument is less than 1 or greater than the length of the |
4805 * <code>char</code> array |
4803 * {@code char} array |
4806 * @since 1.5 |
4804 * @since 1.5 |
4807 */ |
4805 */ |
4808 public static int codePointBefore(char[] a, int index) { |
4806 public static int codePointBefore(char[] a, int index) { |
4809 return codePointBeforeImpl(a, index, 0); |
4807 return codePointBeforeImpl(a, index, 0); |
4810 } |
4808 } |
4811 |
4809 |
4812 /** |
4810 /** |
4813 * Returns the code point preceding the given index of the |
4811 * Returns the code point preceding the given index of the |
4814 * <code>char</code> array, where only array elements with |
4812 * {@code char} array, where only array elements with |
4815 * <code>index</code> greater than or equal to <code>start</code> |
4813 * {@code index} greater than or equal to {@code start} |
4816 * can be used. If the <code>char</code> value at <code>(index - |
4814 * can be used. If the {@code char} value at {@code (index - 1)} |
4817 * 1)</code> in the <code>char</code> array is in the |
4815 * in the {@code char} array is in the |
4818 * low-surrogate range, <code>(index - 2)</code> is not less than |
4816 * low-surrogate range, {@code (index - 2)} is not less than |
4819 * <code>start</code>, and the <code>char</code> value at |
4817 * {@code start}, and the {@code char} value at |
4820 * <code>(index - 2)</code> in the <code>char</code> array is in |
4818 * {@code (index - 2)} in the {@code char} array is in |
4821 * the high-surrogate range, then the supplementary code point |
4819 * the high-surrogate range, then the supplementary code point |
4822 * corresponding to this surrogate pair is returned. Otherwise, |
4820 * corresponding to this surrogate pair is returned. Otherwise, |
4823 * the <code>char</code> value at <code>(index - 1)</code> is |
4821 * the {@code char} value at {@code (index - 1)} is |
4824 * returned. |
4822 * returned. |
4825 * |
4823 * |
4826 * @param a the <code>char</code> array |
4824 * @param a the {@code char} array |
4827 * @param index the index following the code point that should be returned |
4825 * @param index the index following the code point that should be returned |
4828 * @param start the index of the first array element in the |
4826 * @param start the index of the first array element in the |
4829 * <code>char</code> array |
4827 * {@code char} array |
4830 * @return the Unicode code point value before the given index. |
4828 * @return the Unicode code point value before the given index. |
4831 * @exception NullPointerException if <code>a</code> is null. |
4829 * @exception NullPointerException if {@code a} is null. |
4832 * @exception IndexOutOfBoundsException if the <code>index</code> |
4830 * @exception IndexOutOfBoundsException if the {@code index} |
4833 * argument is not greater than the <code>start</code> argument or |
4831 * argument is not greater than the {@code start} argument or |
4834 * is greater than the length of the <code>char</code> array, or |
4832 * is greater than the length of the {@code char} array, or |
4835 * if the <code>start</code> argument is negative or not less than |
4833 * if the {@code start} argument is negative or not less than |
4836 * the length of the <code>char</code> array. |
4834 * the length of the {@code char} array. |
4837 * @since 1.5 |
4835 * @since 1.5 |
4838 */ |
4836 */ |
4839 public static int codePointBefore(char[] a, int index, int start) { |
4837 public static int codePointBefore(char[] a, int index, int start) { |
4840 if (index <= start || start < 0 || start >= a.length) { |
4838 if (index <= start || start < 0 || start >= a.length) { |
4841 throw new IndexOutOfBoundsException(); |
4839 throw new IndexOutOfBoundsException(); |
4916 |
4914 |
4917 /** |
4915 /** |
4918 * Converts the specified character (Unicode code point) to its |
4916 * Converts the specified character (Unicode code point) to its |
4919 * UTF-16 representation. If the specified code point is a BMP |
4917 * UTF-16 representation. If the specified code point is a BMP |
4920 * (Basic Multilingual Plane or Plane 0) value, the same value is |
4918 * (Basic Multilingual Plane or Plane 0) value, the same value is |
4921 * stored in <code>dst[dstIndex]</code>, and 1 is returned. If the |
4919 * stored in {@code dst[dstIndex]}, and 1 is returned. If the |
4922 * specified code point is a supplementary character, its |
4920 * specified code point is a supplementary character, its |
4923 * surrogate values are stored in <code>dst[dstIndex]</code> |
4921 * surrogate values are stored in {@code dst[dstIndex]} |
4924 * (high-surrogate) and <code>dst[dstIndex+1]</code> |
4922 * (high-surrogate) and {@code dst[dstIndex+1]} |
4925 * (low-surrogate), and 2 is returned. |
4923 * (low-surrogate), and 2 is returned. |
4926 * |
4924 * |
4927 * @param codePoint the character (Unicode code point) to be converted. |
4925 * @param codePoint the character (Unicode code point) to be converted. |
4928 * @param dst an array of <code>char</code> in which the |
4926 * @param dst an array of {@code char} in which the |
4929 * <code>codePoint</code>'s UTF-16 value is stored. |
4927 * {@code codePoint}'s UTF-16 value is stored. |
4930 * @param dstIndex the start index into the <code>dst</code> |
4928 * @param dstIndex the start index into the {@code dst} |
4931 * array where the converted value is stored. |
4929 * array where the converted value is stored. |
4932 * @return 1 if the code point is a BMP code point, 2 if the |
4930 * @return 1 if the code point is a BMP code point, 2 if the |
4933 * code point is a supplementary code point. |
4931 * code point is a supplementary code point. |
4934 * @exception IllegalArgumentException if the specified |
4932 * @exception IllegalArgumentException if the specified |
4935 * <code>codePoint</code> is not a valid Unicode code point. |
4933 * {@code codePoint} is not a valid Unicode code point. |
4936 * @exception NullPointerException if the specified <code>dst</code> is null. |
4934 * @exception NullPointerException if the specified {@code dst} is null. |
4937 * @exception IndexOutOfBoundsException if <code>dstIndex</code> |
4935 * @exception IndexOutOfBoundsException if {@code dstIndex} |
4938 * is negative or not less than <code>dst.length</code>, or if |
4936 * is negative or not less than {@code dst.length}, or if |
4939 * <code>dst</code> at <code>dstIndex</code> doesn't have enough |
4937 * {@code dst} at {@code dstIndex} doesn't have enough |
4940 * array element(s) to store the resulting <code>char</code> |
4938 * array element(s) to store the resulting {@code char} |
4941 * value(s). (If <code>dstIndex</code> is equal to |
4939 * value(s). (If {@code dstIndex} is equal to |
4942 * <code>dst.length-1</code> and the specified |
4940 * {@code dst.length-1} and the specified |
4943 * <code>codePoint</code> is a supplementary character, the |
4941 * {@code codePoint} is a supplementary character, the |
4944 * high-surrogate value is not stored in |
4942 * high-surrogate value is not stored in |
4945 * <code>dst[dstIndex]</code>.) |
4943 * {@code dst[dstIndex]}.) |
4946 * @since 1.5 |
4944 * @since 1.5 |
4947 */ |
4945 */ |
4948 public static int toChars(int codePoint, char[] dst, int dstIndex) { |
4946 public static int toChars(int codePoint, char[] dst, int dstIndex) { |
4949 if (isBmpCodePoint(codePoint)) { |
4947 if (isBmpCodePoint(codePoint)) { |
4950 dst[dstIndex] = (char) codePoint; |
4948 dst[dstIndex] = (char) codePoint; |
4957 } |
4955 } |
4958 } |
4956 } |
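Because `toChars(int, char[], int)` returns the number of `char` values it stored, that return value can drive a write cursor. A minimal sketch, assuming all inputs are valid code points; the class `ToCharsDemo` and the helper `encode` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

class ToCharsDemo {
    // Hypothetical helper: encodes the given code points as UTF-16.
    static char[] encode(int[] codePoints) {
        // Worst case: every code point is supplementary and needs two chars.
        char[] buf = new char[codePoints.length * 2];
        int len = 0;
        for (int cp : codePoints) {
            // Returns 1 for a BMP code point, 2 for a supplementary one.
            len += Character.toChars(cp, buf, len);
        }
        return Arrays.copyOf(buf, len);
    }

    public static void main(String[] args) {
        char[] utf16 = encode(new int[] { 0x61, 0x1F600 });
        System.out.println(utf16.length);
    }
}
```

Sizing the buffer for the worst case avoids the `IndexOutOfBoundsException` described above when a supplementary character lands on the last slot.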
4959 |
4957 |
4960 /** |
4958 /** |
4961 * Converts the specified character (Unicode code point) to its |
4959 * Converts the specified character (Unicode code point) to its |
4962 * UTF-16 representation stored in a <code>char</code> array. If |
4960 * UTF-16 representation stored in a {@code char} array. If |
4963 * the specified code point is a BMP (Basic Multilingual Plane or |
4961 * the specified code point is a BMP (Basic Multilingual Plane or |
4964 * Plane 0) value, the resulting <code>char</code> array has |
4962 * Plane 0) value, the resulting {@code char} array has |
4965 * the same value as <code>codePoint</code>. If the specified code |
4963 * the same value as {@code codePoint}. If the specified code |
4966 * point is a supplementary code point, the resulting |
4964 * point is a supplementary code point, the resulting |
4967 * <code>char</code> array has the corresponding surrogate pair. |
4965 * {@code char} array has the corresponding surrogate pair. |
4968 * |
4966 * |
4969 * @param codePoint a Unicode code point |
4967 * @param codePoint a Unicode code point |
4970 * @return a <code>char</code> array having |
4968 * @return a {@code char} array having |
4971 * <code>codePoint</code>'s UTF-16 representation. |
4969 * {@code codePoint}'s UTF-16 representation. |
4972 * @exception IllegalArgumentException if the specified |
4970 * @exception IllegalArgumentException if the specified |
4973 * <code>codePoint</code> is not a valid Unicode code point. |
4971 * {@code codePoint} is not a valid Unicode code point. |
4974 * @since 1.5 |
4972 * @since 1.5 |
4975 */ |
4973 */ |
4976 public static char[] toChars(int codePoint) { |
4974 public static char[] toChars(int codePoint) { |
4977 if (isBmpCodePoint(codePoint)) { |
4975 if (isBmpCodePoint(codePoint)) { |
4978 return new char[] { (char) codePoint }; |
4976 return new char[] { (char) codePoint }; |
4992 } |
4990 } |
4993 |
4991 |
4994 /** |
4992 /** |
4995 * Returns the number of Unicode code points in the text range of |
4993 * Returns the number of Unicode code points in the text range of |
4996 * the specified char sequence. The text range begins at the |
4994 * the specified char sequence. The text range begins at the |
4997 * specified <code>beginIndex</code> and extends to the |
4995 * specified {@code beginIndex} and extends to the |
4998 * <code>char</code> at index <code>endIndex - 1</code>. Thus the |
4996 * {@code char} at index {@code endIndex - 1}. Thus the |
4999 * length (in <code>char</code>s) of the text range is |
4997 * length (in {@code char}s) of the text range is |
5000 * <code>endIndex-beginIndex</code>. Unpaired surrogates within |
4998 * {@code endIndex-beginIndex}. Unpaired surrogates within |
5001 * the text range count as one code point each. |
4999 * the text range count as one code point each. |
5002 * |
5000 * |
5003 * @param seq the char sequence |
5001 * @param seq the char sequence |
5004 * @param beginIndex the index to the first <code>char</code> of |
5002 * @param beginIndex the index to the first {@code char} of |
5005 * the text range. |
5003 * the text range. |
5006 * @param endIndex the index after the last <code>char</code> of |
5004 * @param endIndex the index after the last {@code char} of |
5007 * the text range. |
5005 * the text range. |
5008 * @return the number of Unicode code points in the specified text |
5006 * @return the number of Unicode code points in the specified text |
5009 * range |
5007 * range |
5010 * @exception NullPointerException if <code>seq</code> is null. |
5008 * @exception NullPointerException if {@code seq} is null. |
5011 * @exception IndexOutOfBoundsException if the |
5009 * @exception IndexOutOfBoundsException if the |
5012 * <code>beginIndex</code> is negative, or <code>endIndex</code> |
5010 * {@code beginIndex} is negative, or {@code endIndex} |
5013 * is larger than the length of the given sequence, or |
5011 * is larger than the length of the given sequence, or |
5014 * <code>beginIndex</code> is larger than <code>endIndex</code>. |
5012 * {@code beginIndex} is larger than {@code endIndex}. |
5015 * @since 1.5 |
5013 * @since 1.5 |
5016 */ |
5014 */ |
5017 public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) { |
5015 public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) { |
5018 int length = seq.length(); |
5016 int length = seq.length(); |
5019 if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) { |
5017 if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) { |
5030 return n; |
5028 return n; |
5031 } |
5029 } |
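The difference between `length()` and `codePointCount` shows up as soon as a sequence contains a supplementary character. The sketch below is illustrative only; `CodePointCountDemo` and `countAll` are hypothetical names.

```java
class CodePointCountDemo {
    // Hypothetical helper: counts the code points in the whole sequence.
    static int countAll(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(s.length());  // number of char values
        System.out.println(countAll(s)); // number of code points
        // A range that cuts a pair in half counts the lone surrogate as one.
        System.out.println(Character.codePointCount(s, 0, 2));
    }
}
```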
5032 |
5030 |
5033 /** |
5031 /** |
5034 * Returns the number of Unicode code points in a subarray of the |
5032 * Returns the number of Unicode code points in a subarray of the |
5035 * <code>char</code> array argument. The <code>offset</code> |
5033 * {@code char} array argument. The {@code offset} |
5036 * argument is the index of the first <code>char</code> of the |
5034 * argument is the index of the first {@code char} of the |
5037 * subarray and the <code>count</code> argument specifies the |
5035 * subarray and the {@code count} argument specifies the |
5038 * length of the subarray in <code>char</code>s. Unpaired |
5036 * length of the subarray in {@code char}s. Unpaired |
5039 * surrogates within the subarray count as one code point each. |
5037 * surrogates within the subarray count as one code point each. |
5040 * |
5038 * |
5041 * @param a the <code>char</code> array |
5039 * @param a the {@code char} array |
5042 * @param offset the index of the first <code>char</code> in the |
5040 * @param offset the index of the first {@code char} in the |
5043 * given <code>char</code> array |
5041 * given {@code char} array |
5044 * @param count the length of the subarray in <code>char</code>s |
5042 * @param count the length of the subarray in {@code char}s |
5045 * @return the number of Unicode code points in the specified subarray |
5043 * @return the number of Unicode code points in the specified subarray |
5046 * @exception NullPointerException if <code>a</code> is null. |
5044 * @exception NullPointerException if {@code a} is null. |
5047 * @exception IndexOutOfBoundsException if <code>offset</code> or |
5045 * @exception IndexOutOfBoundsException if {@code offset} or |
5048 * <code>count</code> is negative, or if <code>offset + |
5046 * {@code count} is negative, or if {@code offset + |
5049 * count</code> is larger than the length of the given array. |
5047 * count} is larger than the length of the given array. |
5050 * @since 1.5 |
5048 * @since 1.5 |
5051 */ |
5049 */ |
5052 public static int codePointCount(char[] a, int offset, int count) { |
5050 public static int codePointCount(char[] a, int offset, int count) { |
5053 if (count > a.length - offset || offset < 0 || count < 0) { |
5051 if (count > a.length - offset || offset < 0 || count < 0) { |
5054 throw new IndexOutOfBoundsException(); |
5052 throw new IndexOutOfBoundsException(); |
5069 return n; |
5067 return n; |
5070 } |
5068 } |
5071 |
5069 |
5072 /** |
5070 /** |
5073 * Returns the index within the given char sequence that is offset |
5071 * Returns the index within the given char sequence that is offset |
5074 * from the given <code>index</code> by <code>codePointOffset</code> |
5072 * from the given {@code index} by {@code codePointOffset} |
5075 * code points. Unpaired surrogates within the text range given by |
5073 * code points. Unpaired surrogates within the text range given by |
5076 * <code>index</code> and <code>codePointOffset</code> count as |
5074 * {@code index} and {@code codePointOffset} count as |
5077 * one code point each. |
5075 * one code point each. |
5078 * |
5076 * |
5079 * @param seq the char sequence |
5077 * @param seq the char sequence |
5080 * @param index the index to be offset |
5078 * @param index the index to be offset |
5081 * @param codePointOffset the offset in code points |
5079 * @param codePointOffset the offset in code points |
5082 * @return the index within the char sequence |
5080 * @return the index within the char sequence |
5083 * @exception NullPointerException if <code>seq</code> is null. |
5081 * @exception NullPointerException if {@code seq} is null. |
5084 * @exception IndexOutOfBoundsException if <code>index</code> |
5082 * @exception IndexOutOfBoundsException if {@code index} |
5085 * is negative or larger than the length of the char sequence, |
5083 * is negative or larger than the length of the char sequence, |
5086 * or if <code>codePointOffset</code> is positive and the |
5084 * or if {@code codePointOffset} is positive and the |
5087 * subsequence starting with <code>index</code> has fewer than |
5085 * subsequence starting with {@code index} has fewer than |
5088 * <code>codePointOffset</code> code points, or if |
5086 * {@code codePointOffset} code points, or if |
5089 * <code>codePointOffset</code> is negative and the subsequence |
5087 * {@code codePointOffset} is negative and the subsequence |
5090 * before <code>index</code> has fewer than the absolute value |
5088 * before {@code index} has fewer than the absolute value |
5091 * of <code>codePointOffset</code> code points. |
5089 * of {@code codePointOffset} code points. |
5092 * @since 1.5 |
5090 * @since 1.5 |
5093 */ |
5091 */ |
5094 public static int offsetByCodePoints(CharSequence seq, int index, |
5092 public static int offsetByCodePoints(CharSequence seq, int index, |
5095 int codePointOffset) { |
5093 int codePointOffset) { |
5096 int length = seq.length(); |
5094 int length = seq.length(); |
5124 } |
5122 } |
5125 return x; |
5123 return x; |
5126 } |
5124 } |
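`offsetByCodePoints` is useful for truncating text without splitting a surrogate pair. This is a sketch under stated assumptions: `OffsetByCodePointsDemo` and `firstCodePoints` are hypothetical names, and `n` is assumed to be non-negative. The count check up front avoids the `IndexOutOfBoundsException` thrown when the sequence has fewer than `n` code points.

```java
class OffsetByCodePointsDemo {
    // Hypothetical helper: returns the prefix of s holding at most n code points.
    static String firstCodePoints(String s, int n) {
        int total = Character.codePointCount(s, 0, s.length());
        if (n >= total) {
            return s;
        }
        // Index of the char that starts the (n+1)-th code point.
        int end = Character.offsetByCodePoints(s, 0, n);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00ab"; // U+1F600, 'a', 'b'
        System.out.println(firstCodePoints(s, 1).length());
    }
}
```

A plain `substring(0, 1)` on the same input would return only the high surrogate.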
5127 |
5125 |
5128 /** |
5126 /** |
5129 * Returns the index within the given <code>char</code> subarray |
5127 * Returns the index within the given {@code char} subarray |
5130 * that is offset from the given <code>index</code> by |
5128 * that is offset from the given {@code index} by |
5131 * <code>codePointOffset</code> code points. The |
5129 * {@code codePointOffset} code points. The |
5132 * <code>start</code> and <code>count</code> arguments specify a |
5130 * {@code start} and {@code count} arguments specify a |
5133 * subarray of the <code>char</code> array. Unpaired surrogates |
5131 * subarray of the {@code char} array. Unpaired surrogates |
5134 * within the text range given by <code>index</code> and |
5132 * within the text range given by {@code index} and |
5135 * <code>codePointOffset</code> count as one code point each. |
5133 * {@code codePointOffset} count as one code point each. |
5136 * |
5134 * |
5137 * @param a the <code>char</code> array |
5135 * @param a the {@code char} array |
5138 * @param start the index of the first <code>char</code> of the |
5136 * @param start the index of the first {@code char} of the |
5139 * subarray |
5137 * subarray |
5140 * @param count the length of the subarray in <code>char</code>s |
5138 * @param count the length of the subarray in {@code char}s |
5141 * @param index the index to be offset |
5139 * @param index the index to be offset |
5142 * @param codePointOffset the offset in code points |
5140 * @param codePointOffset the offset in code points |
5143 * @return the index within the subarray |
5141 * @return the index within the subarray |
5144 * @exception NullPointerException if <code>a</code> is null. |
5142 * @exception NullPointerException if {@code a} is null. |
5145 * @exception IndexOutOfBoundsException |
5143 * @exception IndexOutOfBoundsException |
5146 * if <code>start</code> or <code>count</code> is negative, |
5144 * if {@code start} or {@code count} is negative, |
5147 * or if <code>start + count</code> is larger than the length of |
5145 * or if {@code start + count} is larger than the length of |
5148 * the given array, |
5146 * the given array, |
5149 * or if <code>index</code> is less than <code>start</code> or |
5147 * or if {@code index} is less than {@code start} or |
5150 * larger than <code>start + count</code>, |
5148 * larger than {@code start + count}, |
5151 * or if <code>codePointOffset</code> is positive and the text range |
5149 * or if {@code codePointOffset} is positive and the text range |
5152 * starting with <code>index</code> and ending with <code>start |
5150 * starting with {@code index} and ending with {@code start + count - 1} |
5153 * + count - 1</code> has fewer than <code>codePointOffset</code> code |
5151 * has fewer than {@code codePointOffset} code |
5154 * points, |
5152 * points, |
5155 * or if <code>codePointOffset</code> is negative and the text range |
5153 * or if {@code codePointOffset} is negative and the text range |
5156 * starting with <code>start</code> and ending with <code>index |
5154 * starting with {@code start} and ending with {@code index - 1} |
5157 * - 1</code> has fewer than the absolute value of |
5155 * has fewer than the absolute value of |
5158 * <code>codePointOffset</code> code points. |
5156 * {@code codePointOffset} code points. |
5159 * @since 1.5 |
5157 * @since 1.5 |
5160 */ |
5158 */ |
5161 public static int offsetByCodePoints(char[] a, int start, int count, |
5159 public static int offsetByCodePoints(char[] a, int start, int count, |
5162 int index, int codePointOffset) { |
5160 int index, int codePointOffset) { |
5163 if (count > a.length-start || start < 0 || count < 0 |
5161 if (count > a.length-start || start < 0 || count < 0 |
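The Javadoc above describes how offsetByCodePoints counts surrogate pairs as one code point and when it throws. The following is a minimal sketch of that behaviour, not part of the change itself; the class name OffsetByCodePointsSketch and the helper advance are hypothetical, and the helper assumes the subarray is the whole array.

```java
final class OffsetByCodePointsSketch {
    // Hypothetical helper: treat the whole array as the subarray
    // (start = 0, count = text.length) and move index by n code points.
    static int advance(char[] text, int index, int n) {
        return Character.offsetByCodePoints(text, 0, text.length, index, n);
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E encoded as a surrogate pair, then 'b'.
        char[] text = {'a', '\uD834', '\uDD1E', 'b'};
        System.out.println(advance(text, 0, 2));  // 3: 'a' plus the pair
        System.out.println(advance(text, 3, -1)); // 1: back over the pair
        try {
            advance(text, 0, 4); // only 3 code points are available
        } catch (IndexOutOfBoundsException e) {
            System.out.println("too few code points");
        }
    }
}
```

A forward offset of 2 moves three char positions because the pair occupies two chars, which is the reason to use this method instead of plain index arithmetic.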
5218 * href="#supplementary"> supplementary characters</a>. To support |
5216 * href="#supplementary"> supplementary characters</a>. To support |
5219 * all Unicode characters, including supplementary characters, use |
5217 * all Unicode characters, including supplementary characters, use |
5220 * the {@link #isLowerCase(int)} method. |
5218 * the {@link #isLowerCase(int)} method. |
5221 * |
5219 * |
5222 * @param ch the character to be tested. |
5220 * @param ch the character to be tested. |
5223 * @return <code>true</code> if the character is lowercase; |
5221 * @return {@code true} if the character is lowercase; |
5224 * <code>false</code> otherwise. |
5222 * {@code false} otherwise. |
5225 * @see Character#isLowerCase(char) |
5223 * @see Character#isLowerCase(char) |
5226 * @see Character#isTitleCase(char) |
5224 * @see Character#isTitleCase(char) |
5227 * @see Character#toLowerCase(char) |
5225 * @see Character#toLowerCase(char) |
5228 * @see Character#getType(char) |
5226 * @see Character#getType(char) |
5229 */ |
5227 */ |
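The note above says the char overload cannot handle supplementary characters. As an illustration only, the sketch below compares a per-char scan with a per-code-point scan; the class and method names are hypothetical, and the sample character U+1D41A (MATHEMATICAL BOLD SMALL A) is chosen here as an example of a supplementary lowercase letter.

```java
final class SupplementaryCaseSketch {
    // Scans char by char: each half of a surrogate pair is tested alone.
    static boolean anyCharIsLowerCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLowerCase(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Scans code point by code point: a surrogate pair is tested as one character.
    static boolean anyCodePointIsLowerCase(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLowerCase(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        String boldA = new String(Character.toChars(0x1D41A));
        System.out.println(anyCharIsLowerCase(boldA));      // false
        System.out.println(anyCodePointIsLowerCase(boldA)); // true
    }
}
```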
5282 * href="#supplementary"> supplementary characters</a>. To support |
5280 * href="#supplementary"> supplementary characters</a>. To support |
5283 * all Unicode characters, including supplementary characters, use |
5281 * all Unicode characters, including supplementary characters, use |
5284 * the {@link #isUpperCase(int)} method. |
5282 * the {@link #isUpperCase(int)} method. |
5285 * |
5283 * |
5286 * @param ch the character to be tested. |
5284 * @param ch the character to be tested. |
5287 * @return <code>true</code> if the character is uppercase; |
5285 * @return {@code true} if the character is uppercase; |
5288 * <code>false</code> otherwise. |
5286 * {@code false} otherwise. |
5289 * @see Character#isLowerCase(char) |
5287 * @see Character#isLowerCase(char) |
5290 * @see Character#isTitleCase(char) |
5288 * @see Character#isTitleCase(char) |
5291 * @see Character#toUpperCase(char) |
5289 * @see Character#toUpperCase(char) |
5292 * @see Character#getType(char) |
5290 * @see Character#getType(char) |
5293 * @since 1.0 |
5291 * @since 1.0 |
5327 |
5325 |
5328 /** |
5326 /** |
5329 * Determines if the specified character is a titlecase character. |
5327 * Determines if the specified character is a titlecase character. |
5330 * <p> |
5328 * <p> |
5331 * A character is a titlecase character if its general |
5329 * A character is a titlecase character if its general |
5332 * category type, provided by <code>Character.getType(ch)</code>, |
5330 * category type, provided by {@code Character.getType(ch)}, |
5333 * is <code>TITLECASE_LETTER</code>. |
5331 * is {@code TITLECASE_LETTER}. |
5334 * <p> |
5332 * <p> |
5335 * Some characters look like pairs of Latin letters. For example, there |
5333 * Some characters look like pairs of Latin letters. For example, there |
5336 * is an uppercase letter that looks like "LJ" and has a corresponding |
5334 * is an uppercase letter that looks like "LJ" and has a corresponding |
5337 * lowercase letter that looks like "lj". A third form, which looks like "Lj", |
5335 * lowercase letter that looks like "lj". A third form, which looks like "Lj", |
5338 * is the appropriate form to use when rendering a word in lowercase |
5336 * is the appropriate form to use when rendering a word in lowercase |
5339 * with initial capitals, as for a book title. |
5337 * with initial capitals, as for a book title. |
5340 * <p> |
5338 * <p> |
5341 * These are some of the Unicode characters for which this method returns |
5339 * These are some of the Unicode characters for which this method returns |
5342 * <code>true</code>: |
5340 * {@code true}: |
5343 * <ul> |
5341 * <ul> |
5344 * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON</code> |
5342 * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON} |
5345 * <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code> |
5343 * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J} |
5346 * <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code> |
5344 * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J} |
5347 * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code> |
5345 * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z} |
5348 * </ul> |
5346 * </ul> |
5349 * <p> Many other Unicode characters are titlecase too. |
5347 * <p> Many other Unicode characters are titlecase too. |
5350 * |
5348 * |
5351 * <p><b>Note:</b> This method cannot handle <a |
5349 * <p><b>Note:</b> This method cannot handle <a |
5352 * href="#supplementary"> supplementary characters</a>. To support |
5350 * href="#supplementary"> supplementary characters</a>. To support |
5353 * all Unicode characters, including supplementary characters, use |
5351 * all Unicode characters, including supplementary characters, use |
5354 * the {@link #isTitleCase(int)} method. |
5352 * the {@link #isTitleCase(int)} method. |
5355 * |
5353 * |
5356 * @param ch the character to be tested. |
5354 * @param ch the character to be tested. |
5357 * @return <code>true</code> if the character is titlecase; |
5355 * @return {@code true} if the character is titlecase; |
5358 * <code>false</code> otherwise. |
5356 * {@code false} otherwise. |
5359 * @see Character#isLowerCase(char) |
5357 * @see Character#isLowerCase(char) |
5360 * @see Character#isUpperCase(char) |
5358 * @see Character#isUpperCase(char) |
5361 * @see Character#toTitleCase(char) |
5359 * @see Character#toTitleCase(char) |
5362 * @see Character#getType(char) |
5360 * @see Character#getType(char) |
5363 * @since 1.0.2 |
5361 * @since 1.0.2 |
5369 /** |
5367 /** |
5370 * Determines if the specified character (Unicode code point) is a titlecase character. |
5368 * Determines if the specified character (Unicode code point) is a titlecase character. |
5371 * <p> |
5369 * <p> |
5372 * A character is a titlecase character if its general |
5370 * A character is a titlecase character if its general |
5373 * category type, provided by {@link Character#getType(int) getType(codePoint)}, |
5371 * category type, provided by {@link Character#getType(int) getType(codePoint)}, |
5374 * is <code>TITLECASE_LETTER</code>. |
5372 * is {@code TITLECASE_LETTER}. |
5375 * <p> |
5373 * <p> |
5376 * Some characters look like pairs of Latin letters. For example, there |
5374 * Some characters look like pairs of Latin letters. For example, there |
5377 * is an uppercase letter that looks like "LJ" and has a corresponding |
5375 * is an uppercase letter that looks like "LJ" and has a corresponding |
5378 * lowercase letter that looks like "lj". A third form, which looks like "Lj", |
5376 * lowercase letter that looks like "lj". A third form, which looks like "Lj", |
5379 * is the appropriate form to use when rendering a word in lowercase |
5377 * is the appropriate form to use when rendering a word in lowercase |
5380 * with initial capitals, as for a book title. |
5378 * with initial capitals, as for a book title. |
5381 * <p> |
5379 * <p> |
5382 * These are some of the Unicode characters for which this method returns |
5380 * These are some of the Unicode characters for which this method returns |
5383 * <code>true</code>: |
5381 * {@code true}: |
5384 * <ul> |
5382 * <ul> |
5385 * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON</code> |
5383 * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON} |
5386 * <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code> |
5384 * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J} |
5387 * <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code> |
5385 * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J} |
5388 * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code> |
5386 * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z} |
5389 * </ul> |
5387 * </ul> |
5390 * <p> Many other Unicode characters are titlecase too. |
5388 * <p> Many other Unicode characters are titlecase too. |
5391 * |
5389 * |
5392 * @param codePoint the character (Unicode code point) to be tested. |
5390 * @param codePoint the character (Unicode code point) to be tested. |
5393 * @return <code>true</code> if the character is titlecase; |
5391 * @return {@code true} if the character is titlecase; |
5394 * <code>false</code> otherwise. |
5392 * {@code false} otherwise. |
5395 * @see Character#isLowerCase(int) |
5393 * @see Character#isLowerCase(int) |
5396 * @see Character#isUpperCase(int) |
5394 * @see Character#isUpperCase(int) |
5397 * @see Character#toTitleCase(int) |
5395 * @see Character#toTitleCase(int) |
5398 * @see Character#getType(int) |
5396 * @see Character#getType(int) |
5399 * @since 1.5 |
5397 * @since 1.5 |
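The "LJ" / "Lj" / "lj" triple described in the isTitleCase Javadoc can be checked directly. This is a small sketch under the assumption that the three forms are U+01C7, U+01C8 and U+01C9; the class name TitleCaseSketch and the helper describe are hypothetical.

```java
final class TitleCaseSketch {
    // Hypothetical helper: name the case category of a single BMP character.
    static String describe(char ch) {
        if (Character.isTitleCase(ch)) {
            return "titlecase";
        }
        if (Character.isUpperCase(ch)) {
            return "uppercase";
        }
        if (Character.isLowerCase(ch)) {
            return "lowercase";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe('\u01C7')); // uppercase (looks like "LJ")
        System.out.println(describe('\u01C8')); // titlecase (looks like "Lj")
        System.out.println(describe('\u01C9')); // lowercase (looks like "lj")
        // Both the uppercase and lowercase forms map to the same titlecase form.
        System.out.println(Character.toTitleCase('\u01C9') == '\u01C8'); // true
    }
}
```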
5404 |
5402 |
5405 /** |
5403 /** |
5406 * Determines if the specified character is a digit. |
5404 * Determines if the specified character is a digit. |
5407 * <p> |
5405 * <p> |
5408 * A character is a digit if its general category type, provided |
5406 * A character is a digit if its general category type, provided |
5409 * by <code>Character.getType(ch)</code>, is |
5407 * by {@code Character.getType(ch)}, is |
5410 * <code>DECIMAL_DIGIT_NUMBER</code>. |
5408 * {@code DECIMAL_DIGIT_NUMBER}. |
5411 * <p> |
5409 * <p> |
5412 * Some Unicode character ranges that contain digits: |
5410 * Some Unicode character ranges that contain digits: |
5413 * <ul> |
5411 * <ul> |
5414 * <li><code>'\u0030'</code> through <code>'\u0039'</code>, |
5412 * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'}, |
5415 * ISO-LATIN-1 digits (<code>'0'</code> through <code>'9'</code>) |
5413 * ISO-LATIN-1 digits ({@code '0'} through {@code '9'}) |
5416 * <li><code>'\u0660'</code> through <code>'\u0669'</code>, |
5414 * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'}, |
5417 * Arabic-Indic digits |
5415 * Arabic-Indic digits |
5418 * <li><code>'\u06F0'</code> through <code>'\u06F9'</code>, |
5416 * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'}, |
5419 * Extended Arabic-Indic digits |
5417 * Extended Arabic-Indic digits |
5420 * <li><code>'\u0966'</code> through <code>'\u096F'</code>, |
5418 * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'}, |
5421 * Devanagari digits |
5419 * Devanagari digits |
5422 * <li><code>'\uFF10'</code> through <code>'\uFF19'</code>, |
5420 * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'}, |
5423 * Fullwidth digits |
5421 * Fullwidth digits |
5424 * </ul> |
5422 * </ul> |
5425 * |
5423 * |
5426 * Many other character ranges contain digits as well. |
5424 * Many other character ranges contain digits as well. |
5427 * |
5425 * |
5429 * href="#supplementary"> supplementary characters</a>. To support |
5427 * href="#supplementary"> supplementary characters</a>. To support |
5430 * all Unicode characters, including supplementary characters, use |
5428 * all Unicode characters, including supplementary characters, use |
5431 * the {@link #isDigit(int)} method. |
5429 * the {@link #isDigit(int)} method. |
5432 * |
5430 * |
5433 * @param ch the character to be tested. |
5431 * @param ch the character to be tested. |
5434 * @return <code>true</code> if the character is a digit; |
5432 * @return {@code true} if the character is a digit; |
5435 * <code>false</code> otherwise. |
5433 * {@code false} otherwise. |
5436 * @see Character#digit(char, int) |
5434 * @see Character#digit(char, int) |
5437 * @see Character#forDigit(int, int) |
5435 * @see Character#forDigit(int, int) |
5438 * @see Character#getType(char) |
5436 * @see Character#getType(char) |
5439 */ |
5437 */ |
5440 public static boolean isDigit(char ch) { |
5438 public static boolean isDigit(char ch) { |
5444 /** |
5442 /** |
5445 * Determines if the specified character (Unicode code point) is a digit. |
5443 * Determines if the specified character (Unicode code point) is a digit. |
5446 * <p> |
5444 * <p> |
5447 * A character is a digit if its general category type, provided |
5445 * A character is a digit if its general category type, provided |
5448 * by {@link Character#getType(int) getType(codePoint)}, is |
5446 * by {@link Character#getType(int) getType(codePoint)}, is |
5449 * <code>DECIMAL_DIGIT_NUMBER</code>. |
5447 * {@code DECIMAL_DIGIT_NUMBER}. |
5450 * <p> |
5448 * <p> |
5451 * Some Unicode character ranges that contain digits: |
5449 * Some Unicode character ranges that contain digits: |
5452 * <ul> |
5450 * <ul> |
5453 * <li><code>'\u0030'</code> through <code>'\u0039'</code>, |
5451 * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'}, |
5454 * ISO-LATIN-1 digits (<code>'0'</code> through <code>'9'</code>) |
5452 * ISO-LATIN-1 digits ({@code '0'} through {@code '9'}) |
5455 * <li><code>'\u0660'</code> through <code>'\u0669'</code>, |
5453 * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'}, |
5456 * Arabic-Indic digits |
5454 * Arabic-Indic digits |
5457 * <li><code>'\u06F0'</code> through <code>'\u06F9'</code>, |
5455 * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'}, |
5458 * Extended Arabic-Indic digits |
5456 * Extended Arabic-Indic digits |
5459 * <li><code>'\u0966'</code> through <code>'\u096F'</code>, |
5457 * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'}, |
5460 * Devanagari digits |
5458 * Devanagari digits |
5461 * <li><code>'\uFF10'</code> through <code>'\uFF19'</code>, |
5459 * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'}, |
5462 * Fullwidth digits |
5460 * Fullwidth digits |
5463 * </ul> |
5461 * </ul> |
5464 * |
5462 * |
5465 * Many other character ranges contain digits as well. |
5463 * Many other character ranges contain digits as well. |
5466 * |
5464 * |
5467 * @param codePoint the character (Unicode code point) to be tested. |
5465 * @param codePoint the character (Unicode code point) to be tested. |
5468 * @return <code>true</code> if the character is a digit; |
5466 * @return {@code true} if the character is a digit; |
5469 * <code>false</code> otherwise. |
5467 * {@code false} otherwise. |
5470 * @see Character#forDigit(int, int) |
5468 * @see Character#forDigit(int, int) |
5471 * @see Character#getType(int) |
5469 * @see Character#getType(int) |
5472 * @since 1.5 |
5470 * @since 1.5 |
5473 */ |
5471 */ |
5474 public static boolean isDigit(int codePoint) { |
5472 public static boolean isDigit(int codePoint) { |
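Because isDigit accepts decimal digits from many scripts, it pairs naturally with Character.digit(int, int) when parsing. The following is an illustrative sketch, not the JDK's own parsing code; the class name UnicodeDigitsSketch and the helper parseDecimal are hypothetical.

```java
final class UnicodeDigitsSketch {
    // Hypothetical helper: parse a non-empty run of decimal digits from any script.
    // Throws NumberFormatException on a non-digit and ArithmeticException on overflow.
    static int parseDecimal(String s) {
        if (s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int value = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isDigit(cp)) {
                throw new NumberFormatException(
                        "not a digit: U+" + Integer.toHexString(cp).toUpperCase());
            }
            // digit(cp, 10) yields the numeric value 0..9 for a decimal digit.
            value = Math.addExact(Math.multiplyExact(value, 10), Character.digit(cp, 10));
            i += Character.charCount(cp);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("42"));           // 42
        System.out.println(parseDecimal("\u0664\u0662")); // 42, Arabic-Indic digits
        System.out.println(parseDecimal("\uFF11\uFF10")); // 10, fullwidth digits
    }
}
```

Iterating by code point rather than by char keeps the helper usable for digits outside the Basic Multilingual Plane, which is the case the int overload exists for.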
5488 * href="#supplementary"> supplementary characters</a>. To support |
5486 * href="#supplementary"> supplementary characters</a>. To support |
5489 * all Unicode characters, including supplementary characters, use |
5487 * all Unicode characters, including supplementary characters, use |
5490 * the {@link #isDefined(int)} method. |
5488 * the {@link #isDefined(int)} method. |
5491 * |
5489 * |
5492 * @param ch the character to be tested |
5490 * @param ch the character to be tested |
5493 * @return <code>true</code> if the character has a defined meaning |
5491 * @return {@code true} if the character has a defined meaning |
5494 * in Unicode; <code>false</code> otherwise. |
5492 * in Unicode; {@code false} otherwise. |
5495 * @see Character#isDigit(char) |
5493 * @see Character#isDigit(char) |
5496 * @see Character#isLetter(char) |
5494 * @see Character#isLetter(char) |
5497 * @see Character#isLetterOrDigit(char) |
5495 * @see Character#isLetterOrDigit(char) |
5498 * @see Character#isLowerCase(char) |
5496 * @see Character#isLowerCase(char) |
5499 * @see Character#isTitleCase(char) |
5497 * @see Character#isTitleCase(char) |
5530 |
5528 |
5531 /** |
5529 /** |
5532 * Determines if the specified character is a letter. |
5530 * Determines if the specified character is a letter. |
5533 * <p> |
5531 * <p> |
5534 * A character is considered to be a letter if its general |
5532 * A character is considered to be a letter if its general |
5535 * category type, provided by <code>Character.getType(ch)</code>, |
5533 * category type, provided by {@code Character.getType(ch)}, |
5536 * is any of the following: |
5534 * is any of the following: |
5537 * <ul> |
5535 * <ul> |
5538 * <li> <code>UPPERCASE_LETTER</code> |
5536 * <li> {@code UPPERCASE_LETTER} |
5539 * <li> <code>LOWERCASE_LETTER</code> |
5537 * <li> {@code LOWERCASE_LETTER} |
5540 * <li> <code>TITLECASE_LETTER</code> |
5538 * <li> {@code TITLECASE_LETTER} |
5541 * <li> <code>MODIFIER_LETTER</code> |
5539 * <li> {@code MODIFIER_LETTER} |
5542 * <li> <code>OTHER_LETTER</code> |
5540 * <li> {@code OTHER_LETTER} |
5543 * </ul> |
5541 * </ul> |
5544 * |
5542 * |
5545 * Not all letters have case. Many characters are |
5543 * Not all letters have case. Many characters are |
5546 * letters but are neither uppercase nor lowercase nor titlecase. |
5544 * letters but are neither uppercase nor lowercase nor titlecase. |
5547 * |
5545 * |
5549 * href="#supplementary"> supplementary characters</a>. To support |
5547 * href="#supplementary"> supplementary characters</a>. To support |
5550 * all Unicode characters, including supplementary characters, use |
5548 * all Unicode characters, including supplementary characters, use |
5551 * the {@link #isLetter(int)} method. |
5549 * the {@link #isLetter(int)} method. |
5552 * |
5550 * |
5553 * @param ch the character to be tested. |
5551 * @param ch the character to be tested. |
5554 * @return <code>true</code> if the character is a letter; |
5552 * @return {@code true} if the character is a letter; |
5555 * <code>false</code> otherwise. |
5553 * {@code false} otherwise. |
5556 * @see Character#isDigit(char) |
5554 * @see Character#isDigit(char) |
5557 * @see Character#isJavaIdentifierStart(char) |
5555 * @see Character#isJavaIdentifierStart(char) |
5558 * @see Character#isJavaLetter(char) |
5556 * @see Character#isJavaLetter(char) |
5559 * @see Character#isJavaLetterOrDigit(char) |
5557 * @see Character#isJavaLetterOrDigit(char) |
5560 * @see Character#isLetterOrDigit(char) |
5558 * @see Character#isLetterOrDigit(char) |
5572 * <p> |
5570 * <p> |
5573 * A character is considered to be a letter if its general |
5571 * A character is considered to be a letter if its general |
5574 * category type, provided by {@link Character#getType(int) getType(codePoint)}, |
5572 * category type, provided by {@link Character#getType(int) getType(codePoint)}, |
5575 * is any of the following: |
5573 * is any of the following: |
5576 * <ul> |
5574 * <ul> |
5577 * <li> <code>UPPERCASE_LETTER</code> |
5575 * <li> {@code UPPERCASE_LETTER} |
5578 * <li> <code>LOWERCASE_LETTER</code> |
5576 * <li> {@code LOWERCASE_LETTER} |
5579 * <li> <code>TITLECASE_LETTER</code> |
5577 * <li> {@code TITLECASE_LETTER} |
5580 * <li> <code>MODIFIER_LETTER</code> |
5578 * <li> {@code MODIFIER_LETTER} |
5581 * <li> <code>OTHER_LETTER</code> |
5579 * <li> {@code OTHER_LETTER} |
5582 * </ul> |
5580 * </ul> |
5583 * |
5581 * |
5584 * Not all letters have case. Many characters are |
5582 * Not all letters have case. Many characters are |
5585 * letters but are neither uppercase nor lowercase nor titlecase. |
5583 * letters but are neither uppercase nor lowercase nor titlecase. |
5586 * |
5584 * |
5587 * @param codePoint the character (Unicode code point) to be tested. |
5585 * @param codePoint the character (Unicode code point) to be tested. |
5588 * @return <code>true</code> if the character is a letter; |
5586 * @return {@code true} if the character is a letter; |
5589 * <code>false</code> otherwise. |
5587 * {@code false} otherwise. |
5590 * @see Character#isDigit(int) |
5588 * @see Character#isDigit(int) |
5591 * @see Character#isJavaIdentifierStart(int) |
5589 * @see Character#isJavaIdentifierStart(int) |
5592 * @see Character#isLetterOrDigit(int) |
5590 * @see Character#isLetterOrDigit(int) |
5593 * @see Character#isLowerCase(int) |
5591 * @see Character#isLowerCase(int) |
5594 * @see Character#isTitleCase(int) |
5592 * @see Character#isTitleCase(int) |
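The remark that not all letters have case can be made concrete with a short check. This is a sketch only; the class name CaselessLetterSketch and the helper isCaselessLetter are hypothetical, and the sample code points are chosen here for illustration.

```java
final class CaselessLetterSketch {
    // Hypothetical helper: true for letters that are neither upper-, lower- nor titlecase.
    static boolean isCaselessLetter(int codePoint) {
        return Character.isLetter(codePoint)
                && !Character.isUpperCase(codePoint)
                && !Character.isLowerCase(codePoint)
                && !Character.isTitleCase(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isCaselessLetter(0x4E2D)); // true: a CJK ideograph (OTHER_LETTER)
        System.out.println(isCaselessLetter('a'));    // false: lowercase letter
        System.out.println(isCaselessLetter('7'));    // false: not a letter at all
    }
}
```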
5607 |
5605 |
5608 /** |
5606 /** |
5609 * Determines if the specified character is a letter or digit. |
5607 * Determines if the specified character is a letter or digit. |
5610 * <p> |
5608 * <p> |
5611 * A character is considered to be a letter or digit if either |
5609 * A character is considered to be a letter or digit if either |
5612 * <code>Character.isLetter(char ch)</code> or |
5610 * {@code Character.isLetter(char ch)} or |
5613 * <code>Character.isDigit(char ch)</code> returns |
5611 * {@code Character.isDigit(char ch)} returns |
5614 * <code>true</code> for the character. |
5612 * {@code true} for the character. |
5615 * |
5613 * |
5616 * <p><b>Note:</b> This method cannot handle <a |
5614 * <p><b>Note:</b> This method cannot handle <a |
5617 * href="#supplementary"> supplementary characters</a>. To support |
5615 * href="#supplementary"> supplementary characters</a>. To support |
5618 * all Unicode characters, including supplementary characters, use |
5616 * all Unicode characters, including supplementary characters, use |
5619 * the {@link #isLetterOrDigit(int)} method. |
5617 * the {@link #isLetterOrDigit(int)} method. |
5620 * |
5618 * |
5621 * @param ch the character to be tested. |
5619 * @param ch the character to be tested. |
5622 * @return <code>true</code> if the character is a letter or digit; |
5620 * @return {@code true} if the character is a letter or digit; |
5623 * <code>false</code> otherwise. |
5621 * {@code false} otherwise. |
5624 * @see Character#isDigit(char) |
5622 * @see Character#isDigit(char) |
5625 * @see Character#isJavaIdentifierPart(char) |
5623 * @see Character#isJavaIdentifierPart(char) |
5626 * @see Character#isJavaLetter(char) |
5624 * @see Character#isJavaLetter(char) |
5627 * @see Character#isJavaLetterOrDigit(char) |
5625 * @see Character#isJavaLetterOrDigit(char) |
5628 * @see Character#isLetter(char) |
5626 * @see Character#isLetter(char) |
5637 * Determines if the specified character (Unicode code point) is a letter or digit. |
5635 * Determines if the specified character (Unicode code point) is a letter or digit. |
5638 * <p> |
5636 * <p> |
5639 * A character is considered to be a letter or digit if either |
5637 * A character is considered to be a letter or digit if either |
5640 * {@link #isLetter(int) isLetter(codePoint)} or |
5638 * {@link #isLetter(int) isLetter(codePoint)} or |
5641 * {@link #isDigit(int) isDigit(codePoint)} returns |
5639 * {@link #isDigit(int) isDigit(codePoint)} returns |
5642 * <code>true</code> for the character. |
5640 * {@code true} for the character. |
5643 * |
5641 * |
5644 * @param codePoint the character (Unicode code point) to be tested. |
5642 * @param codePoint the character (Unicode code point) to be tested. |
5645 * @return <code>true</code> if the character is a letter or digit; |
5643 * @return {@code true} if the character is a letter or digit; |
5646 * <code>false</code> otherwise. |
5644 * {@code false} otherwise. |
5647 * @see Character#isDigit(int) |
5645 * @see Character#isDigit(int) |
5648 * @see Character#isJavaIdentifierPart(int) |
5646 * @see Character#isJavaIdentifierPart(int) |
5649 * @see Character#isLetter(int) |
5647 * @see Character#isLetter(int) |
5650 * @see Character#isUnicodeIdentifierPart(int) |
5648 * @see Character#isUnicodeIdentifierPart(int) |
5651 * @since 1.5 |
5649 * @since 1.5 |
5665 * character in a Java identifier. |
5663 * character in a Java identifier. |
5666 * <p> |
5664 * <p> |
5667 * A character may start a Java identifier if and only if |
5665 * A character may start a Java identifier if and only if |
5668 * one of the following is true: |
5666 * one of the following is true: |
5669 * <ul> |
5667 * <ul> |
5670 * <li> {@link #isLetter(char) isLetter(ch)} returns <code>true</code> |
5668 * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true} |
5671 * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code> |
5669 * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER} |
5672 * <li> ch is a currency symbol (such as "$") |
5670 * <li> {@code ch} is a currency symbol (such as {@code '$'}) |
5673 * <li> ch is a connecting punctuation character (such as "_"). |
5671 * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}). |
5674 * </ul> |
5672 * </ul> |
5675 * |
5673 * |
5676 * @param ch the character to be tested. |
5674 * @param ch the character to be tested. |
5677 * @return <code>true</code> if the character may start a Java |
5675 * @return {@code true} if the character may start a Java |
5678 * identifier; <code>false</code> otherwise. |
5676 * identifier; {@code false} otherwise. |
5679 * @see Character#isJavaLetterOrDigit(char) |
5677 * @see Character#isJavaLetterOrDigit(char) |
5680 * @see Character#isJavaIdentifierStart(char) |
5678 * @see Character#isJavaIdentifierStart(char) |
5681 * @see Character#isJavaIdentifierPart(char) |
5679 * @see Character#isJavaIdentifierPart(char) |
5682 * @see Character#isLetter(char) |
5680 * @see Character#isLetter(char) |
5683 * @see Character#isLetterOrDigit(char) |
5681 * @see Character#isLetterOrDigit(char) |
5696 * <p> |
5694 * <p> |
5697 * A character may be part of a Java identifier if and only if any |
5695 * A character may be part of a Java identifier if and only if any |
5698 * of the following are true: |
5696 * of the following are true: |
5699 * <ul> |
5697 * <ul> |
5700 * <li> it is a letter |
5698 * <li> it is a letter |
5701 * <li> it is a currency symbol (such as <code>'$'</code>) |
5699 * <li> it is a currency symbol (such as {@code '$'}) |
5702 * <li> it is a connecting punctuation character (such as <code>'_'</code>) |
5700 * <li> it is a connecting punctuation character (such as {@code '_'}) |
5703 * <li> it is a digit |
5701 * <li> it is a digit |
5704 * <li> it is a numeric letter (such as a Roman numeral character) |
5702 * <li> it is a numeric letter (such as a Roman numeral character) |
5705 * <li> it is a combining mark |
5703 * <li> it is a combining mark |
5706 * <li> it is a non-spacing mark |
5704 * <li> it is a non-spacing mark |
5707 * <li> <code>isIdentifierIgnorable</code> returns |
5705 * <li> {@code isIdentifierIgnorable} returns |
5708 * <code>true</code> for the character. |
5706 * {@code true} for the character. |
5709 * </ul> |
5707 * </ul> |
5710 * |
5708 * |
5711 * @param ch the character to be tested. |
5709 * @param ch the character to be tested. |
5712 * @return <code>true</code> if the character may be part of a |
5710 * @return {@code true} if the character may be part of a |
5713 * Java identifier; <code>false</code> otherwise. |
5711 * Java identifier; {@code false} otherwise. |
5714 * @see Character#isJavaLetter(char) |
5712 * @see Character#isJavaLetter(char) |
5715 * @see Character#isJavaIdentifierStart(char) |
5713 * @see Character#isJavaIdentifierStart(char) |
5716 * @see Character#isJavaIdentifierPart(char) |
5714 * @see Character#isJavaIdentifierPart(char) |
5717 * @see Character#isLetter(char) |
5715 * @see Character#isLetter(char) |
5718 * @see Character#isLetterOrDigit(char) |
5716 * @see Character#isLetterOrDigit(char) |
5731 * permissible as the first character in a Java identifier. |
5729 * permissible as the first character in a Java identifier. |
5732 * <p> |
5730 * <p> |
5733 * A character may start a Java identifier if and only if |
5731 * A character may start a Java identifier if and only if |
5734 * one of the following conditions is true: |
5732 * one of the following conditions is true: |
5735 * <ul> |
5733 * <ul> |
5736 * <li> {@link #isLetter(char) isLetter(ch)} returns <code>true</code> |
5734 * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true} |
5737 * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code> |
5735 * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER} |
5738 * <li> ch is a currency symbol (such as "$") |
5736 * <li> {@code ch} is a currency symbol (such as {@code '$'}) |
5739 * <li> ch is a connecting punctuation character (such as "_"). |
5737 * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}). |
5740 * </ul> |
5738 * </ul> |
5741 * |
5739 * |
5742 * <p><b>Note:</b> This method cannot handle <a |
5740 * <p><b>Note:</b> This method cannot handle <a |
5743 * href="#supplementary"> supplementary characters</a>. To support |
5741 * href="#supplementary"> supplementary characters</a>. To support |
5744 * all Unicode characters, including supplementary characters, use |
5742 * all Unicode characters, including supplementary characters, use |
5745 * the {@link #isJavaIdentifierStart(int)} method. |
5743 * the {@link #isJavaIdentifierStart(int)} method. |
5746 * |
5744 * |
5747 * @param ch the character to be tested. |
5745 * @param ch the character to be tested. |
5748 * @return <code>true</code> if the character may start a Java identifier; |
5746 * @return {@code true} if the character may start a Java identifier; |
5749 * <code>false</code> otherwise. |
5747 * {@code false} otherwise. |
5750 * @see Character#isJavaIdentifierPart(char) |
5748 * @see Character#isJavaIdentifierPart(char) |
5751 * @see Character#isLetter(char) |
5749 * @see Character#isLetter(char) |
5752 * @see Character#isUnicodeIdentifierStart(char) |
5750 * @see Character#isUnicodeIdentifierStart(char) |
5753 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5751 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5754 * @since 1.1 |
5752 * @since 1.1 |
5763 * <p> |
5761 * <p> |
5764 * A character may start a Java identifier if and only if |
5762 * A character may start a Java identifier if and only if |
5765 * one of the following conditions is true: |
5763 * one of the following conditions is true: |
5766 * <ul> |
5764 * <ul> |
5767 * <li> {@link #isLetter(int) isLetter(codePoint)} |
5765 * <li> {@link #isLetter(int) isLetter(codePoint)} |
5768 * returns <code>true</code> |
5766 * returns {@code true} |
5769 * <li> {@link #getType(int) getType(codePoint)} |
5767 * <li> {@link #getType(int) getType(codePoint)} |
5770 * returns <code>LETTER_NUMBER</code> |
5768 * returns {@code LETTER_NUMBER} |
5771 * <li> the referenced character is a currency symbol (such as "$") |
5769 * <li> the referenced character is a currency symbol (such as {@code '$'}) |
5772 * <li> the referenced character is a connecting punctuation character |
5770 * <li> the referenced character is a connecting punctuation character |
5773 * (such as "_"). |
5771 * (such as {@code '_'}). |
5774 * </ul> |
5772 * </ul> |
5775 * |
5773 * |
5776 * @param codePoint the character (Unicode code point) to be tested. |
5774 * @param codePoint the character (Unicode code point) to be tested. |
5777 * @return <code>true</code> if the character may start a Java identifier; |
5775 * @return {@code true} if the character may start a Java identifier; |
5778 * <code>false</code> otherwise. |
5776 * {@code false} otherwise. |
5779 * @see Character#isJavaIdentifierPart(int) |
5777 * @see Character#isJavaIdentifierPart(int) |
5780 * @see Character#isLetter(int) |
5778 * @see Character#isLetter(int) |
5781 * @see Character#isUnicodeIdentifierStart(int) |
5779 * @see Character#isUnicodeIdentifierStart(int) |
5782 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5780 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5783 * @since 1.5 |
5781 * @since 1.5 |
5792 * <p> |
5790 * <p> |
5793 * A character may be part of a Java identifier if any of the following |
5791 * A character may be part of a Java identifier if any of the following |
5794 * are true: |
5792 * are true: |
5795 * <ul> |
5793 * <ul> |
5796 * <li> it is a letter |
5794 * <li> it is a letter |
5797 * <li> it is a currency symbol (such as <code>'$'</code>) |
5795 * <li> it is a currency symbol (such as {@code '$'}) |
5798 * <li> it is a connecting punctuation character (such as <code>'_'</code>) |
5796 * <li> it is a connecting punctuation character (such as {@code '_'}) |
5799 * <li> it is a digit |
5797 * <li> it is a digit |
5800 * <li> it is a numeric letter (such as a Roman numeral character) |
5798 * <li> it is a numeric letter (such as a Roman numeral character) |
5801 * <li> it is a combining mark |
5799 * <li> it is a combining mark |
5802 * <li> it is a non-spacing mark |
5800 * <li> it is a non-spacing mark |
5803 * <li> <code>isIdentifierIgnorable</code> returns |
5801 * <li> {@code isIdentifierIgnorable} returns |
5804 * <code>true</code> for the character |
5802 * {@code true} for the character |
5805 * </ul> |
5803 * </ul> |
5806 * |
5804 * |
5807 * <p><b>Note:</b> This method cannot handle <a |
5805 * <p><b>Note:</b> This method cannot handle <a |
5808 * href="#supplementary"> supplementary characters</a>. To support |
5806 * href="#supplementary"> supplementary characters</a>. To support |
5809 * all Unicode characters, including supplementary characters, use |
5807 * all Unicode characters, including supplementary characters, use |
5810 * the {@link #isJavaIdentifierPart(int)} method. |
5808 * the {@link #isJavaIdentifierPart(int)} method. |
5811 * |
5809 * |
5812 * @param ch the character to be tested. |
5810 * @param ch the character to be tested. |
5813 * @return <code>true</code> if the character may be part of a |
5811 * @return {@code true} if the character may be part of a |
5814 * Java identifier; <code>false</code> otherwise. |
5812 * Java identifier; {@code false} otherwise. |
5815 * @see Character#isIdentifierIgnorable(char) |
5813 * @see Character#isIdentifierIgnorable(char) |
5816 * @see Character#isJavaIdentifierStart(char) |
5814 * @see Character#isJavaIdentifierStart(char) |
5817 * @see Character#isLetterOrDigit(char) |
5815 * @see Character#isLetterOrDigit(char) |
5818 * @see Character#isUnicodeIdentifierPart(char) |
5816 * @see Character#isUnicodeIdentifierPart(char) |
5819 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5817 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5829 * <p> |
5827 * <p> |
5830 * A character may be part of a Java identifier if any of the following |
5828 * A character may be part of a Java identifier if any of the following |
5831 * are true: |
5829 * are true: |
5832 * <ul> |
5830 * <ul> |
5833 * <li> it is a letter |
5831 * <li> it is a letter |
5834 * <li> it is a currency symbol (such as <code>'$'</code>) |
5832 * <li> it is a currency symbol (such as {@code '$'}) |
5835 * <li> it is a connecting punctuation character (such as <code>'_'</code>) |
5833 * <li> it is a connecting punctuation character (such as {@code '_'}) |
5836 * <li> it is a digit |
5834 * <li> it is a digit |
5837 * <li> it is a numeric letter (such as a Roman numeral character) |
5835 * <li> it is a numeric letter (such as a Roman numeral character) |
5838 * <li> it is a combining mark |
5836 * <li> it is a combining mark |
5839 * <li> it is a non-spacing mark |
5837 * <li> it is a non-spacing mark |
5840 * <li> {@link #isIdentifierIgnorable(int) |
5838 * <li> {@link #isIdentifierIgnorable(int) |
5841 * isIdentifierIgnorable(codePoint)} returns <code>true</code> for |
5839 * isIdentifierIgnorable(codePoint)} returns {@code true} for |
5842 * the character |
5840 * the character |
5843 * </ul> |
5841 * </ul> |
5844 * |
5842 * |
5845 * @param codePoint the character (Unicode code point) to be tested. |
5843 * @param codePoint the character (Unicode code point) to be tested. |
5846 * @return <code>true</code> if the character may be part of a |
5844 * @return {@code true} if the character may be part of a |
5847 * Java identifier; <code>false</code> otherwise. |
5845 * Java identifier; {@code false} otherwise. |
5848 * @see Character#isIdentifierIgnorable(int) |
5846 * @see Character#isIdentifierIgnorable(int) |
5849 * @see Character#isJavaIdentifierStart(int) |
5847 * @see Character#isJavaIdentifierStart(int) |
5850 * @see Character#isLetterOrDigit(int) |
5848 * @see Character#isLetterOrDigit(int) |
5851 * @see Character#isUnicodeIdentifierPart(int) |
5849 * @see Character#isUnicodeIdentifierPart(int) |
5852 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5850 * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) |
5861 * first character in a Unicode identifier. |
5859 * first character in a Unicode identifier. |
5862 * <p> |
5860 * <p> |
5863 * A character may start a Unicode identifier if and only if |
5861 * A character may start a Unicode identifier if and only if |
5864 * one of the following conditions is true: |
5862 * one of the following conditions is true: |
5865 * <ul> |
5863 * <ul> |
5866 * <li> {@link #isLetter(char) isLetter(ch)} returns <code>true</code> |
5864 * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true} |
5867 * <li> {@link #getType(char) getType(ch)} returns |
5865 * <li> {@link #getType(char) getType(ch)} returns |
5868 * <code>LETTER_NUMBER</code>. |
5866 * {@code LETTER_NUMBER}. |
5869 * </ul> |
5867 * </ul> |
5870 * |
5868 * |
5871 * <p><b>Note:</b> This method cannot handle <a |
5869 * <p><b>Note:</b> This method cannot handle <a |
5872 * href="#supplementary"> supplementary characters</a>. To support |
5870 * href="#supplementary"> supplementary characters</a>. To support |
5873 * all Unicode characters, including supplementary characters, use |
5871 * all Unicode characters, including supplementary characters, use |
5874 * the {@link #isUnicodeIdentifierStart(int)} method. |
5872 * the {@link #isUnicodeIdentifierStart(int)} method. |
5875 * |
5873 * |
5876 * @param ch the character to be tested. |
5874 * @param ch the character to be tested. |
5877 * @return <code>true</code> if the character may start a Unicode |
5875 * @return {@code true} if the character may start a Unicode |
5878 * identifier; <code>false</code> otherwise. |
5876 * identifier; {@code false} otherwise. |
5879 * @see Character#isJavaIdentifierStart(char) |
5877 * @see Character#isJavaIdentifierStart(char) |
5880 * @see Character#isLetter(char) |
5878 * @see Character#isLetter(char) |
5881 * @see Character#isUnicodeIdentifierPart(char) |
5879 * @see Character#isUnicodeIdentifierPart(char) |
5882 * @since 1.1 |
5880 * @since 1.1 |
5883 */ |
5881 */ |
5891 * <p> |
5889 * <p> |
5892 * A character may start a Unicode identifier if and only if |
5890 * A character may start a Unicode identifier if and only if |
5893 * one of the following conditions is true: |
5891 * one of the following conditions is true: |
5894 * <ul> |
5892 * <ul> |
5895 * <li> {@link #isLetter(int) isLetter(codePoint)} |
5893 * <li> {@link #isLetter(int) isLetter(codePoint)} |
5896 * returns <code>true</code> |
5894 * returns {@code true} |
5897 * <li> {@link #getType(int) getType(codePoint)} |
5895 * <li> {@link #getType(int) getType(codePoint)} |
5898 * returns <code>LETTER_NUMBER</code>. |
5896 * returns {@code LETTER_NUMBER}. |
5899 * </ul> |
5897 * </ul> |
5900 * @param codePoint the character (Unicode code point) to be tested. |
5898 * @param codePoint the character (Unicode code point) to be tested. |
5901 * @return <code>true</code> if the character may start a Unicode |
5899 * @return {@code true} if the character may start a Unicode |
5902 * identifier; <code>false</code> otherwise. |
5900 * identifier; {@code false} otherwise. |
5903 * @see Character#isJavaIdentifierStart(int) |
5901 * @see Character#isJavaIdentifierStart(int) |
5904 * @see Character#isLetter(int) |
5902 * @see Character#isLetter(int) |
5905 * @see Character#isUnicodeIdentifierPart(int) |
5903 * @see Character#isUnicodeIdentifierPart(int) |
5906 * @since 1.5 |
5904 * @since 1.5 |
5907 */ |
5905 */ |
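
The identifier rules documented in the hunks above can be combined to test a whole string. The following is a minimal sketch under stated assumptions, not code from this change: `IdentifierCheck` and `isJavaIdentifier` are hypothetical names introduced for illustration. It uses the `int` overloads so that supplementary characters are tested as single code points, and it does not reject keywords such as `class`.

```java
/** Hypothetical helper for illustration; not part of java.lang.Character. */
final class IdentifierCheck {
    private IdentifierCheck() {}

    /**
     * Returns true if every code point of s is allowed at its position
     * in a Java identifier. Keywords and literals are not rejected.
     */
    static boolean isJavaIdentifier(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Advance by code point so a surrogate pair is tested as one character.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifier("total$1"));             // true
        System.out.println(isJavaIdentifier("1total"));              // false
        // '_' may start a Java identifier but not a Unicode identifier.
        System.out.println(Character.isJavaIdentifierStart('_'));    // true
        System.out.println(Character.isUnicodeIdentifierStart('_')); // false
    }
}
```

The last two calls show the difference between the two families of methods: currency symbols and connecting punctuation are accepted as Java identifier starts only.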
5915 * <p> |
5913 * <p> |
5916 * A character may be part of a Unicode identifier if and only if |
5914 * A character may be part of a Unicode identifier if and only if |
5917 * one of the following statements is true: |
5915 * one of the following statements is true: |
5918 * <ul> |
5916 * <ul> |
5919 * <li> it is a letter |
5917 * <li> it is a letter |
5920 * <li> it is a connecting punctuation character (such as <code>'_'</code>) |
5918 * <li> it is a connecting punctuation character (such as {@code '_'}) |
5921 * <li> it is a digit |
5919 * <li> it is a digit |
5922 * <li> it is a numeric letter (such as a Roman numeral character) |
5920 * <li> it is a numeric letter (such as a Roman numeral character) |
5923 * <li> it is a combining mark |
5921 * <li> it is a combining mark |
5924 * <li> it is a non-spacing mark |
5922 * <li> it is a non-spacing mark |
5925 * <li> <code>isIdentifierIgnorable</code> returns |
5923 * <li> {@code isIdentifierIgnorable} returns |
5926 * <code>true</code> for this character. |
5924 * {@code true} for this character. |
5927 * </ul> |
5925 * </ul> |
5928 * |
5926 * |
5929 * <p><b>Note:</b> This method cannot handle <a |
5927 * <p><b>Note:</b> This method cannot handle <a |
5930 * href="#supplementary"> supplementary characters</a>. To support |
5928 * href="#supplementary"> supplementary characters</a>. To support |
5931 * all Unicode characters, including supplementary characters, use |
5929 * all Unicode characters, including supplementary characters, use |
5932 * the {@link #isUnicodeIdentifierPart(int)} method. |
5930 * the {@link #isUnicodeIdentifierPart(int)} method. |
5933 * |
5931 * |
5934 * @param ch the character to be tested. |
5932 * @param ch the character to be tested. |
5935 * @return <code>true</code> if the character may be part of a |
5933 * @return {@code true} if the character may be part of a |
5936 * Unicode identifier; <code>false</code> otherwise. |
5934 * Unicode identifier; {@code false} otherwise. |
5937 * @see Character#isIdentifierIgnorable(char) |
5935 * @see Character#isIdentifierIgnorable(char) |
5938 * @see Character#isJavaIdentifierPart(char) |
5936 * @see Character#isJavaIdentifierPart(char) |
5939 * @see Character#isLetterOrDigit(char) |
5937 * @see Character#isLetterOrDigit(char) |
5940 * @see Character#isUnicodeIdentifierStart(char) |
5938 * @see Character#isUnicodeIdentifierStart(char) |
5941 * @since 1.1 |
5939 * @since 1.1 |
5950 * <p> |
5948 * <p> |
5951 * A character may be part of a Unicode identifier if and only if |
5949 * A character may be part of a Unicode identifier if and only if |
5952 * one of the following statements is true: |
5950 * one of the following statements is true: |
5953 * <ul> |
5951 * <ul> |
5954 * <li> it is a letter |
5952 * <li> it is a letter |
5955 * <li> it is a connecting punctuation character (such as <code>'_'</code>) |
5953 * <li> it is a connecting punctuation character (such as {@code '_'}) |
5956 * <li> it is a digit |
5954 * <li> it is a digit |
5957 * <li> it is a numeric letter (such as a Roman numeral character) |
5955 * <li> it is a numeric letter (such as a Roman numeral character) |
5958 * <li> it is a combining mark |
5956 * <li> it is a combining mark |
5959 * <li> it is a non-spacing mark |
5957 * <li> it is a non-spacing mark |
5960 * <li> <code>isIdentifierIgnorable</code> returns |
5958 * <li> {@code isIdentifierIgnorable} returns |
5961 * <code>true</code> for this character. |
5959 * {@code true} for this character. |
5962 * </ul> |
5960 * </ul> |
5963 * @param codePoint the character (Unicode code point) to be tested. |
5961 * @param codePoint the character (Unicode code point) to be tested. |
5964 * @return <code>true</code> if the character may be part of a |
5962 * @return {@code true} if the character may be part of a |
5965 * Unicode identifier; <code>false</code> otherwise. |
5963 * Unicode identifier; {@code false} otherwise. |
5966 * @see Character#isIdentifierIgnorable(int) |
5964 * @see Character#isIdentifierIgnorable(int) |
5967 * @see Character#isJavaIdentifierPart(int) |
5965 * @see Character#isJavaIdentifierPart(int) |
5968 * @see Character#isLetterOrDigit(int) |
5966 * @see Character#isLetterOrDigit(int) |
5969 * @see Character#isUnicodeIdentifierStart(int) |
5967 * @see Character#isUnicodeIdentifierStart(int) |
5970 * @since 1.5 |
5968 * @since 1.5 |
5980 * The following Unicode characters are ignorable in a Java identifier |
5978 * The following Unicode characters are ignorable in a Java identifier |
5981 * or a Unicode identifier: |
5979 * or a Unicode identifier: |
5982 * <ul> |
5980 * <ul> |
5983 * <li>ISO control characters that are not whitespace |
5981 * <li>ISO control characters that are not whitespace |
5984 * <ul> |
5982 * <ul> |
5985 * <li><code>'\u0000'</code> through <code>'\u0008'</code> |
5983 * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'} |
5986 * <li><code>'\u000E'</code> through <code>'\u001B'</code> |
5984 * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'} |
5987 * <li><code>'\u007F'</code> through <code>'\u009F'</code> |
5985 * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'} |
5988 * </ul> |
5986 * </ul> |
5989 * |
5987 * |
5990 * <li>all characters that have the <code>FORMAT</code> general |
5988 * <li>all characters that have the {@code FORMAT} general |
5991 * category value |
5989 * category value |
5992 * </ul> |
5990 * </ul> |
5993 * |
5991 * |
5994 * <p><b>Note:</b> This method cannot handle <a |
5992 * <p><b>Note:</b> This method cannot handle <a |
5995 * href="#supplementary"> supplementary characters</a>. To support |
5993 * href="#supplementary"> supplementary characters</a>. To support |
5996 * all Unicode characters, including supplementary characters, use |
5994 * all Unicode characters, including supplementary characters, use |
5997 * the {@link #isIdentifierIgnorable(int)} method. |
5995 * the {@link #isIdentifierIgnorable(int)} method. |
5998 * |
5996 * |
5999 * @param ch the character to be tested. |
5997 * @param ch the character to be tested. |
6000 * @return <code>true</code> if the character is an ignorable control |
5998 * @return {@code true} if the character is an ignorable control |
6001 * character that may be part of a Java or Unicode identifier; |
5999 * character that may be part of a Java or Unicode identifier; |
6002 * <code>false</code> otherwise. |
6000 * {@code false} otherwise. |
6003 * @see Character#isJavaIdentifierPart(char) |
6001 * @see Character#isJavaIdentifierPart(char) |
6004 * @see Character#isUnicodeIdentifierPart(char) |
6002 * @see Character#isUnicodeIdentifierPart(char) |
6005 * @since 1.1 |
6003 * @since 1.1 |
6006 */ |
6004 */ |
6007 public static boolean isIdentifierIgnorable(char ch) { |
6005 public static boolean isIdentifierIgnorable(char ch) { |
6015 * The following Unicode characters are ignorable in a Java identifier |
6013 * The following Unicode characters are ignorable in a Java identifier |
6016 * or a Unicode identifier: |
6014 * or a Unicode identifier: |
6017 * <ul> |
6015 * <ul> |
6018 * <li>ISO control characters that are not whitespace |
6016 * <li>ISO control characters that are not whitespace |
6019 * <ul> |
6017 * <ul> |
6020 * <li><code>'\u0000'</code> through <code>'\u0008'</code> |
6018 * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'} |
6021 * <li><code>'\u000E'</code> through <code>'\u001B'</code> |
6019 * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'} |
6022 * <li><code>'\u007F'</code> through <code>'\u009F'</code> |
6020 * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'} |
6023 * </ul> |
6021 * </ul> |
6024 * |
6022 * |
6025 * <li>all characters that have the <code>FORMAT</code> general |
6023 * <li>all characters that have the {@code FORMAT} general |
6026 * category value |
6024 * category value |
6027 * </ul> |
6025 * </ul> |
6028 * |
6026 * |
6029 * @param codePoint the character (Unicode code point) to be tested. |
6027 * @param codePoint the character (Unicode code point) to be tested. |
6030 * @return <code>true</code> if the character is an ignorable control |
6028 * @return {@code true} if the character is an ignorable control |
6031 * character that may be part of a Java or Unicode identifier; |
6029 * character that may be part of a Java or Unicode identifier; |
6032 * <code>false</code> otherwise. |
6030 * {@code false} otherwise. |
6033 * @see Character#isJavaIdentifierPart(int) |
6031 * @see Character#isJavaIdentifierPart(int) |
6034 * @see Character#isUnicodeIdentifierPart(int) |
6032 * @see Character#isUnicodeIdentifierPart(int) |
6035 * @since 1.5 |
6033 * @since 1.5 |
6036 */ |
6034 */ |
6037 public static boolean isIdentifierIgnorable(int codePoint) { |
6035 public static boolean isIdentifierIgnorable(int codePoint) { |
6041 /** |
6039 /** |
6042 * Converts the character argument to lowercase using case |
6040 * Converts the character argument to lowercase using case |
6043 * mapping information from the UnicodeData file. |
6041 * mapping information from the UnicodeData file. |
6044 * <p> |
6042 * <p> |
6045 * Note that |
6043 * Note that |
6046 * <code>Character.isLowerCase(Character.toLowerCase(ch))</code> |
6044 * {@code Character.isLowerCase(Character.toLowerCase(ch))} |
6047 * does not always return <code>true</code> for some ranges of |
6045 * does not always return {@code true} for some ranges of |
6048 * characters, particularly those that are symbols or ideographs. |
6046 * characters, particularly those that are symbols or ideographs. |
6049 * |
6047 * |
6050 * <p>In general, {@link String#toLowerCase()} should be used to map |
6048 * <p>In general, {@link String#toLowerCase()} should be used to map |
6051 * characters to lowercase. <code>String</code> case mapping methods |
6049 * characters to lowercase. {@code String} case mapping methods |
6052 * have several benefits over <code>Character</code> case mapping methods. |
6050 * have several benefits over {@code Character} case mapping methods. |
6053 * <code>String</code> case mapping methods can perform locale-sensitive |
6051 * {@code String} case mapping methods can perform locale-sensitive |
6054 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6052 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6055 * the <code>Character</code> case mapping methods cannot. |
6053 * the {@code Character} case mapping methods cannot. |
6056 * |
6054 * |
6057 * <p><b>Note:</b> This method cannot handle <a |
6055 * <p><b>Note:</b> This method cannot handle <a |
6058 * href="#supplementary"> supplementary characters</a>. To support |
6056 * href="#supplementary"> supplementary characters</a>. To support |
6059 * all Unicode characters, including supplementary characters, use |
6057 * all Unicode characters, including supplementary characters, use |
6060 * the {@link #toLowerCase(int)} method. |
6058 * the {@link #toLowerCase(int)} method. |
6073 * Converts the character (Unicode code point) argument to |
6071 * Converts the character (Unicode code point) argument to |
6074 * lowercase using case mapping information from the UnicodeData |
6072 * lowercase using case mapping information from the UnicodeData |
6075 * file. |
6073 * file. |
6076 * |
6074 * |
6077 * <p> Note that |
6075 * <p> Note that |
6078 * <code>Character.isLowerCase(Character.toLowerCase(codePoint))</code> |
6076 * {@code Character.isLowerCase(Character.toLowerCase(codePoint))} |
6079 * does not always return <code>true</code> for some ranges of |
6077 * does not always return {@code true} for some ranges of |
6080 * characters, particularly those that are symbols or ideographs. |
6078 * characters, particularly those that are symbols or ideographs. |
6081 * |
6079 * |
6082 * <p>In general, {@link String#toLowerCase()} should be used to map |
6080 * <p>In general, {@link String#toLowerCase()} should be used to map |
6083 * characters to lowercase. <code>String</code> case mapping methods |
6081 * characters to lowercase. {@code String} case mapping methods |
6084 * have several benefits over <code>Character</code> case mapping methods. |
6082 * have several benefits over {@code Character} case mapping methods. |
6085 * <code>String</code> case mapping methods can perform locale-sensitive |
6083 * {@code String} case mapping methods can perform locale-sensitive |
6086 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6084 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6087 * the <code>Character</code> case mapping methods cannot. |
6085 * the {@code Character} case mapping methods cannot. |
6088 * |
6086 * |
6089 * @param codePoint the character (Unicode code point) to be converted. |
6087 * @param codePoint the character (Unicode code point) to be converted. |
6090 * @return the lowercase equivalent of the character (Unicode code |
6088 * @return the lowercase equivalent of the character (Unicode code |
6091 * point), if any; otherwise, the character itself. |
6089 * point), if any; otherwise, the character itself. |
6092 * @see Character#isLowerCase(int) |
6090 * @see Character#isLowerCase(int) |
6101 /** |
6099 /** |
6102 * Converts the character argument to uppercase using case mapping |
6100 * Converts the character argument to uppercase using case mapping |
6103 * information from the UnicodeData file. |
6101 * information from the UnicodeData file. |
6104 * <p> |
6102 * <p> |
6105 * Note that |
6103 * Note that |
6106 * <code>Character.isUpperCase(Character.toUpperCase(ch))</code> |
6104 * {@code Character.isUpperCase(Character.toUpperCase(ch))} |
6107 * does not always return <code>true</code> for some ranges of |
6105 * does not always return {@code true} for some ranges of |
6108 * characters, particularly those that are symbols or ideographs. |
6106 * characters, particularly those that are symbols or ideographs. |
6109 * |
6107 * |
6110 * <p>In general, {@link String#toUpperCase()} should be used to map |
6108 * <p>In general, {@link String#toUpperCase()} should be used to map |
6111 * characters to uppercase. <code>String</code> case mapping methods |
6109 * characters to uppercase. {@code String} case mapping methods |
6112 * have several benefits over <code>Character</code> case mapping methods. |
6110 * have several benefits over {@code Character} case mapping methods. |
6113 * <code>String</code> case mapping methods can perform locale-sensitive |
6111 * {@code String} case mapping methods can perform locale-sensitive |
6114 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6112 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6115 * the <code>Character</code> case mapping methods cannot. |
6113 * the {@code Character} case mapping methods cannot. |
6116 * |
6114 * |
6117 * <p><b>Note:</b> This method cannot handle <a |
6115 * <p><b>Note:</b> This method cannot handle <a |
6118 * href="#supplementary"> supplementary characters</a>. To support |
6116 * href="#supplementary"> supplementary characters</a>. To support |
6119 * all Unicode characters, including supplementary characters, use |
6117 * all Unicode characters, including supplementary characters, use |
6120 * the {@link #toUpperCase(int)} method. |
6118 * the {@link #toUpperCase(int)} method. |
6133 * Converts the character (Unicode code point) argument to |
6131 * Converts the character (Unicode code point) argument to |
6134 * uppercase using case mapping information from the UnicodeData |
6132 * uppercase using case mapping information from the UnicodeData |
6135 * file. |
6133 * file. |
6136 * |
6134 * |
6137 * <p>Note that |
6135 * <p>Note that |
6138 * <code>Character.isUpperCase(Character.toUpperCase(codePoint))</code> |
6136 * {@code Character.isUpperCase(Character.toUpperCase(codePoint))} |
6139 * does not always return <code>true</code> for some ranges of |
6137 * does not always return {@code true} for some ranges of |
6140 * characters, particularly those that are symbols or ideographs. |
6138 * characters, particularly those that are symbols or ideographs. |
6141 * |
6139 * |
6142 * <p>In general, {@link String#toUpperCase()} should be used to map |
6140 * <p>In general, {@link String#toUpperCase()} should be used to map |
6143 * characters to uppercase. <code>String</code> case mapping methods |
6141 * characters to uppercase. {@code String} case mapping methods |
6144 * have several benefits over <code>Character</code> case mapping methods. |
6142 * have several benefits over {@code Character} case mapping methods. |
6145 * <code>String</code> case mapping methods can perform locale-sensitive |
6143 * {@code String} case mapping methods can perform locale-sensitive |
6146 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6144 * mappings, context-sensitive mappings, and 1:M character mappings, whereas |
6147 * the <code>Character</code> case mapping methods cannot. |
6145 * the {@code Character} case mapping methods cannot. |
6148 * |
6146 * |
6149 * @param codePoint the character (Unicode code point) to be converted. |
6147 * @param codePoint the character (Unicode code point) to be converted. |
6150 * @return the uppercase equivalent of the character, if any; |
6148 * @return the uppercase equivalent of the character, if any; |
6151 * otherwise, the character itself. |
6149 * otherwise, the character itself. |
6152 * @see Character#isUpperCase(int) |
6150 * @see Character#isUpperCase(int) |
6162 * Converts the character argument to titlecase using case mapping |
6160 * Converts the character argument to titlecase using case mapping |
6163 * information from the UnicodeData file. If a character has no |
6161 * information from the UnicodeData file. If a character has no |
6164 * explicit titlecase mapping and is not itself a titlecase char |
6162 * explicit titlecase mapping and is not itself a titlecase char |
6165 * according to UnicodeData, then the uppercase mapping is |
6163 * according to UnicodeData, then the uppercase mapping is |
6166 * returned as an equivalent titlecase mapping. If the |
6164 * returned as an equivalent titlecase mapping. If the |
6167 * <code>char</code> argument is already a titlecase |
6165 * {@code char} argument is already a titlecase |
6168 * <code>char</code>, the same <code>char</code> value will be |
6166 * {@code char}, the same {@code char} value will be |
6169 * returned. |
6167 * returned. |
6170 * <p> |
6168 * <p> |
6171 * Note that |
6169 * Note that |
6172 * <code>Character.isTitleCase(Character.toTitleCase(ch))</code> |
6170 * {@code Character.isTitleCase(Character.toTitleCase(ch))} |
6173 * does not always return <code>true</code> for some ranges of |
6171 * does not always return {@code true} for some ranges of |
6174 * characters. |
6172 * characters. |
6175 * |
6173 * |
6176 * <p><b>Note:</b> This method cannot handle <a |
6174 * <p><b>Note:</b> This method cannot handle <a |
6177 * href="#supplementary"> supplementary characters</a>. To support |
6175 * href="#supplementary"> supplementary characters</a>. To support |
6178 * all Unicode characters, including supplementary characters, use |
6176 * all Unicode characters, including supplementary characters, use |
6216 public static int toTitleCase(int codePoint) { |
6214 public static int toTitleCase(int codePoint) { |
6217 return CharacterData.of(codePoint).toTitleCase(codePoint); |
6215 return CharacterData.of(codePoint).toTitleCase(codePoint); |
6218 } |
6216 } |
6219 |
6217 |
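
The case-mapping notes in the Javadoc above (1:M mappings are available only through `String`, and `isTitleCase(toTitleCase(ch))` is not always `true`) can be illustrated with a short sketch. `CaseMappingDemo` and its two helper methods are hypothetical names introduced for illustration. `Locale.ROOT` is chosen here only to keep the `String` mapping locale-neutral.

```java
import java.util.Locale;

/** Hypothetical demo class for illustration; not part of the JDK. */
final class CaseMappingDemo {
    private CaseMappingDemo() {}

    /** True if title-casing the code point yields a titlecase character. */
    static boolean titleCaseRoundTrips(int codePoint) {
        return Character.isTitleCase(Character.toTitleCase(codePoint));
    }

    /** Uppercases via String, which can apply 1:M mappings. */
    static String upperViaString(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char sharpS = (char) 0x00DF; // LATIN SMALL LETTER SHARP S

        // Character mapping is 1:1, so the character maps to itself.
        System.out.println(Character.toUpperCase(sharpS) == sharpS);   // true
        // String mapping can expand one character into two.
        System.out.println(upperViaString(String.valueOf(sharpS)));    // SS

        // U+01C6 (dz with caron digraph) has an explicit titlecase form, U+01C5.
        System.out.println(Character.toTitleCase(0x01C6) == 0x01C5);   // true
        System.out.println(titleCaseRoundTrips(0x01C6));               // true

        // 'a' has no explicit titlecase mapping, so the uppercase 'A' is
        // returned, and 'A' is an uppercase letter, not a titlecase one.
        System.out.println(titleCaseRoundTrips('a'));                  // false
    }
}
```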
6220 /** |
6218 /** |
6221 * Returns the numeric value of the character <code>ch</code> in the |
6219 * Returns the numeric value of the character {@code ch} in the |
6222 * specified radix. |
6220 * specified radix. |
6223 * <p> |
6221 * <p> |
6224 * If the radix is not in the range <code>MIN_RADIX</code> &lt;= |
6222 * If the radix is not in the range {@code MIN_RADIX} ≤ |
6225 * <code>radix</code> &lt;= <code>MAX_RADIX</code> or if the |
6223 * {@code radix} ≤ {@code MAX_RADIX} or if the |
6226 * value of <code>ch</code> is not a valid digit in the specified |
6224 * value of {@code ch} is not a valid digit in the specified |
6227 * radix, <code>-1</code> is returned. A character is a valid digit |
6225 * radix, {@code -1} is returned. A character is a valid digit |
6228 * if at least one of the following is true: |
6226 * if at least one of the following is true: |
6229 * <ul> |
6227 * <ul> |
6230 * <li>The method <code>isDigit</code> is <code>true</code> of the character |
6228 * <li>The method {@code isDigit} is {@code true} of the character |
6231 * and the Unicode decimal digit value of the character (or its |
6229 * and the Unicode decimal digit value of the character (or its |
6232 * single-character decomposition) is less than the specified radix. |
6230 * single-character decomposition) is less than the specified radix. |
6233 * In this case the decimal digit value is returned. |
6231 * In this case the decimal digit value is returned. |
6234 * <li>The character is one of the uppercase Latin letters |
6232 * <li>The character is one of the uppercase Latin letters |
6235 * <code>'A'</code> through <code>'Z'</code> and its code is less than |
6233 * {@code 'A'} through {@code 'Z'} and its code is less than |
6236 * <code>radix + 'A' - 10</code>. |
6234 * {@code radix + 'A' - 10}. |
6237 * In this case, <code>ch - 'A' + 10</code> |
6235 * In this case, {@code ch - 'A' + 10} |
6238 * is returned. |
6236 * is returned. |
6239 * <li>The character is one of the lowercase Latin letters |
6237 * <li>The character is one of the lowercase Latin letters |
6240 * <code>'a'</code> through <code>'z'</code> and its code is less than |
6238 * {@code 'a'} through {@code 'z'} and its code is less than |
6241 * <code>radix + 'a' - 10</code>. |
6239 * {@code radix + 'a' - 10}. |
6242 * In this case, <code>ch - 'a' + 10</code> |
6240 * In this case, {@code ch - 'a' + 10} |
6241 * is returned. |
6242 * <li>The character is one of the fullwidth uppercase Latin letters A |
6243 * ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'}) |
6244 * and its code is less than |
6245 * {@code radix + '\u005CuFF21' - 10}. |
6246 * In this case, {@code ch - '\u005CuFF21' + 10} |
6247 * is returned. |
6248 * <li>The character is one of the fullwidth lowercase Latin letters a |
6249 * ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'}) |
6250 * and its code is less than |
6251 * {@code radix + '\u005CuFF41' - 10}. |
6252 * In this case, {@code ch - '\u005CuFF41' + 10} |
6243 * is returned. |
6253 * is returned. |
6244 * </ul> |
6254 * </ul> |
6245 * |
6255 * |
6246 * <p><b>Note:</b> This method cannot handle <a |
6256 * <p><b>Note:</b> This method cannot handle <a |
6247 * href="#supplementary"> supplementary characters</a>. To support |
6257 * href="#supplementary"> supplementary characters</a>. To support |
6261 |
6271 |
6262 /** |
6272 /** |
6263 * Returns the numeric value of the specified character (Unicode |
6273 * Returns the numeric value of the specified character (Unicode |
6264 * code point) in the specified radix. |
6274 * code point) in the specified radix. |
6265 * |
6275 * |
6266 * <p>If the radix is not in the range <code>MIN_RADIX</code> <= |
6276 * <p>If the radix is not in the range {@code MIN_RADIX} ≤ |
6267 * <code>radix</code> <= <code>MAX_RADIX</code> or if the |
6277 * {@code radix} ≤ {@code MAX_RADIX} or if the |
6268 * character is not a valid digit in the specified |
6278 * character is not a valid digit in the specified |
6269 * radix, <code>-1</code> is returned. A character is a valid digit |
6279 * radix, {@code -1} is returned. A character is a valid digit |
6270 * if at least one of the following is true: |
6280 * if at least one of the following is true: |
6271 * <ul> |
6281 * <ul> |
6272 * <li>The method {@link #isDigit(int) isDigit(codePoint)} is <code>true</code> of the character |
6282 * <li>The method {@link #isDigit(int) isDigit(codePoint)} is {@code true} of the character |
6273 * and the Unicode decimal digit value of the character (or its |
6283 * and the Unicode decimal digit value of the character (or its |
6274 * single-character decomposition) is less than the specified radix. |
6284 * single-character decomposition) is less than the specified radix. |
6275 * In this case the decimal digit value is returned. |
6285 * In this case the decimal digit value is returned. |
6276 * <li>The character is one of the uppercase Latin letters |
6286 * <li>The character is one of the uppercase Latin letters |
6277 * <code>'A'</code> through <code>'Z'</code> and its code is less than |
6287 * {@code 'A'} through {@code 'Z'} and its code is less than |
6278 * <code>radix + 'A' - 10</code>. |
6288 * {@code radix + 'A' - 10}. |
6279 * In this case, <code>ch - 'A' + 10</code> |
6289 * In this case, {@code codePoint - 'A' + 10} |
6280 * is returned. |
6290 * is returned. |
6281 * <li>The character is one of the lowercase Latin letters |
6291 * <li>The character is one of the lowercase Latin letters |
6282 * <code>'a'</code> through <code>'z'</code> and its code is less than |
6292 * {@code 'a'} through {@code 'z'} and its code is less than |
6283 * <code>radix + 'a' - 10</code>. |
6293 * {@code radix + 'a' - 10}. |
6284 * In this case, <code>ch - 'a' + 10</code> |
6294 * In this case, {@code codePoint - 'a' + 10} |
6295 * is returned. |
6296 * <li>The character is one of the fullwidth uppercase Latin letters A |
6297 * ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'}) |
6298 * and its code is less than |
6299 * {@code radix + '\u005CuFF21' - 10}. |
6300 * In this case, |
6301 * {@code codePoint - '\u005CuFF21' + 10} |
6302 * is returned. |
6303 * <li>The character is one of the fullwidth lowercase Latin letters a |
6304 * ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'}) |
6305 * and its code is less than |
6306 * {@code radix + '\u005CuFF41' - 10}. |
6307 * In this case, |
6308 * {@code codePoint - '\u005CuFF41' + 10} |
6285 * is returned. |
6309 * is returned. |
6286 * </ul> |
6310 * </ul> |
6287 * |
6311 * |
6288 * @param codePoint the character (Unicode code point) to be converted. |
6312 * @param codePoint the character (Unicode code point) to be converted. |
6289 * @param radix the radix. |
6313 * @param radix the radix. |
6296 public static int digit(int codePoint, int radix) { |
6320 public static int digit(int codePoint, int radix) { |
6297 return CharacterData.of(codePoint).digit(codePoint, radix); |
6321 return CharacterData.of(codePoint).digit(codePoint, radix); |
6298 } |
6322 } |
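The digit rules documented above can be exercised with a short sketch. The class name `DigitSketch` and the helper `parse` are illustrative assumptions and not part of `java.lang.Character`; only `Character.digit(int, int)`, `Character.charCount(int)` and `Character.MAX_RADIX` come from the class itself. Overflow is deliberately not handled.

```java
final class DigitSketch {

    /**
     * Hypothetical helper: parses a non-negative number in the given radix,
     * accepting every digit that Character.digit(int, int) accepts.
     * Overflow is not checked in this sketch.
     */
    static long parse(String text, int radix) {
        if (text.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        long value = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            // Returns -1 for an invalid radix or for a character that is not a digit in it.
            int d = Character.digit(codePoint, radix);
            if (d < 0) {
                throw new NumberFormatException(
                        String.format("U+%04X is not a digit in radix %d", codePoint, radix));
            }
            value = value * radix + d;
            i += Character.charCount(codePoint);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("ff", 16));            // ASCII lowercase letters
        System.out.println(parse("\uFF26\uFF26", 16));  // fullwidth uppercase F, twice
        System.out.println(Character.digit('g', 16));   // 'g' maps to 16, not below the radix
        System.out.println(Character.digit('1', Character.MAX_RADIX + 1)); // radix out of range
    }
}
```

Iterating by code point rather than by `char` keeps the helper usable with supplementary digits, which is the reason the `int` overload exists.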
6299 |
6323 |
6300 /** |
6324 /** |
6301 * Returns the <code>int</code> value that the specified Unicode |
6325 * Returns the {@code int} value that the specified Unicode |
6302 * character represents. For example, the character |
6326 * character represents. For example, the character |
6303 * <code>'\u216C'</code> (the roman numeral fifty) will return |
6327 * {@code '\u005Cu216C'} (the Roman numeral fifty) will return |
6304 * an int with a value of 50. |
6328 * an int with a value of 50. |
6305 * <p> |
6329 * <p> |
6306 * The letters A-Z in their uppercase (<code>'\u0041'</code> through |
6330 * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through |
6307 * <code>'\u005A'</code>), lowercase |
6331 * {@code '\u005Cu005A'}), lowercase |
6308 * (<code>'\u0061'</code> through <code>'\u007A'</code>), and |
6332 * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and |
6309 * full width variant (<code>'\uFF21'</code> through |
6333 * full width variant ({@code '\u005CuFF21'} through |
6310 * <code>'\uFF3A'</code> and <code>'\uFF41'</code> through |
6334 * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through |
6311 * <code>'\uFF5A'</code>) forms have numeric values from 10 |
6335 * {@code '\u005CuFF5A'}) forms have numeric values from 10 |
6312 * through 35. This is independent of the Unicode specification, |
6336 * through 35. This is independent of the Unicode specification, |
6313 * which does not assign numeric values to these <code>char</code> |
6337 * which does not assign numeric values to these {@code char} |
6314 * values. |
6338 * values. |
6315 * <p> |
6339 * <p> |
6316 * If the character does not have a numeric value, then -1 is returned. |
6340 * If the character does not have a numeric value, then -1 is returned. |
6317 * If the character has a numeric value that cannot be represented as a |
6341 * If the character has a numeric value that cannot be represented as a |
6318 * nonnegative integer (for example, a fractional value), then -2 |
6342 * nonnegative integer (for example, a fractional value), then -2 |
6322 * href="#supplementary"> supplementary characters</a>. To support |
6346 * href="#supplementary"> supplementary characters</a>. To support |
6323 * all Unicode characters, including supplementary characters, use |
6347 * all Unicode characters, including supplementary characters, use |
6324 * the {@link #getNumericValue(int)} method. |
6348 * the {@link #getNumericValue(int)} method. |
6325 * |
6349 * |
6326 * @param ch the character to be converted. |
6350 * @param ch the character to be converted. |
6327 * @return the numeric value of the character, as a nonnegative <code>int</code> |
6351 * @return the numeric value of the character, as a nonnegative {@code int} |
6328 * value; -2 if the character has a numeric value that is not a |
6352 * value; -2 if the character has a numeric value that is not a |
6329 * nonnegative integer; -1 if the character has no numeric value. |
6353 * nonnegative integer; -1 if the character has no numeric value. |
6330 * @see Character#forDigit(int, int) |
6354 * @see Character#forDigit(int, int) |
6331 * @see Character#isDigit(char) |
6355 * @see Character#isDigit(char) |
6332 * @since 1.1 |
6356 * @since 1.1 |
6334 public static int getNumericValue(char ch) { |
6358 public static int getNumericValue(char ch) { |
6335 return getNumericValue((int)ch); |
6359 return getNumericValue((int)ch); |
6336 } |
6360 } |
6337 |
6361 |
6338 /** |
6362 /** |
6339 * Returns the <code>int</code> value that the specified |
6363 * Returns the {@code int} value that the specified |
6340 * character (Unicode code point) represents. For example, the character |
6364 * character (Unicode code point) represents. For example, the character |
6341 * <code>'\u216C'</code> (the Roman numeral fifty) will return |
6365 * {@code '\u005Cu216C'} (the Roman numeral fifty) will return |
6342 * an <code>int</code> with a value of 50. |
6366 * an {@code int} with a value of 50. |
6343 * <p> |
6367 * <p> |
6344 * The letters A-Z in their uppercase (<code>'\u0041'</code> through |
6368 * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through |
6345 * <code>'\u005A'</code>), lowercase |
6369 * {@code '\u005Cu005A'}), lowercase |
6346 * (<code>'\u0061'</code> through <code>'\u007A'</code>), and |
6370 * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and |
6347 * full width variant (<code>'\uFF21'</code> through |
6371 * full width variant ({@code '\u005CuFF21'} through |
6348 * <code>'\uFF3A'</code> and <code>'\uFF41'</code> through |
6372 * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through |
6349 * <code>'\uFF5A'</code>) forms have numeric values from 10 |
6373 * {@code '\u005CuFF5A'}) forms have numeric values from 10 |
6350 * through 35. This is independent of the Unicode specification, |
6374 * through 35. This is independent of the Unicode specification, |
6351 * which does not assign numeric values to these <code>char</code> |
6375 * which does not assign numeric values to these {@code char} |
6352 * values. |
6376 * values. |
6353 * <p> |
6377 * <p> |
6354 * If the character does not have a numeric value, then -1 is returned. |
6378 * If the character does not have a numeric value, then -1 is returned. |
6355 * If the character has a numeric value that cannot be represented as a |
6379 * If the character has a numeric value that cannot be represented as a |
6356 * nonnegative integer (for example, a fractional value), then -2 |
6380 * nonnegative integer (for example, a fractional value), then -2 |
6357 * is returned. |
6381 * is returned. |
6358 * |
6382 * |
6359 * @param codePoint the character (Unicode code point) to be converted. |
6383 * @param codePoint the character (Unicode code point) to be converted. |
6360 * @return the numeric value of the character, as a nonnegative <code>int</code> |
6384 * @return the numeric value of the character, as a nonnegative {@code int} |
6361 * value; -2 if the character has a numeric value that is not a |
6385 * value; -2 if the character has a numeric value that is not a |
6362 * nonnegative integer; -1 if the character has no numeric value. |
6386 * nonnegative integer; -1 if the character has no numeric value. |
6363 * @see Character#forDigit(int, int) |
6387 * @see Character#forDigit(int, int) |
6364 * @see Character#isDigit(int) |
6388 * @see Character#isDigit(int) |
6365 * @since 1.5 |
6389 * @since 1.5 |
6368 return CharacterData.of(codePoint).getNumericValue(codePoint); |
6392 return CharacterData.of(codePoint).getNumericValue(codePoint); |
6369 } |
6393 } |
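The three kinds of result described above (a nonnegative value, -1, and -2) are easy to confuse, so a small sketch may help. The class `NumericValueSketch` and its `describe` method are hypothetical names used only for illustration; the sketch relies solely on `Character.getNumericValue(int)` and `Character.digit(int, int)`.

```java
final class NumericValueSketch {

    /** Hypothetical helper: turns the return code of getNumericValue into readable text. */
    static String describe(int codePoint) {
        int value = Character.getNumericValue(codePoint);
        if (value == -1) {
            return "no numeric value";
        }
        if (value == -2) {
            return "not a nonnegative integer";
        }
        return Integer.toString(value);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x216C)); // ROMAN NUMERAL FIFTY
        System.out.println(describe('z'));    // Latin letters map to 10 through 35
        System.out.println(describe(0x00BD)); // VULGAR FRACTION ONE HALF
        System.out.println(describe('$'));    // no numeric value at all
        // Unlike getNumericValue, digit only accepts decimal digits and Latin letters.
        System.out.println(Character.digit(0x216C, 10));
    }
}
```

Callers that need a usable number should test for both negative codes before using the result.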
6370 |
6394 |
6371 /** |
6395 /** |
6372 * Determines if the specified character is ISO-LATIN-1 white space. |
6396 * Determines if the specified character is ISO-LATIN-1 white space. |
6373 * This method returns <code>true</code> for the following five |
6397 * This method returns {@code true} for the following five |
6374 * characters only: |
6398 * characters only: |
6375 * <table> |
6399 * <table> |
6376 * <tr><td><code>'\t'</code></td> <td><code>U+0009</code></td> |
6400 * <tr><td>{@code '\t'}</td> <td>{@code U+0009}</td> |
6377 * <td><code>HORIZONTAL TABULATION</code></td></tr> |
6401 * <td>{@code HORIZONTAL TABULATION}</td></tr> |
6378 * <tr><td><code>'\n'</code></td> <td><code>U+000A</code></td> |
6402 * <tr><td>{@code '\n'}</td> <td>{@code U+000A}</td> |
6379 * <td><code>NEW LINE</code></td></tr> |
6403 * <td>{@code LINE FEED}</td></tr> |
6380 * <tr><td><code>'\f'</code></td> <td><code>U+000C</code></td> |
6404 * <tr><td>{@code '\f'}</td> <td>{@code U+000C}</td> |
6381 * <td><code>FORM FEED</code></td></tr> |
6405 * <td>{@code FORM FEED}</td></tr> |
6382 * <tr><td><code>'\r'</code></td> <td><code>U+000D</code></td> |
6406 * <tr><td>{@code '\r'}</td> <td>{@code U+000D}</td> |
6383 * <td><code>CARRIAGE RETURN</code></td></tr> |
6407 * <td>{@code CARRIAGE RETURN}</td></tr> |
6384 * <tr><td><code>' '</code></td> <td><code>U+0020</code></td> |
6408 * <tr><td>{@code ' '}</td> <td>{@code U+0020}</td> |
6385 * <td><code>SPACE</code></td></tr> |
6409 * <td>{@code SPACE}</td></tr> |
6386 * </table> |
6410 * </table> |
6387 * |
6411 * |
6388 * @param ch the character to be tested. |
6412 * @param ch the character to be tested. |
6389 * @return <code>true</code> if the character is ISO-LATIN-1 white |
6413 * @return {@code true} if the character is ISO-LATIN-1 white |
6390 * space; <code>false</code> otherwise. |
6414 * space; {@code false} otherwise. |
6391 * @see Character#isSpaceChar(char) |
6415 * @see Character#isSpaceChar(char) |
6392 * @see Character#isWhitespace(char) |
6416 * @see Character#isWhitespace(char) |
6393 * @deprecated Replaced by isWhitespace(char). |
6417 * @deprecated Replaced by isWhitespace(char). |
6394 */ |
6418 */ |
6395 @Deprecated |
6419 @Deprecated |
6408 * A character is considered to be a space character if and only if |
6432 * A character is considered to be a space character if and only if |
6409 * it is specified to be a space character by the Unicode standard. This |
6433 * it is specified to be a space character by the Unicode standard. This |
6410 * method returns true if the character's general category type is any of |
6434 * method returns true if the character's general category type is any of |
6411 * the following: |
6435 * the following: |
6412 * <ul> |
6436 * <ul> |
6413 * <li> <code>SPACE_SEPARATOR</code> |
6437 * <li> {@code SPACE_SEPARATOR} |
6414 * <li> <code>LINE_SEPARATOR</code> |
6438 * <li> {@code LINE_SEPARATOR} |
6415 * <li> <code>PARAGRAPH_SEPARATOR</code> |
6439 * <li> {@code PARAGRAPH_SEPARATOR} |
6416 * </ul> |
6440 * </ul> |
6417 * |
6441 * |
6418 * <p><b>Note:</b> This method cannot handle <a |
6442 * <p><b>Note:</b> This method cannot handle <a |
6419 * href="#supplementary"> supplementary characters</a>. To support |
6443 * href="#supplementary"> supplementary characters</a>. To support |
6420 * all Unicode characters, including supplementary characters, use |
6444 * all Unicode characters, including supplementary characters, use |
6421 * the {@link #isSpaceChar(int)} method. |
6445 * the {@link #isSpaceChar(int)} method. |
6422 * |
6446 * |
6423 * @param ch the character to be tested. |
6447 * @param ch the character to be tested. |
6424 * @return <code>true</code> if the character is a space character; |
6448 * @return {@code true} if the character is a space character; |
6425 * <code>false</code> otherwise. |
6449 * {@code false} otherwise. |
6426 * @see Character#isWhitespace(char) |
6450 * @see Character#isWhitespace(char) |
6427 * @since 1.1 |
6451 * @since 1.1 |
6428 */ |
6452 */ |
6429 public static boolean isSpaceChar(char ch) { |
6453 public static boolean isSpaceChar(char ch) { |
6430 return isSpaceChar((int)ch); |
6454 return isSpaceChar((int)ch); |
6459 /** |
6483 /** |
6460 * Determines if the specified character is white space according to Java. |
6484 * Determines if the specified character is white space according to Java. |
6461 * A character is a Java whitespace character if and only if it satisfies |
6485 * A character is a Java whitespace character if and only if it satisfies |
6462 * one of the following criteria: |
6486 * one of the following criteria: |
6463 * <ul> |
6487 * <ul> |
6464 * <li> It is a Unicode space character (<code>SPACE_SEPARATOR</code>, |
6488 * <li> It is a Unicode space character ({@code SPACE_SEPARATOR}, |
6465 * <code>LINE_SEPARATOR</code>, or <code>PARAGRAPH_SEPARATOR</code>) |
6489 * {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR}) |
6466 * but is not also a non-breaking space (<code>'\u00A0'</code>, |
6490 * but is not also a non-breaking space ({@code '\u005Cu00A0'}, |
6467 * <code>'\u2007'</code>, <code>'\u202F'</code>). |
6491 * {@code '\u005Cu2007'}, {@code '\u005Cu202F'}). |
6468 * <li> It is <code>'\t'</code>, U+0009 HORIZONTAL TABULATION. |
6492 * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION. |
6469 * <li> It is <code>'\n'</code>, U+000A LINE FEED. |
6493 * <li> It is {@code '\u005Cn'}, U+000A LINE FEED. |
6470 * <li> It is <code>'\u000B'</code>, U+000B VERTICAL TABULATION. |
6494 * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION. |
6471 * <li> It is <code>'\f'</code>, U+000C FORM FEED. |
6495 * <li> It is {@code '\u005Cf'}, U+000C FORM FEED. |
6472 * <li> It is <code>'\r'</code>, U+000D CARRIAGE RETURN. |
6496 * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN. |
6473 * <li> It is <code>'\u001C'</code>, U+001C FILE SEPARATOR. |
6497 * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR. |
6474 * <li> It is <code>'\u001D'</code>, U+001D GROUP SEPARATOR. |
6498 * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR. |
6475 * <li> It is <code>'\u001E'</code>, U+001E RECORD SEPARATOR. |
6499 * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR. |
6476 * <li> It is <code>'\u001F'</code>, U+001F UNIT SEPARATOR. |
6500 * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR. |
6477 * </ul> |
6501 * </ul> |
6478 * |
6502 * |
6479 * <p><b>Note:</b> This method cannot handle <a |
6503 * <p><b>Note:</b> This method cannot handle <a |
6480 * href="#supplementary"> supplementary characters</a>. To support |
6504 * href="#supplementary"> supplementary characters</a>. To support |
6481 * all Unicode characters, including supplementary characters, use |
6505 * all Unicode characters, including supplementary characters, use |
6482 * the {@link #isWhitespace(int)} method. |
6506 * the {@link #isWhitespace(int)} method. |
6483 * |
6507 * |
6484 * @param ch the character to be tested. |
6508 * @param ch the character to be tested. |
6485 * @return <code>true</code> if the character is a Java whitespace |
6509 * @return {@code true} if the character is a Java whitespace |
6486 * character; <code>false</code> otherwise. |
6510 * character; {@code false} otherwise. |
6487 * @see Character#isSpaceChar(char) |
6511 * @see Character#isSpaceChar(char) |
6488 * @since 1.1 |
6512 * @since 1.1 |
6489 */ |
6513 */ |
6490 public static boolean isWhitespace(char ch) { |
6514 public static boolean isWhitespace(char ch) { |
6491 return isWhitespace((int)ch); |
6515 return isWhitespace((int)ch); |
6497 * whitespace character if and only if it satisfies one of the |
6521 * whitespace character if and only if it satisfies one of the |
6498 * following criteria: |
6522 * following criteria: |
6499 * <ul> |
6523 * <ul> |
6500 * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR}, |
6524 * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR}, |
6501 * {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR}) |
6525 * {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR}) |
6502 * but is not also a non-breaking space (<code>'\u00A0'</code>, |
6526 * but is not also a non-breaking space ({@code '\u005Cu00A0'}, |
6503 * <code>'\u2007'</code>, <code>'\u202F'</code>). |
6527 * {@code '\u005Cu2007'}, {@code '\u005Cu202F'}). |
6504 * <li> It is <code>'\t'</code>, U+0009 HORIZONTAL TABULATION. |
6528 * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION. |
6505 * <li> It is <code>'\n'</code>, U+000A LINE FEED. |
6529 * <li> It is {@code '\u005Cn'}, U+000A LINE FEED. |
6506 * <li> It is <code>'\u000B'</code>, U+000B VERTICAL TABULATION. |
6530 * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION. |
6507 * <li> It is <code>'\f'</code>, U+000C FORM FEED. |
6531 * <li> It is {@code '\u005Cf'}, U+000C FORM FEED. |
6508 * <li> It is <code>'\r'</code>, U+000D CARRIAGE RETURN. |
6532 * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN. |
6509 * <li> It is <code>'\u001C'</code>, U+001C FILE SEPARATOR. |
6533 * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR. |
6510 * <li> It is <code>'\u001D'</code>, U+001D GROUP SEPARATOR. |
6534 * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR. |
6511 * <li> It is <code>'\u001E'</code>, U+001E RECORD SEPARATOR. |
6535 * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR. |
6512 * <li> It is <code>'\u001F'</code>, U+001F UNIT SEPARATOR. |
6536 * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR. |
6513 * </ul> |
6537 * </ul> |
6514 * <p> |
6538 * <p> |
6515 * |
6539 * |
6516 * @param codePoint the character (Unicode code point) to be tested. |
6540 * @param codePoint the character (Unicode code point) to be tested. |
6517 * @return <code>true</code> if the character is a Java whitespace |
6541 * @return {@code true} if the character is a Java whitespace |
6518 * character; <code>false</code> otherwise. |
6542 * character; {@code false} otherwise. |
6519 * @see Character#isSpaceChar(int) |
6543 * @see Character#isSpaceChar(int) |
6520 * @since 1.5 |
6544 * @since 1.5 |
6521 */ |
6545 */ |
6522 public static boolean isWhitespace(int codePoint) { |
6546 public static boolean isWhitespace(int codePoint) { |
6523 return CharacterData.of(codePoint).isWhitespace(codePoint); |
6547 return CharacterData.of(codePoint).isWhitespace(codePoint); |
6524 } |
6548 } |
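The difference between `isWhitespace` and `isSpaceChar` documented above comes down to two groups of characters: non-breaking spaces and the ASCII control characters in the list. The following sketch compares the two methods; the class `WhitespaceSketch` and the helper `classify` are assumed names for illustration only.

```java
final class WhitespaceSketch {

    /** Hypothetical helper: reports how both predicates classify a code point. */
    static String classify(int codePoint) {
        boolean javaWhitespace = Character.isWhitespace(codePoint);
        boolean unicodeSpace = Character.isSpaceChar(codePoint);
        return String.format("U+%04X whitespace=%b spaceChar=%b",
                codePoint, javaWhitespace, unicodeSpace);
    }

    public static void main(String[] args) {
        System.out.println(classify(' '));     // both predicates accept SPACE
        System.out.println(classify('\t'));    // control character: whitespace only
        System.out.println(classify(0x00A0));  // NO-BREAK SPACE: space character only
        System.out.println(classify(0x2028));  // LINE SEPARATOR: both
        System.out.println(classify(0x001C));  // FILE SEPARATOR: whitespace only
    }
}
```

A tokenizer that must not split on non-breaking spaces would therefore use `isWhitespace`, while layout code that cares about the Unicode separator categories would use `isSpaceChar`.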
6525 |
6549 |
6526 /** |
6550 /** |
6527 * Determines if the specified character is an ISO control |
6551 * Determines if the specified character is an ISO control |
6528 * character. A character is considered to be an ISO control |
6552 * character. A character is considered to be an ISO control |
6529 * character if its code is in the range <code>'\u0000'</code> |
6553 * character if its code is in the range {@code '\u005Cu0000'} |
6530 * through <code>'\u001F'</code> or in the range |
6554 * through {@code '\u005Cu001F'} or in the range |
6531 * <code>'\u007F'</code> through <code>'\u009F'</code>. |
6555 * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}. |
6532 * |
6556 * |
6533 * <p><b>Note:</b> This method cannot handle <a |
6557 * <p><b>Note:</b> This method cannot handle <a |
6534 * href="#supplementary"> supplementary characters</a>. To support |
6558 * href="#supplementary"> supplementary characters</a>. To support |
6535 * all Unicode characters, including supplementary characters, use |
6559 * all Unicode characters, including supplementary characters, use |
6536 * the {@link #isISOControl(int)} method. |
6560 * the {@link #isISOControl(int)} method. |
6537 * |
6561 * |
6538 * @param ch the character to be tested. |
6562 * @param ch the character to be tested. |
6539 * @return <code>true</code> if the character is an ISO control character; |
6563 * @return {@code true} if the character is an ISO control character; |
6540 * <code>false</code> otherwise. |
6564 * {@code false} otherwise. |
6541 * |
6565 * |
6542 * @see Character#isSpaceChar(char) |
6566 * @see Character#isSpaceChar(char) |
6543 * @see Character#isWhitespace(char) |
6567 * @see Character#isWhitespace(char) |
6544 * @since 1.1 |
6568 * @since 1.1 |
6545 */ |
6569 */ |
6548 } |
6572 } |
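As an illustration of the two ISO control ranges described above, the sketch below removes every ISO control character from a string. `IsoControlSketch` and `stripControls` are hypothetical names; the only library behavior relied on is `Character.isISOControl(int)`, `String.codePoints()` and `StringBuilder.appendCodePoint(int)`.

```java
final class IsoControlSketch {

    /** Hypothetical helper: drops code points in U+0000..U+001F and U+007F..U+009F. */
    static String stripControls(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isISOControl(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // NUL, DELETE, U+009F and the trailing line feed are all removed.
        System.out.println(stripControls("a\u0000b\u007Fc\u009Fd\n"));
        // NO-BREAK SPACE (U+00A0) lies just outside the second range and is kept.
        System.out.println(stripControls("x\u00A0y").length());
    }
}
```

Note that tab, line feed and carriage return are ISO control characters too, so this helper also removes line breaks.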
6549 |
6573 |
6550 /** |
6574 /** |
6551 * Determines if the referenced character (Unicode code point) is an ISO control |
6575 * Determines if the referenced character (Unicode code point) is an ISO control |
6552 * character. A character is considered to be an ISO control |
6576 * character. A character is considered to be an ISO control |
6553 * character if its code is in the range <code>'\u0000'</code> |
6577 * character if its code is in the range {@code '\u005Cu0000'} |
6554 * through <code>'\u001F'</code> or in the range |
6578 * through {@code '\u005Cu001F'} or in the range |
6555 * <code>'\u007F'</code> through <code>'\u009F'</code>. |
6579 * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}. |
6556 * |
6580 * |
6557 * @param codePoint the character (Unicode code point) to be tested. |
6581 * @param codePoint the character (Unicode code point) to be tested. |
6558 * @return <code>true</code> if the character is an ISO control character; |
6582 * @return {@code true} if the character is an ISO control character; |
6559 * <code>false</code> otherwise. |
6583 * {@code false} otherwise. |
6560 * @see Character#isSpaceChar(int) |
6584 * @see Character#isSpaceChar(int) |
6561 * @see Character#isWhitespace(int) |
6585 * @see Character#isWhitespace(int) |
6562 * @since 1.5 |
6586 * @since 1.5 |
6563 */ |
6587 */ |
6564 public static boolean isISOControl(int codePoint) { |
6588 public static boolean isISOControl(int codePoint) { |
6576 * href="#supplementary"> supplementary characters</a>. To support |
6600 * href="#supplementary"> supplementary characters</a>. To support |
6577 * all Unicode characters, including supplementary characters, use |
6601 * all Unicode characters, including supplementary characters, use |
6578 * the {@link #getType(int)} method. |
6602 * the {@link #getType(int)} method. |
6579 * |
6603 * |
6580 * @param ch the character to be tested. |
6604 * @param ch the character to be tested. |
6581 * @return a value of type <code>int</code> representing the |
6605 * @return a value of type {@code int} representing the |
6582 * character's general category. |
6606 * character's general category. |
6583 * @see Character#COMBINING_SPACING_MARK |
6607 * @see Character#COMBINING_SPACING_MARK |
6584 * @see Character#CONNECTOR_PUNCTUATION |
6608 * @see Character#CONNECTOR_PUNCTUATION |
6585 * @see Character#CONTROL |
6609 * @see Character#CONTROL |
6586 * @see Character#CURRENCY_SYMBOL |
6610 * @see Character#CURRENCY_SYMBOL |
6658 return CharacterData.of(codePoint).getType(codePoint); |
6682 return CharacterData.of(codePoint).getType(codePoint); |
6659 } |
6683 } |
6660 |
6684 |
6661 /** |
6685 /** |
6662 * Determines the character representation for a specific digit in |
6686 * Determines the character representation for a specific digit in |
6663 * the specified radix. If the value of <code>radix</code> is not a |
6687 * the specified radix. If the value of {@code radix} is not a |
6664 * valid radix, or the value of <code>digit</code> is not a valid |
6688 * valid radix, or the value of {@code digit} is not a valid |
6665 * digit in the specified radix, the null character |
6689 * digit in the specified radix, the null character |
6666 * (<code>'\u0000'</code>) is returned. |
6690 * ({@code '\u005Cu0000'}) is returned. |
6667 * <p> |
6691 * <p> |
6668 * The <code>radix</code> argument is valid if it is greater than or |
6692 * The {@code radix} argument is valid if it is greater than or |
6669 * equal to <code>MIN_RADIX</code> and less than or equal to |
6693 * equal to {@code MIN_RADIX} and less than or equal to |
6670 * <code>MAX_RADIX</code>. The <code>digit</code> argument is valid if |
6694 * {@code MAX_RADIX}. The {@code digit} argument is valid if |
6671 * <code>0 <=digit < radix</code>. |
6695 * {@code 0 <= digit < radix}. |
6672 * <p> |
6696 * <p> |
6673 * If the digit is less than 10, then |
6697 * If the digit is less than 10, then |
6674 * <code>'0' + digit</code> is returned. Otherwise, the value |
6698 * {@code '0' + digit} is returned. Otherwise, the value |
6675 * <code>'a' + digit - 10</code> is returned. |
6699 * {@code 'a' + digit - 10} is returned. |
6676 * |
6700 * |
6677 * @param digit the number to convert to a character. |
6701 * @param digit the number to convert to a character. |
6678 * @param radix the radix. |
6702 * @param radix the radix. |
6679 * @return the <code>char</code> representation of the specified digit |
6703 * @return the {@code char} representation of the specified digit |
6680 * in the specified radix. |
6704 * in the specified radix. |
6681 * @see Character#MIN_RADIX |
6705 * @see Character#MIN_RADIX |
6682 * @see Character#MAX_RADIX |
6706 * @see Character#MAX_RADIX |
6683 * @see Character#digit(char, int) |
6707 * @see Character#digit(char, int) |
6684 */ |
6708 */ |
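A brief sketch of how `forDigit` might be used to render a number in an arbitrary radix follows. The class `ForDigitSketch` and the method `toRadixString` are assumptions made for this example; in real code `Integer.toString(int, int)` already covers this case.

```java
final class ForDigitSketch {

    /** Hypothetical helper: formats a non-negative value using Character.forDigit. */
    static String toRadixString(int value, int radix) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("invalid radix: " + radix);
        }
        StringBuilder digits = new StringBuilder();
        do {
            // value % radix is always in [0, radix), so forDigit never returns NUL here.
            digits.append(Character.forDigit(value % radix, radix));
            value /= radix;
        } while (value > 0);
        return digits.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(toRadixString(255, 16));
        System.out.println(toRadixString(35, 36));
        // A digit that is not below the radix yields the null character.
        System.out.println((int) Character.forDigit(16, 16));
    }
}
```

Because `forDigit` produces lowercase letters and `digit` accepts both cases, `Character.digit(Character.forDigit(d, r), r)` gives back `d` for every valid pair.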
6697 |
6721 |
6698 /** |
6722 /** |
6699 * Returns the Unicode directionality property for the given |
6723 * Returns the Unicode directionality property for the given |
6700 * character. Character directionality is used to calculate the |
6724 * character. Character directionality is used to calculate the |
6701 * visual ordering of text. The directionality value of undefined |
6725 * visual ordering of text. The directionality value of undefined |
6702 * <code>char</code> values is <code>DIRECTIONALITY_UNDEFINED</code>. |
6726 * {@code char} values is {@code DIRECTIONALITY_UNDEFINED}. |
6703 * |
6727 * |
6704 * <p><b>Note:</b> This method cannot handle <a |
6728 * <p><b>Note:</b> This method cannot handle <a |
6705 * href="#supplementary"> supplementary characters</a>. To support |
6729 * href="#supplementary"> supplementary characters</a>. To support |
6706 * all Unicode characters, including supplementary characters, use |
6730 * all Unicode characters, including supplementary characters, use |
6707 * the {@link #getDirectionality(int)} method. |
6731 * the {@link #getDirectionality(int)} method. |
6708 * |
6732 * |
6709 * @param ch <code>char</code> for which the directionality property |
6733 * @param ch {@code char} for which the directionality property |
6710 * is requested. |
6734 * is requested. |
6711 * @return the directionality property of the <code>char</code> value. |
6735 * @return the directionality property of the {@code char} value. |
6712 * |
6736 * |
6713 * @see Character#DIRECTIONALITY_UNDEFINED |
6737 * @see Character#DIRECTIONALITY_UNDEFINED |
6714 * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT |
6738 * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT |
6715 * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT |
6739 * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT |
6716 * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC |
6740 * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC |
6775 |
6799 |
6776 /** |
6800 /** |
6777 * Determines whether the character is mirrored according to the |
6801 * Determines whether the character is mirrored according to the |
6778 * Unicode specification. Mirrored characters should have their |
6802 * Unicode specification. Mirrored characters should have their |
6779 * glyphs horizontally mirrored when displayed in text that is |
6803 * glyphs horizontally mirrored when displayed in text that is |
6780 * right-to-left. For example, <code>'\u0028'</code> LEFT |
6804 * right-to-left. For example, {@code '\u005Cu0028'} LEFT |
6781 * PARENTHESIS is semantically defined to be an <i>opening |
6805 * PARENTHESIS is semantically defined to be an <i>opening |
6782 * parenthesis</i>. This will appear as a "(" in text that is |
6806 * parenthesis</i>. This will appear as a "(" in text that is |
6783 * left-to-right but as a ")" in text that is right-to-left. |
6807 * left-to-right but as a ")" in text that is right-to-left. |
6784 * |
6808 * |
6785 * <p><b>Note:</b> This method cannot handle <a |
6809 * <p><b>Note:</b> This method cannot handle <a |
6786 * href="#supplementary"> supplementary characters</a>. To support |
6810 * href="#supplementary"> supplementary characters</a>. To support |
6787 * all Unicode characters, including supplementary characters, use |
6811 * all Unicode characters, including supplementary characters, use |
6788 * the {@link #isMirrored(int)} method. |
6812 * the {@link #isMirrored(int)} method. |
6789 * |
6813 * |
6790 * @param ch <code>char</code> for which the mirrored property is requested |
6814 * @param ch {@code char} for which the mirrored property is requested |
6791 * @return <code>true</code> if the char is mirrored, <code>false</code> |
6815 * @return {@code true} if the char is mirrored, {@code false} |
6792 * if the <code>char</code> is not mirrored or is not defined. |
6816 * if the {@code char} is not mirrored or is not defined. |
6793 * @since 1.4 |
6817 * @since 1.4 |
6794 */ |
6818 */ |
6795 public static boolean isMirrored(char ch) { |
6819 public static boolean isMirrored(char ch) { |
6796 return isMirrored((int)ch); |
6820 return isMirrored((int)ch); |
6797 } |
6821 } |
6799 /** |
6823 /** |
6800 * Determines whether the specified character (Unicode code point) |
6824 * Determines whether the specified character (Unicode code point) |
6801 * is mirrored according to the Unicode specification. Mirrored |
6825 * is mirrored according to the Unicode specification. Mirrored |
6802 * characters should have their glyphs horizontally mirrored when |
6826 * characters should have their glyphs horizontally mirrored when |
6803 * displayed in text that is right-to-left. For example, |
6827 * displayed in text that is right-to-left. For example, |
6804 * <code>'\u0028'</code> LEFT PARENTHESIS is semantically |
6828 * {@code '\u005Cu0028'} LEFT PARENTHESIS is semantically |
6805 * defined to be an <i>opening parenthesis</i>. This will appear |
6829 * defined to be an <i>opening parenthesis</i>. This will appear |
6806 * as a "(" in text that is left-to-right but as a ")" in text |
6830 * as a "(" in text that is left-to-right but as a ")" in text |
6807 * that is right-to-left. |
6831 * that is right-to-left. |
6808 * |
6832 * |
6809 * @param codePoint the character (Unicode code point) to be tested. |
6833 * @param codePoint the character (Unicode code point) to be tested. |
6810 * @return <code>true</code> if the character is mirrored, <code>false</code> |
6834 * @return {@code true} if the character is mirrored, {@code false} |
6811 * if the character is not mirrored or is not defined. |
6835 * if the character is not mirrored or is not defined. |
6812 * @since 1.5 |
6836 * @since 1.5 |
6813 */ |
6837 */ |
6814 public static boolean isMirrored(int codePoint) { |
6838 public static boolean isMirrored(int codePoint) { |
6815 return CharacterData.of(codePoint).isMirrored(codePoint); |
6839 return CharacterData.of(codePoint).isMirrored(codePoint); |
6816 } |
6840 } |
6817 |
6841 |
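A minimal sketch of how the two `isMirrored` overloads documented above might be used. `MirroredDemo` and `containsMirrored` are hypothetical names introduced only for this illustration and are not part of the JDK. The helper walks a string by code point so that supplementary characters reach the `int` overload, as the note on the `char` overload recommends.

```java
public class MirroredDemo {
    // Hypothetical helper (not in the JDK): reports whether any code point
    // in the text has the Unicode mirrored property.
    public static boolean containsMirrored(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);       // reads a full code point, even a surrogate pair
            if (Character.isMirrored(cp)) {     // int overload covers supplementary characters
                return true;
            }
            i += Character.charCount(cp);       // advance by 1 or 2 chars
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Character.isMirrored('('));   // true: LEFT PARENTHESIS is mirrored
        System.out.println(Character.isMirrored('A'));   // false: letters are not mirrored
        System.out.println(containsMirrored("f(x)"));    // true
        System.out.println(containsMirrored("abc"));     // false
    }
}
```

Iterating with `codePointAt` and `charCount` rather than `charAt` is the design choice that matters here: a surrogate half passed to the `char` overload is never reported as mirrored.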
6818 /** |
6842 /** |
6819 * Compares two <code>Character</code> objects numerically. |
6843 * Compares two {@code Character} objects numerically. |
6820 * |
6844 * |
6821 * @param anotherCharacter the <code>Character</code> to be compared. |
6845 * @param anotherCharacter the {@code Character} to be compared. |
6822 |
6846 |
6823 * @return the value <code>0</code> if the argument <code>Character</code> |
6847 * @return the value {@code 0} if the argument {@code Character} |
6824 * is equal to this <code>Character</code>; a value less than |
6848 * is equal to this {@code Character}; a value less than |
6825 * <code>0</code> if this <code>Character</code> is numerically less |
6849 * {@code 0} if this {@code Character} is numerically less |
6826 * than the <code>Character</code> argument; and a value greater than |
6850 * than the {@code Character} argument; and a value greater than |
6827 * <code>0</code> if this <code>Character</code> is numerically greater |
6851 * {@code 0} if this {@code Character} is numerically greater |
6828 * than the <code>Character</code> argument (unsigned comparison). |
6852 * than the {@code Character} argument (unsigned comparison). |
6829 * Note that this is strictly a numerical comparison; it is not |
6853 * Note that this is strictly a numerical comparison; it is not |
6830 * locale-dependent. |
6854 * locale-dependent. |
6831 * @since 1.2 |
6855 * @since 1.2 |
6832 */ |
6856 */ |
6833 public int compareTo(Character anotherCharacter) { |
6857 public int compareTo(Character anotherCharacter) { |
6874 |
6898 |
6875 /** |
6899 /** |
6876 * Converts the character (Unicode code point) argument to uppercase using case |
6900 * Converts the character (Unicode code point) argument to uppercase using case |
6877 * mapping information from the SpecialCasing file in the Unicode |
6901 * mapping information from the SpecialCasing file in the Unicode |
6878 * specification. If a character has no explicit uppercase |
6902 * specification. If a character has no explicit uppercase |
6879 * mapping, then the <code>char</code> itself is returned in the |
6903 * mapping, then the {@code char} itself is returned in the |
6880 * <code>char[]</code>. |
6904 * {@code char[]}. |
6881 * |
6905 * |
6882 * @param codePoint the character (Unicode code point) to be converted. |
6906 * @param codePoint the character (Unicode code point) to be converted. |
6883 * @return a <code>char[]</code> with the uppercased character. |
6907 * @return a {@code char[]} with the uppercased character. |
6884 * @since 1.4 |
6908 * @since 1.4 |
6885 */ |
6909 */ |
6886 static char[] toUpperCaseCharArray(int codePoint) { |
6910 static char[] toUpperCaseCharArray(int codePoint) { |
6887 // As of Unicode 4.0, 1:M uppercasings only happen in the BMP. |
6911 // As of Unicode 4.0, 1:M uppercasings only happen in the BMP. |
6888 assert isBmpCodePoint(codePoint); |
6912 assert isBmpCodePoint(codePoint); |
6909 return (char) (((ch & 0xFF00) >> 8) | (ch << 8)); |
6933 return (char) (((ch & 0xFF00) >> 8) | (ch << 8)); |
6910 } |
6934 } |
6911 |
6935 |
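The return statement above swaps the two bytes of a `char`. The enclosing method's signature is not shown in this hunk; the sketch below assumes it belongs to `Character.reverseBytes(char)` and checks a copy of the expression against that public method. `ReverseBytesDemo` and `swapBytes` are hypothetical names used only for illustration.

```java
public class ReverseBytesDemo {
    // Copy of the expression from the hunk above: move the high byte down,
    // shift the low byte up, and let the cast to char drop any overflow bits.
    public static char swapBytes(char ch) {
        return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
    }

    public static void main(String[] args) {
        char original = (char) 0x1234;
        char swapped = swapBytes(original);

        System.out.println(Integer.toHexString(swapped));                  // 3412
        System.out.println(swapped == Character.reverseBytes(original));   // true
        System.out.println(swapBytes(swapped) == original);                // true: swapping twice is the identity
    }
}
```

Note that `ch << 8` is computed as an `int`, so the bits above position 15 only disappear because of the final narrowing cast.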
6912 /** |
6936 /** |
6913 * Returns the Unicode name of the specified character |
6937 * Returns the Unicode name of the specified character |
6914 * <code>codePoint</code>, or null if the code point is |
6938 * {@code codePoint}, or null if the code point is |
6915 * {@link #UNASSIGNED unassigned}. |
6939 * {@link #UNASSIGNED unassigned}. |
6916 * <p> |
6940 * <p> |
6917 * Note: if the specified character is not assigned a name by |
6941 * Note: if the specified character is not assigned a name by |
6918 * the <i>UnicodeData</i> file (part of the Unicode Character |
6942 * the <i>UnicodeData</i> file (part of the Unicode Character |
6919 * Database maintained by the Unicode Consortium), the returned |
6943 * Database maintained by the Unicode Consortium), the returned |
6920 * name is the same as the result of expression |
6944 * name is the same as the result of expression |
6921 * |
6945 * |
6922 * <blockquote><code> |
6946 * <blockquote>{@code |
6923 * Character.UnicodeBlock.of(codePoint) |
6947 * Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ') |
6924 * .toString() |
|
6925 * .replace('_', ' ') |
|
6926 * + " " |
6948 * + " " |
6927 * + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH); |
6949 * + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH); |
6928 * |
6950 * |
6929 * </code></blockquote> |
6951 * }</blockquote> |
6930 * |
6952 * |
6931 * @param codePoint the character (Unicode code point) |
6953 * @param codePoint the character (Unicode code point) |
6932 * |
6954 * |
6933 * @return the Unicode name of the specified character, or null if |
6955 * @return the Unicode name of the specified character, or null if |
6934 * the code point is unassigned. |
6956 * the code point is unassigned. |
6935 * |
6957 * |
6936 * @exception IllegalArgumentException if the specified |
6958 * @exception IllegalArgumentException if the specified |
6937 * <code>codePoint</code> is not a valid Unicode |
6959 * {@code codePoint} is not a valid Unicode |
6938 * code point. |
6960 * code point. |
6939 * |
6961 * |
6940 * @since 1.7 |
6962 * @since 1.7 |
6941 */ |
6963 */ |
6942 public static String getName(int codePoint) { |
6964 public static String getName(int codePoint) { |