jdk-sandbox: jdk/src/share/classes/sun/text/normalizer/UnicodeSet.java@f0f53bbe5bd1 (annotated)

2 90ce3da70b43 Initial load duke parents: diff changeset	1	/*
5506 202f599c92aa 6943119: Rebrand source copyright notices ohair parents: 2497 diff changeset	2	* Copyright (c) 2005, 2009, Oracle and/or its affiliates. All rights reserved.
2 90ce3da70b43 Initial load duke parents: diff changeset	3	* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
90ce3da70b43 Initial load duke parents: diff changeset	4	*
90ce3da70b43 Initial load duke parents: diff changeset	5	* This code is free software; you can redistribute it and/or modify it
90ce3da70b43 Initial load duke parents: diff changeset	6	* under the terms of the GNU General Public License version 2 only, as
5506 202f599c92aa 6943119: Rebrand source copyright notices ohair parents: 2497 diff changeset	7	* published by the Free Software Foundation. Oracle designates this
2 90ce3da70b43 Initial load duke parents: diff changeset	8	* particular file as subject to the "Classpath" exception as provided
5506 202f599c92aa 6943119: Rebrand source copyright notices ohair parents: 2497 diff changeset	9	* by Oracle in the LICENSE file that accompanied this code.
2 90ce3da70b43 Initial load duke parents: diff changeset	10	*
90ce3da70b43 Initial load duke parents: diff changeset	11	* This code is distributed in the hope that it will be useful, but WITHOUT
90ce3da70b43 Initial load duke parents: diff changeset	12	* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
90ce3da70b43 Initial load duke parents: diff changeset	13	* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
90ce3da70b43 Initial load duke parents: diff changeset	14	* version 2 for more details (a copy is included in the LICENSE file that
90ce3da70b43 Initial load duke parents: diff changeset	15	* accompanied this code).
90ce3da70b43 Initial load duke parents: diff changeset	16	*
90ce3da70b43 Initial load duke parents: diff changeset	17	* You should have received a copy of the GNU General Public License version
90ce3da70b43 Initial load duke parents: diff changeset	18	* 2 along with this work; if not, write to the Free Software Foundation,
90ce3da70b43 Initial load duke parents: diff changeset	19	* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
90ce3da70b43 Initial load duke parents: diff changeset	20	*
5506 202f599c92aa 6943119: Rebrand source copyright notices ohair parents: 2497 diff changeset	21	* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
202f599c92aa 6943119: Rebrand source copyright notices ohair parents: 2497 diff changeset	22	* or visit www.oracle.com if you need additional information or have any
202f599c92aa 6943119: Rebrand source copyright notices ohair parents: 2497 diff changeset	23	* questions.
2 90ce3da70b43 Initial load duke parents: diff changeset	24	*/
90ce3da70b43 Initial load duke parents: diff changeset	25	/*
90ce3da70b43 Initial load duke parents: diff changeset	26	*******************************************************************************
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	27	* (C) Copyright IBM Corp. and others, 1996-2009 - All Rights Reserved *
2 90ce3da70b43 Initial load duke parents: diff changeset	28	* *
90ce3da70b43 Initial load duke parents: diff changeset	29	* The original version of this source code and documentation is copyrighted *
90ce3da70b43 Initial load duke parents: diff changeset	30	* and owned by IBM, These materials are provided under terms of a License *
90ce3da70b43 Initial load duke parents: diff changeset	31	* Agreement between IBM and Sun. This technology is protected by multiple *
90ce3da70b43 Initial load duke parents: diff changeset	32	* US and International patents. This notice and attribution to IBM may not *
90ce3da70b43 Initial load duke parents: diff changeset	33	* to removed. *
90ce3da70b43 Initial load duke parents: diff changeset	34	*******************************************************************************
90ce3da70b43 Initial load duke parents: diff changeset	35	*/
90ce3da70b43 Initial load duke parents: diff changeset	36
90ce3da70b43 Initial load duke parents: diff changeset	37	package sun.text.normalizer;
90ce3da70b43 Initial load duke parents: diff changeset	38
90ce3da70b43 Initial load duke parents: diff changeset	39	import java.text.ParsePosition;
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	40	import java.util.Iterator;
2 90ce3da70b43 Initial load duke parents: diff changeset	41	import java.util.TreeSet;
90ce3da70b43 Initial load duke parents: diff changeset	42
90ce3da70b43 Initial load duke parents: diff changeset	43	/**
90ce3da70b43 Initial load duke parents: diff changeset	44	* A mutable set of Unicode characters and multicharacter strings. Objects of this class
90ce3da70b43 Initial load duke parents: diff changeset	45	* represent <em>character classes</em> used in regular expressions.
90ce3da70b43 Initial load duke parents: diff changeset	46	* A character class specifies a subset of Unicode code points. Legal
90ce3da70b43 Initial load duke parents: diff changeset	47	* code points are U+0000 to U+10FFFF, inclusive.
90ce3da70b43 Initial load duke parents: diff changeset	48	*
90ce3da70b43 Initial load duke parents: diff changeset	49	* <p>The UnicodeSet class is not designed to be subclassed.
90ce3da70b43 Initial load duke parents: diff changeset	50	*
90ce3da70b43 Initial load duke parents: diff changeset	51	* <p><code>UnicodeSet</code> supports two APIs. The first is the
90ce3da70b43 Initial load duke parents: diff changeset	52	* <em>operand</em> API that allows the caller to modify the value of
90ce3da70b43 Initial load duke parents: diff changeset	53	* a <code>UnicodeSet</code> object. It conforms to Java 2's
90ce3da70b43 Initial load duke parents: diff changeset	54	* <code>java.util.Set</code> interface, although
90ce3da70b43 Initial load duke parents: diff changeset	55	* <code>UnicodeSet</code> does not actually implement that
90ce3da70b43 Initial load duke parents: diff changeset	56	* interface. All methods of <code>Set</code> are supported, with the
90ce3da70b43 Initial load duke parents: diff changeset	57	* modification that they take a character range or single character
90ce3da70b43 Initial load duke parents: diff changeset	58	* instead of an <code>Object</code>, and they take a
90ce3da70b43 Initial load duke parents: diff changeset	59	* <code>UnicodeSet</code> instead of a <code>Collection</code>. The
90ce3da70b43 Initial load duke parents: diff changeset	60	* operand API may be thought of in terms of boolean logic: a boolean
90ce3da70b43 Initial load duke parents: diff changeset	61	* OR is implemented by <code>add</code>, a boolean AND is implemented
90ce3da70b43 Initial load duke parents: diff changeset	62	* by <code>retain</code>, a boolean XOR is implemented by
90ce3da70b43 Initial load duke parents: diff changeset	63	* <code>complement</code> taking an argument, and a boolean NOT is
90ce3da70b43 Initial load duke parents: diff changeset	64	* implemented by <code>complement</code> with no argument. In terms
90ce3da70b43 Initial load duke parents: diff changeset	65	* of traditional set theory function names, <code>add</code> is a
90ce3da70b43 Initial load duke parents: diff changeset	66	* union, <code>retain</code> is an intersection, <code>remove</code>
90ce3da70b43 Initial load duke parents: diff changeset	67	* is an asymmetric difference, and <code>complement</code> with no
90ce3da70b43 Initial load duke parents: diff changeset	68	* argument is a set complement with respect to the superset range
90ce3da70b43 Initial load duke parents: diff changeset	69	* <code>MIN_VALUE-MAX_VALUE</code>.
90ce3da70b43 Initial load duke parents: diff changeset	70	*
90ce3da70b43 Initial load duke parents: diff changeset	71	* <p>The second API is the
90ce3da70b43 Initial load duke parents: diff changeset	72	* <code>applyPattern()</code>/<code>toPattern()</code> API from the
90ce3da70b43 Initial load duke parents: diff changeset	73	* <code>java.text.Format</code>-derived classes. Unlike the
90ce3da70b43 Initial load duke parents: diff changeset	74	* methods that add characters, add categories, and control the logic
90ce3da70b43 Initial load duke parents: diff changeset	75	* of the set, the method <code>applyPattern()</code> sets all
90ce3da70b43 Initial load duke parents: diff changeset	76	* attributes of a <code>UnicodeSet</code> at once, based on a
90ce3da70b43 Initial load duke parents: diff changeset	77	* string pattern.
90ce3da70b43 Initial load duke parents: diff changeset	78	*
90ce3da70b43 Initial load duke parents: diff changeset	79	* <p><b>Pattern syntax</b></p>
90ce3da70b43 Initial load duke parents: diff changeset	80	*
90ce3da70b43 Initial load duke parents: diff changeset	81	* Patterns are accepted by the constructors and the
90ce3da70b43 Initial load duke parents: diff changeset	82	* <code>applyPattern()</code> methods and returned by the
90ce3da70b43 Initial load duke parents: diff changeset	83	* <code>toPattern()</code> method. These patterns follow a syntax
90ce3da70b43 Initial load duke parents: diff changeset	84	* similar to that employed by version 8 regular expression character
90ce3da70b43 Initial load duke parents: diff changeset	85	* classes. Here are some simple examples:
90ce3da70b43 Initial load duke parents: diff changeset	86	*
90ce3da70b43 Initial load duke parents: diff changeset	87	* <blockquote>
90ce3da70b43 Initial load duke parents: diff changeset	88	* <table>
90ce3da70b43 Initial load duke parents: diff changeset	89	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	90	* <td nowrap valign="top" align="left"><code>[]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	91	* <td valign="top">No characters</td>
90ce3da70b43 Initial load duke parents: diff changeset	92	* </tr><tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	93	* <td nowrap valign="top" align="left"><code>[a]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	94	* <td valign="top">The character 'a'</td>
90ce3da70b43 Initial load duke parents: diff changeset	95	* </tr><tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	96	* <td nowrap valign="top" align="left"><code>[ae]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	97	* <td valign="top">The characters 'a' and 'e'</td>
90ce3da70b43 Initial load duke parents: diff changeset	98	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	99	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	100	* <td nowrap valign="top" align="left"><code>[a-e]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	101	* <td valign="top">The characters 'a' through 'e' inclusive, in Unicode code
90ce3da70b43 Initial load duke parents: diff changeset	102	* point order</td>
90ce3da70b43 Initial load duke parents: diff changeset	103	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	104	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	105	* <td nowrap valign="top" align="left"><code>[\\u4E01]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	106	* <td valign="top">The character U+4E01</td>
90ce3da70b43 Initial load duke parents: diff changeset	107	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	108	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	109	* <td nowrap valign="top" align="left"><code>[a{ab}{ac}]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	110	* <td valign="top">The character 'a' and the multicharacter strings "ab" and
90ce3da70b43 Initial load duke parents: diff changeset	111	* "ac"</td>
90ce3da70b43 Initial load duke parents: diff changeset	112	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	113	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	114	* <td nowrap valign="top" align="left"><code>[\p{Lu}]</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	115	* <td valign="top">All characters in the general category Uppercase Letter</td>
90ce3da70b43 Initial load duke parents: diff changeset	116	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	117	* </table>
90ce3da70b43 Initial load duke parents: diff changeset	118	* </blockquote>
90ce3da70b43 Initial load duke parents: diff changeset	119	*
90ce3da70b43 Initial load duke parents: diff changeset	120	* Any character may be preceded by a backslash in order to remove any special
90ce3da70b43 Initial load duke parents: diff changeset	121	* meaning. White space characters, as defined by UCharacterProperty.isRuleWhiteSpace(), are
90ce3da70b43 Initial load duke parents: diff changeset	122	* ignored, unless they are escaped.
90ce3da70b43 Initial load duke parents: diff changeset	123	*
90ce3da70b43 Initial load duke parents: diff changeset	124	* <p>Property patterns specify a set of characters having a certain
90ce3da70b43 Initial load duke parents: diff changeset	125	* property as defined by the Unicode standard. Both the POSIX-like
90ce3da70b43 Initial load duke parents: diff changeset	126	* "[:Lu:]" and the Perl-like syntax "\p{Lu}" are recognized. For a
90ce3da70b43 Initial load duke parents: diff changeset	127	* complete list of supported property patterns, see the User's Guide
90ce3da70b43 Initial load duke parents: diff changeset	128	* for UnicodeSet at
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	129	* <a href="http://www.icu-project.org/userguide/unicodeSet.html">
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	130	* http://www.icu-project.org/userguide/unicodeSet.html</a>.
2 90ce3da70b43 Initial load duke parents: diff changeset	131	* Actual determination of property data is defined by the underlying
90ce3da70b43 Initial load duke parents: diff changeset	132	* Unicode database as implemented by UCharacter.
90ce3da70b43 Initial load duke parents: diff changeset	133	*
90ce3da70b43 Initial load duke parents: diff changeset	134	* <p>Patterns specify individual characters, ranges of characters, and
90ce3da70b43 Initial load duke parents: diff changeset	135	* Unicode property sets. When elements are concatenated, they
90ce3da70b43 Initial load duke parents: diff changeset	136	* specify their union. To complement a set, place a '^' immediately
90ce3da70b43 Initial load duke parents: diff changeset	137	* after the opening '['. Property patterns are inverted by modifying
90ce3da70b43 Initial load duke parents: diff changeset	138	* their delimiters: "[:^foo:]" and "\P{foo}". In any other location,
90ce3da70b43 Initial load duke parents: diff changeset	139	* '^' has no special meaning.
90ce3da70b43 Initial load duke parents: diff changeset	140	*
90ce3da70b43 Initial load duke parents: diff changeset	141	* <p>Ranges are indicated by placing a '-' between two
90ce3da70b43 Initial load duke parents: diff changeset	142	* characters, as in "a-z". This specifies the range of all
90ce3da70b43 Initial load duke parents: diff changeset	143	* characters from the left to the right, in Unicode order. If the
90ce3da70b43 Initial load duke parents: diff changeset	144	* left character is greater than or equal to the
90ce3da70b43 Initial load duke parents: diff changeset	145	* right character it is a syntax error. If a '-' occurs as the first
90ce3da70b43 Initial load duke parents: diff changeset	146	* character after the opening '[' or '[^', or if it occurs as the
90ce3da70b43 Initial load duke parents: diff changeset	147	* last character before the closing ']', then it is taken as a
90ce3da70b43 Initial load duke parents: diff changeset	148	* literal. Thus "[a\\-b]", "[-ab]", and "[ab-]" all indicate the same
90ce3da70b43 Initial load duke parents: diff changeset	149	* set of three characters, 'a', 'b', and '-'.
90ce3da70b43 Initial load duke parents: diff changeset	150	*
90ce3da70b43 Initial load duke parents: diff changeset	151	* <p>Sets may be intersected using the '&' operator, and the asymmetric
90ce3da70b43 Initial load duke parents: diff changeset	152	* set difference may be taken using the '-' operator. For example,
90ce3da70b43 Initial load duke parents: diff changeset	153	* "[[:L:]&[\\u0000-\\u0FFF]]" indicates the set of all Unicode letters
90ce3da70b43 Initial load duke parents: diff changeset	154	* with values less than 4096. Operators ('&' and '-') have equal
90ce3da70b43 Initial load duke parents: diff changeset	155	* precedence and bind left-to-right. Thus
90ce3da70b43 Initial load duke parents: diff changeset	156	* "[[:L:]-[a-z]-[\\u0100-\\u01FF]]" is equivalent to
90ce3da70b43 Initial load duke parents: diff changeset	157	* "[[[:L:]-[a-z]]-[\\u0100-\\u01FF]]". This only really matters for
90ce3da70b43 Initial load duke parents: diff changeset	158	* difference; intersection is commutative.
90ce3da70b43 Initial load duke parents: diff changeset	159	*
90ce3da70b43 Initial load duke parents: diff changeset	160	* <table>
90ce3da70b43 Initial load duke parents: diff changeset	161	* <tr valign=top><td nowrap><code>[a]</code><td>The set containing 'a'
90ce3da70b43 Initial load duke parents: diff changeset	162	* <tr valign=top><td nowrap><code>[a-z]</code><td>The set containing 'a'
90ce3da70b43 Initial load duke parents: diff changeset	163	* through 'z' and all letters in between, in Unicode order
90ce3da70b43 Initial load duke parents: diff changeset	164	* <tr valign=top><td nowrap><code>[^a-z]</code><td>The set containing
90ce3da70b43 Initial load duke parents: diff changeset	165	* all characters but 'a' through 'z',
90ce3da70b43 Initial load duke parents: diff changeset	166	* that is, U+0000 through 'a'-1 and 'z'+1 through U+10FFFF
90ce3da70b43 Initial load duke parents: diff changeset	167	* <tr valign=top><td nowrap><code>[[<em>pat1</em>][<em>pat2</em>]]</code>
90ce3da70b43 Initial load duke parents: diff changeset	168	* <td>The union of sets specified by <em>pat1</em> and <em>pat2</em>
90ce3da70b43 Initial load duke parents: diff changeset	169	* <tr valign=top><td nowrap><code>[[<em>pat1</em>]&[<em>pat2</em>]]</code>
90ce3da70b43 Initial load duke parents: diff changeset	170	* <td>The intersection of sets specified by <em>pat1</em> and <em>pat2</em>
90ce3da70b43 Initial load duke parents: diff changeset	171	* <tr valign=top><td nowrap><code>[[<em>pat1</em>]-[<em>pat2</em>]]</code>
90ce3da70b43 Initial load duke parents: diff changeset	172	* <td>The asymmetric difference of sets specified by <em>pat1</em> and
90ce3da70b43 Initial load duke parents: diff changeset	173	* <em>pat2</em>
90ce3da70b43 Initial load duke parents: diff changeset	174	* <tr valign=top><td nowrap><code>[:Lu:] or \p{Lu}</code>
90ce3da70b43 Initial load duke parents: diff changeset	175	* <td>The set of characters having the specified
90ce3da70b43 Initial load duke parents: diff changeset	176	* Unicode property; in
90ce3da70b43 Initial load duke parents: diff changeset	177	* this case, Unicode uppercase letters
90ce3da70b43 Initial load duke parents: diff changeset	178	* <tr valign=top><td nowrap><code>[:^Lu:] or \P{Lu}</code>
90ce3da70b43 Initial load duke parents: diff changeset	179	* <td>The set of characters <em>not</em> having the given
90ce3da70b43 Initial load duke parents: diff changeset	180	* Unicode property
90ce3da70b43 Initial load duke parents: diff changeset	181	* </table>
90ce3da70b43 Initial load duke parents: diff changeset	182	*
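Several of the pattern forms tabulated above have close analogues in java.util.regex character classes, which makes them easy to try without access to the internal sun.text.normalizer package. The sketch below is only an analogy and does not exercise the UnicodeSet parser: java.util.regex spells intersection as && rather than &, expresses a difference as an intersection with a negated class, and has no counterpart for the {ab} multicharacter-string form or the [:Lu:] spelling. The class name PatternAnalogy and the helper in() are made up for illustration.

```java
import java.util.regex.Pattern;

public class PatternAnalogy {
    // Hypothetical helper: true if the whole string s matches the character class.
    static boolean in(String charClass, String s) {
        return Pattern.matches(charClass, s);
    }

    public static void main(String[] args) {
        // Range and complemented range.
        System.out.println("[a-e] has c:      " + in("[a-e]", "c"));
        System.out.println("[^a-z] has A:     " + in("[^a-z]", "A"));

        // Property and inverted property (general category Lu).
        System.out.println("\\p{Lu} has A:     " + in("\\p{Lu}", "A"));
        System.out.println("\\P{Lu} has a:     " + in("\\P{Lu}", "a"));

        // Intersection: letters below U+1000 (java.util.regex uses &&).
        System.out.println("letter < 0x1000:  " + in("[\\p{L}&&[\\u0000-\\u0FFF]]", "\u00E9"));

        // Difference, written here as intersection with a negated class.
        System.out.println("letters minus a-z has b: " + in("[\\p{L}&&[^a-z]]", "b"));

        // A '-' that is escaped, first, or last is taken as a literal.
        System.out.println("literal hyphen:   "
                + (in("[a\\-b]", "-") && in("[-ab]", "-") && in("[ab-]", "-")));
    }
}
```

The difference example is the one most likely to surprise: it reports false for 'b' because the lowercase ASCII letters have been subtracted from the set of all letters.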
90ce3da70b43 Initial load duke parents: diff changeset	183	* <p><b>Warning</b>: you cannot add an empty string ("") to a UnicodeSet.</p>
90ce3da70b43 Initial load duke parents: diff changeset	184	*
90ce3da70b43 Initial load duke parents: diff changeset	185	* <p><b>Formal syntax</b></p>
90ce3da70b43 Initial load duke parents: diff changeset	186	*
90ce3da70b43 Initial load duke parents: diff changeset	187	* <blockquote>
90ce3da70b43 Initial load duke parents: diff changeset	188	* <table>
90ce3da70b43 Initial load duke parents: diff changeset	189	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	190	* <td nowrap valign="top" align="right"><code>pattern :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	191	* <td valign="top"><code>('[' '^'? item* ']') |
90ce3da70b43 Initial load duke parents: diff changeset	192	* property</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	193	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	194	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	195	* <td nowrap valign="top" align="right"><code>item :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	196	* <td valign="top"><code>char | (char '-' char) | pattern-expr<br>
90ce3da70b43 Initial load duke parents: diff changeset	197	* </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	198	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	199	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	200	* <td nowrap valign="top" align="right"><code>pattern-expr :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	201	* <td valign="top"><code>pattern | pattern-expr pattern |
90ce3da70b43 Initial load duke parents: diff changeset	202	* pattern-expr op pattern<br>
90ce3da70b43 Initial load duke parents: diff changeset	203	* </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	204	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	205	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	206	* <td nowrap valign="top" align="right"><code>op :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	207	* <td valign="top"><code>'&' | '-'<br>
90ce3da70b43 Initial load duke parents: diff changeset	208	* </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	209	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	210	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	211	* <td nowrap valign="top" align="right"><code>special :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	212	* <td valign="top"><code>'[' | ']' | '-'<br>
90ce3da70b43 Initial load duke parents: diff changeset	213	* </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	214	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	215	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	216	* <td nowrap valign="top" align="right"><code>char :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	217	* <td valign="top"><em>any character that is not</em><code> special<br>
90ce3da70b43 Initial load duke parents: diff changeset	218	* | ('\\' </code><em>any character</em><code>)<br>
90ce3da70b43 Initial load duke parents: diff changeset	219	* | ('\u' hex hex hex hex)<br>
90ce3da70b43 Initial load duke parents: diff changeset	220	* </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	221	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	222	* <tr align="top">
90ce3da70b43 Initial load duke parents: diff changeset	223	* <td nowrap valign="top" align="right"><code>hex :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	224	* <td valign="top"><em>any character for which
90ce3da70b43 Initial load duke parents: diff changeset	225	* </em><code>Character.digit(c, 16)</code><em>
90ce3da70b43 Initial load duke parents: diff changeset	226	* returns a non-negative result</em></td>
90ce3da70b43 Initial load duke parents: diff changeset	227	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	228	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	229	* <td nowrap valign="top" align="right"><code>property :=  </code></td>
90ce3da70b43 Initial load duke parents: diff changeset	230	* <td valign="top"><em>a Unicode property set pattern</em></td>
90ce3da70b43 Initial load duke parents: diff changeset	231	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	232	* </table>
90ce3da70b43 Initial load duke parents: diff changeset	233	* <br>
90ce3da70b43 Initial load duke parents: diff changeset	234	* <table border="1">
90ce3da70b43 Initial load duke parents: diff changeset	235	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	236	* <td>Legend: <table>
90ce3da70b43 Initial load duke parents: diff changeset	237	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	238	* <td nowrap valign="top"><code>a := b</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	239	* <td width="20" valign="top">  </td>
90ce3da70b43 Initial load duke parents: diff changeset	240	* <td valign="top"><code>a</code> may be replaced by <code>b</code> </td>
90ce3da70b43 Initial load duke parents: diff changeset	241	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	242	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	243	* <td nowrap valign="top"><code>a?</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	244	* <td valign="top"></td>
90ce3da70b43 Initial load duke parents: diff changeset	245	* <td valign="top">zero or one instance of <code>a</code><br>
90ce3da70b43 Initial load duke parents: diff changeset	246	* </td>
90ce3da70b43 Initial load duke parents: diff changeset	247	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	248	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	249	* <td nowrap valign="top"><code>a*</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	250	* <td valign="top"></td>
90ce3da70b43 Initial load duke parents: diff changeset	251	* <td valign="top">zero or more instances of <code>a</code><br>
90ce3da70b43 Initial load duke parents: diff changeset	252	* </td>
90ce3da70b43 Initial load duke parents: diff changeset	253	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	254	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	255	* <td nowrap valign="top"><code>a | b</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	256	* <td valign="top"></td>
90ce3da70b43 Initial load duke parents: diff changeset	257	* <td valign="top">either <code>a</code> or <code>b</code><br>
90ce3da70b43 Initial load duke parents: diff changeset	258	* </td>
90ce3da70b43 Initial load duke parents: diff changeset	259	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	260	* <tr>
90ce3da70b43 Initial load duke parents: diff changeset	261	* <td nowrap valign="top"><code>'a'</code></td>
90ce3da70b43 Initial load duke parents: diff changeset	262	* <td valign="top"></td>
90ce3da70b43 Initial load duke parents: diff changeset	263	* <td valign="top">the literal string between the quotes </td>
90ce3da70b43 Initial load duke parents: diff changeset	264	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	265	* </table>
90ce3da70b43 Initial load duke parents: diff changeset	266	* </td>
90ce3da70b43 Initial load duke parents: diff changeset	267	* </tr>
90ce3da70b43 Initial load duke parents: diff changeset	268	* </table>
90ce3da70b43 Initial load duke parents: diff changeset	269	* </blockquote>
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	270	* <p>To iterate over the contents of a UnicodeSet, use the UnicodeSetIterator class.
2 90ce3da70b43 Initial load duke parents: diff changeset	271	*
90ce3da70b43 Initial load duke parents: diff changeset	272	* @author Alan Liu
90ce3da70b43 Initial load duke parents: diff changeset	273	* @stable ICU 2.0
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	274	* @see UnicodeSetIterator
2 90ce3da70b43 Initial load duke parents: diff changeset	275	*/
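The boolean-logic reading of the operand API in the class comment (add as OR, retain as AND, remove as asymmetric difference, complement with an argument as XOR, complement with no argument as NOT over MIN_VALUE-MAX_VALUE) can be tried out with java.util.BitSet as a stand-in. This is a minimal sketch under that assumption, not the UnicodeSet implementation; the class OperandApiSketch and its static helpers are hypothetical and simply borrow the operation names from the comment.

```java
import java.util.BitSet;

public class OperandApiSketch {
    // One past the largest code point, mirroring HIGH in the source.
    static final int HIGH = 0x110000;

    // Hypothetical helper: the set of code points in the inclusive range [start, end].
    static BitSet range(int start, int end) {
        BitSet s = new BitSet(HIGH);
        if (start <= end) {
            s.set(start, end + 1); // BitSet.set takes an exclusive upper bound
        }
        return s;
    }

    // add = union (boolean OR)
    static BitSet add(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.or(b); return r; }

    // retain = intersection (boolean AND)
    static BitSet retain(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.and(b); return r; }

    // remove = asymmetric difference
    static BitSet remove(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.andNot(b); return r; }

    // complement with an argument = symmetric difference (boolean XOR)
    static BitSet complement(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.xor(b); return r; }

    // complement with no argument = NOT with respect to the full code point range
    static BitSet complement(BitSet a) { BitSet r = (BitSet) a.clone(); r.flip(0, HIGH); return r; }

    public static void main(String[] args) {
        BitSet a = range('a', 'e');
        BitSet b = range('c', 'g');
        System.out.println("add:        " + add(a, b).cardinality());        // a..g
        System.out.println("retain:     " + retain(a, b).cardinality());     // c..e
        System.out.println("remove:     " + remove(a, b).cardinality());     // a, b
        System.out.println("complement: " + complement(a, b).cardinality()); // a, b, f, g
        System.out.println("NOT a:      " + complement(a).cardinality());    // everything but a..e
    }
}
```

A bit per code point is used here only because it keeps the example short; the comments in the class itself refer to an inversion list.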
90ce3da70b43 Initial load duke parents: diff changeset	276	public class UnicodeSet implements UnicodeMatcher {
90ce3da70b43 Initial load duke parents: diff changeset	277
90ce3da70b43 Initial load duke parents: diff changeset	278	private static final int LOW = 0x000000; // LOW <= all valid values. ZERO for codepoints
90ce3da70b43 Initial load duke parents: diff changeset	279	private static final int HIGH = 0x110000; // HIGH > all valid values. 10000 for code units.
90ce3da70b43 Initial load duke parents: diff changeset	280	// 110000 for codepoints
90ce3da70b43 Initial load duke parents: diff changeset	281
90ce3da70b43 Initial load duke parents: diff changeset	282	/**
90ce3da70b43 Initial load duke parents: diff changeset	283	* Minimum value that can be stored in a UnicodeSet.
90ce3da70b43 Initial load duke parents: diff changeset	284	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	285	*/
90ce3da70b43 Initial load duke parents: diff changeset	286	public static final int MIN_VALUE = LOW;
90ce3da70b43 Initial load duke parents: diff changeset	287
90ce3da70b43 Initial load duke parents: diff changeset	288	/**
90ce3da70b43 Initial load duke parents: diff changeset	289	* Maximum value that can be stored in a UnicodeSet.
90ce3da70b43 Initial load duke parents: diff changeset	290	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	291	*/
90ce3da70b43 Initial load duke parents: diff changeset	292	public static final int MAX_VALUE = HIGH - 1;
90ce3da70b43 Initial load duke parents: diff changeset	293
90ce3da70b43 Initial load duke parents: diff changeset	294	private int len; // length used; list may be longer to minimize reallocs
90ce3da70b43 Initial load duke parents: diff changeset	295	private int[] list; // MUST be terminated with HIGH
90ce3da70b43 Initial load duke parents: diff changeset	296	private int[] rangeList; // internal buffer
90ce3da70b43 Initial load duke parents: diff changeset	297	private int[] buffer; // internal buffer
90ce3da70b43 Initial load duke parents: diff changeset	298
90ce3da70b43 Initial load duke parents: diff changeset	299	// NOTE: normally the field should be of type SortedSet, but that is missing a public clone!
90ce3da70b43 Initial load duke parents: diff changeset	300	// The field is not private so that UnicodeSetIterator can access it.
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	301	TreeSet<String> strings = new TreeSet<>();
2 90ce3da70b43 Initial load duke parents: diff changeset	302
90ce3da70b43 Initial load duke parents: diff changeset	303	/**
90ce3da70b43 Initial load duke parents: diff changeset	304	* The pattern representation of this set. This may not be the
90ce3da70b43 Initial load duke parents: diff changeset	305	* most economical pattern. It is the pattern supplied to
90ce3da70b43 Initial load duke parents: diff changeset	306	* applyPattern(), with variables substituted and whitespace
90ce3da70b43 Initial load duke parents: diff changeset	307	* removed. For sets constructed without applyPattern(), or
90ce3da70b43 Initial load duke parents: diff changeset	308	* modified using the non-pattern API, this string will be null,
90ce3da70b43 Initial load duke parents: diff changeset	309	* indicating that toPattern() must generate a pattern
90ce3da70b43 Initial load duke parents: diff changeset	310	* representation from the inversion list.
90ce3da70b43 Initial load duke parents: diff changeset	311	*/
90ce3da70b43 Initial load duke parents: diff changeset	312	private String pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	313
90ce3da70b43 Initial load duke parents: diff changeset	314	private static final int START_EXTRA = 16; // initial storage. Must be >= 0
90ce3da70b43 Initial load duke parents: diff changeset	315	private static final int GROW_EXTRA = START_EXTRA; // extra amount for growth. Must be >= 0
90ce3da70b43 Initial load duke parents: diff changeset	316
90ce3da70b43 Initial load duke parents: diff changeset	317	/**
90ce3da70b43 Initial load duke parents: diff changeset	318	* A set of all characters _except_ the second through last characters of
90ce3da70b43 Initial load duke parents: diff changeset	319	* certain ranges. These ranges are ranges of characters whose
90ce3da70b43 Initial load duke parents: diff changeset	320	* properties are all exactly alike, e.g. CJK Ideographs from
90ce3da70b43 Initial load duke parents: diff changeset	321	* U+4E00 to U+9FA5.
90ce3da70b43 Initial load duke parents: diff changeset	322	*/
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	323	private static UnicodeSet INCLUSIONS[] = null;
2 90ce3da70b43 Initial load duke parents: diff changeset	324
90ce3da70b43 Initial load duke parents: diff changeset	325	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	326	// Public API
90ce3da70b43 Initial load duke parents: diff changeset	327	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	328
90ce3da70b43 Initial load duke parents: diff changeset	329	/**
90ce3da70b43 Initial load duke parents: diff changeset	330	* Constructs an empty set.
90ce3da70b43 Initial load duke parents: diff changeset	331	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	332	*/
90ce3da70b43 Initial load duke parents: diff changeset	333	public UnicodeSet() {
90ce3da70b43 Initial load duke parents: diff changeset	334	list = new int[1 + START_EXTRA];
90ce3da70b43 Initial load duke parents: diff changeset	335	list[len++] = HIGH;
90ce3da70b43 Initial load duke parents: diff changeset	336	}
90ce3da70b43 Initial load duke parents: diff changeset	337
90ce3da70b43 Initial load duke parents: diff changeset	338	/**
90ce3da70b43 Initial load duke parents: diff changeset	339	* Constructs a set containing the given range. If <code>start >
90ce3da70b43 Initial load duke parents: diff changeset	340	* end</code> then an empty set is created.
90ce3da70b43 Initial load duke parents: diff changeset	341	*
90ce3da70b43 Initial load duke parents: diff changeset	342	* @param start first character, inclusive, of range
90ce3da70b43 Initial load duke parents: diff changeset	343	* @param end last character, inclusive, of range
90ce3da70b43 Initial load duke parents: diff changeset	344	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	345	*/
90ce3da70b43 Initial load duke parents: diff changeset	346	public UnicodeSet(int start, int end) {
90ce3da70b43 Initial load duke parents: diff changeset	347	this();
90ce3da70b43 Initial load duke parents: diff changeset	348	complement(start, end);
90ce3da70b43 Initial load duke parents: diff changeset	349	}
90ce3da70b43 Initial load duke parents: diff changeset	350
90ce3da70b43 Initial load duke parents: diff changeset	351	/**
90ce3da70b43 Initial load duke parents: diff changeset	352	* Constructs a set from the given pattern. See the class description
90ce3da70b43 Initial load duke parents: diff changeset	353	* for the syntax of the pattern language. Whitespace is ignored.
90ce3da70b43 Initial load duke parents: diff changeset	354	* @param pattern a string specifying what characters are in the set
90ce3da70b43 Initial load duke parents: diff changeset	355	* @exception java.lang.IllegalArgumentException if the pattern contains
90ce3da70b43 Initial load duke parents: diff changeset	356	* a syntax error.
90ce3da70b43 Initial load duke parents: diff changeset	357	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	358	*/
90ce3da70b43 Initial load duke parents: diff changeset	359	public UnicodeSet(String pattern) {
90ce3da70b43 Initial load duke parents: diff changeset	360	this();
90ce3da70b43 Initial load duke parents: diff changeset	361	applyPattern(pattern, null, null, IGNORE_SPACE);
90ce3da70b43 Initial load duke parents: diff changeset	362	}
90ce3da70b43 Initial load duke parents: diff changeset	363
90ce3da70b43 Initial load duke parents: diff changeset	364	/**
90ce3da70b43 Initial load duke parents: diff changeset	365	* Make this object represent the same set as <code>other</code>.
90ce3da70b43 Initial load duke parents: diff changeset	366	* @param other a <code>UnicodeSet</code> whose value will be
90ce3da70b43 Initial load duke parents: diff changeset	367	* copied to this object
90ce3da70b43 Initial load duke parents: diff changeset	368	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	369	*/
90ce3da70b43 Initial load duke parents: diff changeset	370	public UnicodeSet set(UnicodeSet other) {
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	371	list = other.list.clone();
2 90ce3da70b43 Initial load duke parents: diff changeset	372	len = other.len;
90ce3da70b43 Initial load duke parents: diff changeset	373	pat = other.pat;
90ce3da70b43 Initial load duke parents: diff changeset	374	strings = new TreeSet<>(other.strings);
90ce3da70b43 Initial load duke parents: diff changeset	375	return this;
90ce3da70b43 Initial load duke parents: diff changeset	376	}
90ce3da70b43 Initial load duke parents: diff changeset	377
90ce3da70b43 Initial load duke parents: diff changeset	378	/**
90ce3da70b43 Initial load duke parents: diff changeset	379	* Modifies this set to represent the set specified by the given pattern.
90ce3da70b43 Initial load duke parents: diff changeset	380	* See the class description for the syntax of the pattern language.
90ce3da70b43 Initial load duke parents: diff changeset	381	* Whitespace is ignored.
90ce3da70b43 Initial load duke parents: diff changeset	382	* @param pattern a string specifying what characters are in the set
90ce3da70b43 Initial load duke parents: diff changeset	383	* @exception java.lang.IllegalArgumentException if the pattern
90ce3da70b43 Initial load duke parents: diff changeset	384	* contains a syntax error.
90ce3da70b43 Initial load duke parents: diff changeset	385	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	386	*/
90ce3da70b43 Initial load duke parents: diff changeset	387	public final UnicodeSet applyPattern(String pattern) {
90ce3da70b43 Initial load duke parents: diff changeset	388	return applyPattern(pattern, null, null, IGNORE_SPACE);
90ce3da70b43 Initial load duke parents: diff changeset	389	}
90ce3da70b43 Initial load duke parents: diff changeset	390
90ce3da70b43 Initial load duke parents: diff changeset	391	/**
90ce3da70b43 Initial load duke parents: diff changeset	392	* Append the <code>toPattern()</code> representation of a
90ce3da70b43 Initial load duke parents: diff changeset	393	* string to the given <code>StringBuffer</code>.
90ce3da70b43 Initial load duke parents: diff changeset	394	*/
90ce3da70b43 Initial load duke parents: diff changeset	395	private static void _appendToPat(StringBuffer buf, String s, boolean escapeUnprintable) {
90ce3da70b43 Initial load duke parents: diff changeset	396	for (int i = 0; i < s.length(); i += UTF16.getCharCount(UTF16.charAt(s, i))) {
90ce3da70b43 Initial load duke parents: diff changeset	397	_appendToPat(buf, UTF16.charAt(s, i), escapeUnprintable);
90ce3da70b43 Initial load duke parents: diff changeset	398	}
90ce3da70b43 Initial load duke parents: diff changeset	399	}
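// The loop above must advance by the UTF-16 length of the code point it
// just read, not by a value derived from the index. A minimal standalone
// sketch of that iteration, assuming the standard java.lang.Character and
// String APIs in place of the internal UTF16 helper (CodePointWalk and
// codePoints are hypothetical names used only for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    /** Returns the code points of s in order, one entry per code point. */
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads a surrogate pair as one value
            out.add(cp);
            i += Character.charCount(cp);   // 1 for BMP, 2 for supplementary
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', one supplementary code point, 'b'
        System.out.println(codePoints("a" + new String(Character.toChars(0x1D50A)) + "b"));
    }
}
```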
90ce3da70b43 Initial load duke parents: diff changeset	400
90ce3da70b43 Initial load duke parents: diff changeset	401	/**
90ce3da70b43 Initial load duke parents: diff changeset	402	* Append the <code>toPattern()</code> representation of a
90ce3da70b43 Initial load duke parents: diff changeset	403	* character to the given <code>StringBuffer</code>.
90ce3da70b43 Initial load duke parents: diff changeset	404	*/
90ce3da70b43 Initial load duke parents: diff changeset	405	private static void _appendToPat(StringBuffer buf, int c, boolean escapeUnprintable) {
90ce3da70b43 Initial load duke parents: diff changeset	406	if (escapeUnprintable && Utility.isUnprintable(c)) {
90ce3da70b43 Initial load duke parents: diff changeset	407	// Use hex escape notation (<backslash>uxxxx or <backslash>Uxxxxxxxx) for anything
90ce3da70b43 Initial load duke parents: diff changeset	408	// unprintable
90ce3da70b43 Initial load duke parents: diff changeset	409	if (Utility.escapeUnprintable(buf, c)) {
90ce3da70b43 Initial load duke parents: diff changeset	410	return;
90ce3da70b43 Initial load duke parents: diff changeset	411	}
90ce3da70b43 Initial load duke parents: diff changeset	412	}
90ce3da70b43 Initial load duke parents: diff changeset	413	// Escape characters that have syntactic meaning in patterns (including ':')
90ce3da70b43 Initial load duke parents: diff changeset	414	switch (c) {
90ce3da70b43 Initial load duke parents: diff changeset	415	case '[': // SET_OPEN:
90ce3da70b43 Initial load duke parents: diff changeset	416	case ']': // SET_CLOSE:
90ce3da70b43 Initial load duke parents: diff changeset	417	case '-': // HYPHEN:
90ce3da70b43 Initial load duke parents: diff changeset	418	case '^': // COMPLEMENT:
90ce3da70b43 Initial load duke parents: diff changeset	419	case '&': // INTERSECTION:
90ce3da70b43 Initial load duke parents: diff changeset	420	case '\\': //BACKSLASH:
90ce3da70b43 Initial load duke parents: diff changeset	421	case '{':
90ce3da70b43 Initial load duke parents: diff changeset	422	case '}':
90ce3da70b43 Initial load duke parents: diff changeset	423	case '$':
90ce3da70b43 Initial load duke parents: diff changeset	424	case ':':
90ce3da70b43 Initial load duke parents: diff changeset	425	buf.append('\\');
90ce3da70b43 Initial load duke parents: diff changeset	426	break;
90ce3da70b43 Initial load duke parents: diff changeset	427	default:
90ce3da70b43 Initial load duke parents: diff changeset	428	// Escape whitespace
90ce3da70b43 Initial load duke parents: diff changeset	429	if (UCharacterProperty.isRuleWhiteSpace(c)) {
90ce3da70b43 Initial load duke parents: diff changeset	430	buf.append('\\');
90ce3da70b43 Initial load duke parents: diff changeset	431	}
90ce3da70b43 Initial load duke parents: diff changeset	432	break;
90ce3da70b43 Initial load duke parents: diff changeset	433	}
90ce3da70b43 Initial load duke parents: diff changeset	434	UTF16.append(buf, c);
90ce3da70b43 Initial load duke parents: diff changeset	435	}
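// The escaping rule above can be tried in isolation. This is a rough
// sketch, not the class's own code: PatternEscapeSketch and escape are
// hypothetical names, Character.isWhitespace stands in for
// UCharacterProperty.isRuleWhiteSpace (the two are not identical), and the
// hex escaping of unprintable characters is left out.

```java
final class PatternEscapeSketch {
    // Characters that carry syntactic meaning in a set pattern.
    static final String SYNTAX = "[]-^&\\{}$:";

    /** Returns s with a backslash before each syntax or whitespace character. */
    static String escape(String s) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (SYNTAX.indexOf(cp) >= 0 || Character.isWhitespace(cp)) {
                buf.append('\\');
            }
            buf.appendCodePoint(cp);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a-b [x]"));
    }
}
```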
90ce3da70b43 Initial load duke parents: diff changeset	436
90ce3da70b43 Initial load duke parents: diff changeset	437	/**
90ce3da70b43 Initial load duke parents: diff changeset	438	* Append a string representation of this set to result. This will be
90ce3da70b43 Initial load duke parents: diff changeset	439	* a cleaned version of the string passed to applyPattern(), if there
90ce3da70b43 Initial load duke parents: diff changeset	440	* is one. Otherwise it will be generated.
90ce3da70b43 Initial load duke parents: diff changeset	441	*/
90ce3da70b43 Initial load duke parents: diff changeset	442	private StringBuffer _toPattern(StringBuffer result,
90ce3da70b43 Initial load duke parents: diff changeset	443	boolean escapeUnprintable) {
90ce3da70b43 Initial load duke parents: diff changeset	444	if (pat != null) {
90ce3da70b43 Initial load duke parents: diff changeset	445	int i;
90ce3da70b43 Initial load duke parents: diff changeset	446	int backslashCount = 0;
90ce3da70b43 Initial load duke parents: diff changeset	447	for (i=0; i<pat.length(); ) {
90ce3da70b43 Initial load duke parents: diff changeset	448	int c = UTF16.charAt(pat, i);
90ce3da70b43 Initial load duke parents: diff changeset	449	i += UTF16.getCharCount(c);
90ce3da70b43 Initial load duke parents: diff changeset	450	if (escapeUnprintable && Utility.isUnprintable(c)) {
90ce3da70b43 Initial load duke parents: diff changeset	451	// If the unprintable character is preceded by an odd
90ce3da70b43 Initial load duke parents: diff changeset	452	// number of backslashes, then it has been escaped.
90ce3da70b43 Initial load duke parents: diff changeset	453	// Before unescaping it, we delete the final
90ce3da70b43 Initial load duke parents: diff changeset	454	// backslash.
90ce3da70b43 Initial load duke parents: diff changeset	455	if ((backslashCount % 2) == 1) {
90ce3da70b43 Initial load duke parents: diff changeset	456	result.setLength(result.length() - 1);
90ce3da70b43 Initial load duke parents: diff changeset	457	}
90ce3da70b43 Initial load duke parents: diff changeset	458	Utility.escapeUnprintable(result, c);
90ce3da70b43 Initial load duke parents: diff changeset	459	backslashCount = 0;
90ce3da70b43 Initial load duke parents: diff changeset	460	} else {
90ce3da70b43 Initial load duke parents: diff changeset	461	UTF16.append(result, c);
90ce3da70b43 Initial load duke parents: diff changeset	462	if (c == '\\') {
90ce3da70b43 Initial load duke parents: diff changeset	463	++backslashCount;
90ce3da70b43 Initial load duke parents: diff changeset	464	} else {
90ce3da70b43 Initial load duke parents: diff changeset	465	backslashCount = 0;
90ce3da70b43 Initial load duke parents: diff changeset	466	}
90ce3da70b43 Initial load duke parents: diff changeset	467	}
90ce3da70b43 Initial load duke parents: diff changeset	468	}
90ce3da70b43 Initial load duke parents: diff changeset	469	return result;
90ce3da70b43 Initial load duke parents: diff changeset	470	}
90ce3da70b43 Initial load duke parents: diff changeset	471
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	472	return _generatePattern(result, escapeUnprintable, true);
2 90ce3da70b43 Initial load duke parents: diff changeset	473	}
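// The backslash-parity rule in _toPattern is easy to get wrong, so here is
// a small sketch of it on its own. Assumptions: ReescapeSketch, reescape
// and isUnprintable are hypothetical names; "unprintable" is taken to mean
// anything outside 0x20..0x7E, which may differ from Utility.isUnprintable;
// and the hex form is produced with String.format instead of
// Utility.escapeUnprintable.

```java
final class ReescapeSketch {
    static boolean isUnprintable(int cp) {
        return cp < 0x20 || cp > 0x7E;
    }

    /** Rewrites unprintable characters of pat in hex escape notation. */
    static String reescape(String pat) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // length of the backslash run just before the cursor
        for (int i = 0; i < pat.length(); ) {
            int cp = pat.codePointAt(i);
            i += Character.charCount(cp);
            if (isUnprintable(cp)) {
                // An odd run means the last backslash escaped this character;
                // drop it so the hex escape is not itself escaped.
                if ((backslashes % 2) == 1) {
                    out.setLength(out.length() - 1);
                }
                out.append(cp <= 0xFFFF ? String.format("\\u%04X", cp)
                                        : String.format("\\U%08X", cp));
                backslashes = 0;
            } else {
                out.appendCodePoint(cp);
                backslashes = (cp == '\\') ? backslashes + 1 : 0;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reescape("a" + "\\" + (char) 0xE9));
    }
}
```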
90ce3da70b43 Initial load duke parents: diff changeset	474
90ce3da70b43 Initial load duke parents: diff changeset	475	/**
90ce3da70b43 Initial load duke parents: diff changeset	476	* Generate and append a string representation of this set to result.
90ce3da70b43 Initial load duke parents: diff changeset	477	* This does not use this.pat, the cleaned up copy of the string
90ce3da70b43 Initial load duke parents: diff changeset	478	* passed to applyPattern().
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	479	* @param includeStrings if false, doesn't include the strings.
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	480	* @stable ICU 3.8
2 90ce3da70b43 Initial load duke parents: diff changeset	481	*/
90ce3da70b43 Initial load duke parents: diff changeset	482	public StringBuffer _generatePattern(StringBuffer result,
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	483	boolean escapeUnprintable, boolean includeStrings) {
2 90ce3da70b43 Initial load duke parents: diff changeset	484	result.append('[');
90ce3da70b43 Initial load duke parents: diff changeset	485
90ce3da70b43 Initial load duke parents: diff changeset	486	int count = getRangeCount();
90ce3da70b43 Initial load duke parents: diff changeset	487
90ce3da70b43 Initial load duke parents: diff changeset	488	// If the set contains at least 2 intervals and includes both
90ce3da70b43 Initial load duke parents: diff changeset	489	// MIN_VALUE and MAX_VALUE, then the inverse representation will
90ce3da70b43 Initial load duke parents: diff changeset	490	// be more economical.
90ce3da70b43 Initial load duke parents: diff changeset	491	if (count > 1 &&
90ce3da70b43 Initial load duke parents: diff changeset	492	getRangeStart(0) == MIN_VALUE &&
90ce3da70b43 Initial load duke parents: diff changeset	493	getRangeEnd(count-1) == MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	494
90ce3da70b43 Initial load duke parents: diff changeset	495	// Emit the inverse
90ce3da70b43 Initial load duke parents: diff changeset	496	result.append('^');
90ce3da70b43 Initial load duke parents: diff changeset	497
90ce3da70b43 Initial load duke parents: diff changeset	498	for (int i = 1; i < count; ++i) {
90ce3da70b43 Initial load duke parents: diff changeset	499	int start = getRangeEnd(i-1)+1;
90ce3da70b43 Initial load duke parents: diff changeset	500	int end = getRangeStart(i)-1;
90ce3da70b43 Initial load duke parents: diff changeset	501	_appendToPat(result, start, escapeUnprintable);
90ce3da70b43 Initial load duke parents: diff changeset	502	if (start != end) {
90ce3da70b43 Initial load duke parents: diff changeset	503	if ((start+1) != end) {
90ce3da70b43 Initial load duke parents: diff changeset	504	result.append('-');
90ce3da70b43 Initial load duke parents: diff changeset	505	}
90ce3da70b43 Initial load duke parents: diff changeset	506	_appendToPat(result, end, escapeUnprintable);
90ce3da70b43 Initial load duke parents: diff changeset	507	}
90ce3da70b43 Initial load duke parents: diff changeset	508	}
90ce3da70b43 Initial load duke parents: diff changeset	509	}
90ce3da70b43 Initial load duke parents: diff changeset	510
90ce3da70b43 Initial load duke parents: diff changeset	511	// Default; emit the ranges as pairs
90ce3da70b43 Initial load duke parents: diff changeset	512	else {
90ce3da70b43 Initial load duke parents: diff changeset	513	for (int i = 0; i < count; ++i) {
90ce3da70b43 Initial load duke parents: diff changeset	514	int start = getRangeStart(i);
90ce3da70b43 Initial load duke parents: diff changeset	515	int end = getRangeEnd(i);
90ce3da70b43 Initial load duke parents: diff changeset	516	_appendToPat(result, start, escapeUnprintable);
90ce3da70b43 Initial load duke parents: diff changeset	517	if (start != end) {
90ce3da70b43 Initial load duke parents: diff changeset	518	if ((start+1) != end) {
90ce3da70b43 Initial load duke parents: diff changeset	519	result.append('-');
90ce3da70b43 Initial load duke parents: diff changeset	520	}
90ce3da70b43 Initial load duke parents: diff changeset	521	_appendToPat(result, end, escapeUnprintable);
90ce3da70b43 Initial load duke parents: diff changeset	522	}
90ce3da70b43 Initial load duke parents: diff changeset	523	}
90ce3da70b43 Initial load duke parents: diff changeset	524	}
90ce3da70b43 Initial load duke parents: diff changeset	525
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	526	if (includeStrings && strings.size() > 0) {
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	527	Iterator<String> it = strings.iterator();
2 90ce3da70b43 Initial load duke parents: diff changeset	528	while (it.hasNext()) {
90ce3da70b43 Initial load duke parents: diff changeset	529	result.append('{');
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	530	_appendToPat(result, it.next(), escapeUnprintable);
2 90ce3da70b43 Initial load duke parents: diff changeset	531	result.append('}');
90ce3da70b43 Initial load duke parents: diff changeset	532	}
90ce3da70b43 Initial load duke parents: diff changeset	533	}
90ce3da70b43 Initial load duke parents: diff changeset	534	return result.append(']');
90ce3da70b43 Initial load duke parents: diff changeset	535	}
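// To illustrate the choice between the direct and the inverse
// representation made by _generatePattern, the sketch below emits a
// pattern from a sorted array of inclusive [start, end] ranges. It is a
// simplification under stated assumptions: RangePatternSketch, toPattern
// and appendRange are hypothetical names, no escaping is applied, and
// multi-character strings are ignored.

```java
final class RangePatternSketch {
    static final int MIN_VALUE = 0;
    static final int MAX_VALUE = 0x10FFFF;

    /** ranges must be sorted, disjoint, non-adjacent [start, end] pairs. */
    static String toPattern(int[][] ranges) {
        StringBuilder sb = new StringBuilder("[");
        int count = ranges.length;
        if (count > 1
                && ranges[0][0] == MIN_VALUE
                && ranges[count - 1][1] == MAX_VALUE) {
            // Emit the gaps between ranges, prefixed by '^'.
            sb.append('^');
            for (int i = 1; i < count; ++i) {
                appendRange(sb, ranges[i - 1][1] + 1, ranges[i][0] - 1);
            }
        } else {
            for (int[] r : ranges) {
                appendRange(sb, r[0], r[1]);
            }
        }
        return sb.append(']').toString();
    }

    private static void appendRange(StringBuilder sb, int start, int end) {
        sb.appendCodePoint(start);
        if (start != end) {
            // Two adjacent characters are written without a hyphen.
            if (start + 1 != end) {
                sb.append('-');
            }
            sb.appendCodePoint(end);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPattern(new int[][] {{'0', '1'}, {'a', 'c'}, {'x', 'x'}}));
        System.out.println(toPattern(new int[][] {{0, 'a' - 1}, {'c' + 1, MAX_VALUE}}));
    }
}
```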
90ce3da70b43 Initial load duke parents: diff changeset	536
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	537	// for internal use, after checkFrozen has been called
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	538	private UnicodeSet add_unchecked(int start, int end) {
2 90ce3da70b43 Initial load duke parents: diff changeset	539	if (start < MIN_VALUE || start > MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	540	throw new IllegalArgumentException("Invalid code point U+" + Utility.hex(start, 6));
90ce3da70b43 Initial load duke parents: diff changeset	541	}
90ce3da70b43 Initial load duke parents: diff changeset	542	if (end < MIN_VALUE || end > MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	543	throw new IllegalArgumentException("Invalid code point U+" + Utility.hex(end, 6));
90ce3da70b43 Initial load duke parents: diff changeset	544	}
90ce3da70b43 Initial load duke parents: diff changeset	545	if (start < end) {
90ce3da70b43 Initial load duke parents: diff changeset	546	add(range(start, end), 2, 0);
90ce3da70b43 Initial load duke parents: diff changeset	547	} else if (start == end) {
90ce3da70b43 Initial load duke parents: diff changeset	548	add(start);
90ce3da70b43 Initial load duke parents: diff changeset	549	}
90ce3da70b43 Initial load duke parents: diff changeset	550	return this;
90ce3da70b43 Initial load duke parents: diff changeset	551	}
90ce3da70b43 Initial load duke parents: diff changeset	552
90ce3da70b43 Initial load duke parents: diff changeset	553	/**
90ce3da70b43 Initial load duke parents: diff changeset	554	* Adds the specified character to this set if it is not already
90ce3da70b43 Initial load duke parents: diff changeset	555	* present. If this set already contains the specified character,
90ce3da70b43 Initial load duke parents: diff changeset	556	* the call leaves this set unchanged.
90ce3da70b43 Initial load duke parents: diff changeset	557	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	558	*/
90ce3da70b43 Initial load duke parents: diff changeset	559	public final UnicodeSet add(int c) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	560	return add_unchecked(c);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	561	}
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	562
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	563	// for internal use only, after checkFrozen has been called
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	564	private final UnicodeSet add_unchecked(int c) {
2 90ce3da70b43 Initial load duke parents: diff changeset	565	if (c < MIN_VALUE || c > MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	566	throw new IllegalArgumentException("Invalid code point U+" + Utility.hex(c, 6));
90ce3da70b43 Initial load duke parents: diff changeset	567	}
90ce3da70b43 Initial load duke parents: diff changeset	568
90ce3da70b43 Initial load duke parents: diff changeset	569	// find smallest i such that c < list[i]
90ce3da70b43 Initial load duke parents: diff changeset	570	// if odd, then it is IN the set
90ce3da70b43 Initial load duke parents: diff changeset	571	// if even, then it is OUT of the set
90ce3da70b43 Initial load duke parents: diff changeset	572	int i = findCodePoint(c);
90ce3da70b43 Initial load duke parents: diff changeset	573
90ce3da70b43 Initial load duke parents: diff changeset	574	// already in set?
90ce3da70b43 Initial load duke parents: diff changeset	575	if ((i & 1) != 0) return this;
90ce3da70b43 Initial load duke parents: diff changeset	576
90ce3da70b43 Initial load duke parents: diff changeset	577	// HIGH is 0x110000
90ce3da70b43 Initial load duke parents: diff changeset	578	// assert(list[len-1] == HIGH);
90ce3da70b43 Initial load duke parents: diff changeset	579
90ce3da70b43 Initial load duke parents: diff changeset	580	// empty = [HIGH]
90ce3da70b43 Initial load duke parents: diff changeset	581	// [start_0, limit_0, start_1, limit_1, HIGH]
90ce3da70b43 Initial load duke parents: diff changeset	582
90ce3da70b43 Initial load duke parents: diff changeset	583	// [..., start_k-1, limit_k-1, start_k, limit_k, ..., HIGH]
90ce3da70b43 Initial load duke parents: diff changeset	584	// ^
90ce3da70b43 Initial load duke parents: diff changeset	585	// list[i]
90ce3da70b43 Initial load duke parents: diff changeset	586
90ce3da70b43 Initial load duke parents: diff changeset	587	// i == 0 means c is before the first range
90ce3da70b43 Initial load duke parents: diff changeset	588
90ce3da70b43 Initial load duke parents: diff changeset	589	if (c == list[i]-1) {
90ce3da70b43 Initial load duke parents: diff changeset	590	// c is before start of next range
90ce3da70b43 Initial load duke parents: diff changeset	591	list[i] = c;
90ce3da70b43 Initial load duke parents: diff changeset	592	// if we touched the HIGH mark, then add a new one
90ce3da70b43 Initial load duke parents: diff changeset	593	if (c == MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	594	ensureCapacity(len+1);
90ce3da70b43 Initial load duke parents: diff changeset	595	list[len++] = HIGH;
90ce3da70b43 Initial load duke parents: diff changeset	596	}
90ce3da70b43 Initial load duke parents: diff changeset	597	if (i > 0 && c == list[i-1]) {
90ce3da70b43 Initial load duke parents: diff changeset	598	// collapse adjacent ranges
90ce3da70b43 Initial load duke parents: diff changeset	599
90ce3da70b43 Initial load duke parents: diff changeset	600	// [..., start_k-1, c, c, limit_k, ..., HIGH]
90ce3da70b43 Initial load duke parents: diff changeset	601	// ^
90ce3da70b43 Initial load duke parents: diff changeset	602	// list[i]
90ce3da70b43 Initial load duke parents: diff changeset	603	System.arraycopy(list, i+1, list, i-1, len-i-1);
90ce3da70b43 Initial load duke parents: diff changeset	604	len -= 2;
90ce3da70b43 Initial load duke parents: diff changeset	605	}
90ce3da70b43 Initial load duke parents: diff changeset	606	}
90ce3da70b43 Initial load duke parents: diff changeset	607
90ce3da70b43 Initial load duke parents: diff changeset	608	else if (i > 0 && c == list[i-1]) {
90ce3da70b43 Initial load duke parents: diff changeset	609	// c is after end of prior range
90ce3da70b43 Initial load duke parents: diff changeset	610	list[i-1]++;
90ce3da70b43 Initial load duke parents: diff changeset	611	// no need to check for collapse here
90ce3da70b43 Initial load duke parents: diff changeset	612	}
90ce3da70b43 Initial load duke parents: diff changeset	613
90ce3da70b43 Initial load duke parents: diff changeset	614	else {
90ce3da70b43 Initial load duke parents: diff changeset	615	// At this point we know the new char is not adjacent to
90ce3da70b43 Initial load duke parents: diff changeset	616	// any existing ranges, and it is not 10FFFF.
90ce3da70b43 Initial load duke parents: diff changeset	617
90ce3da70b43 Initial load duke parents: diff changeset	618
90ce3da70b43 Initial load duke parents: diff changeset	619	// [..., start_k-1, limit_k-1, start_k, limit_k, ..., HIGH]
90ce3da70b43 Initial load duke parents: diff changeset	620	// ^
90ce3da70b43 Initial load duke parents: diff changeset	621	// list[i]
90ce3da70b43 Initial load duke parents: diff changeset	622
90ce3da70b43 Initial load duke parents: diff changeset	623	// [..., start_k-1, limit_k-1, c, c+1, start_k, limit_k, ..., HIGH]
90ce3da70b43 Initial load duke parents: diff changeset	624	// ^
90ce3da70b43 Initial load duke parents: diff changeset	625	// list[i]
90ce3da70b43 Initial load duke parents: diff changeset	626
90ce3da70b43 Initial load duke parents: diff changeset	627	// Don't use ensureCapacity() to save on copying.
90ce3da70b43 Initial load duke parents: diff changeset	628	// NOTE: This has no measurable impact on performance,
90ce3da70b43 Initial load duke parents: diff changeset	629	// but it might help in some usage patterns.
90ce3da70b43 Initial load duke parents: diff changeset	630	if (len+2 > list.length) {
90ce3da70b43 Initial load duke parents: diff changeset	631	int[] temp = new int[len + 2 + GROW_EXTRA];
90ce3da70b43 Initial load duke parents: diff changeset	632	if (i != 0) System.arraycopy(list, 0, temp, 0, i);
90ce3da70b43 Initial load duke parents: diff changeset	633	System.arraycopy(list, i, temp, i+2, len-i);
90ce3da70b43 Initial load duke parents: diff changeset	634	list = temp;
90ce3da70b43 Initial load duke parents: diff changeset	635	} else {
90ce3da70b43 Initial load duke parents: diff changeset	636	System.arraycopy(list, i, list, i+2, len-i);
90ce3da70b43 Initial load duke parents: diff changeset	637	}
90ce3da70b43 Initial load duke parents: diff changeset	638
90ce3da70b43 Initial load duke parents: diff changeset	639	list[i] = c;
90ce3da70b43 Initial load duke parents: diff changeset	640	list[i+1] = c+1;
90ce3da70b43 Initial load duke parents: diff changeset	641	len += 2;
90ce3da70b43 Initial load duke parents: diff changeset	642	}
90ce3da70b43 Initial load duke parents: diff changeset	643
90ce3da70b43 Initial load duke parents: diff changeset	644	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	645	return this;
90ce3da70b43 Initial load duke parents: diff changeset	646	}
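// The three cases handled above (extend the next range downward, extend
// the previous range upward, or insert a new one-character range) can be
// followed on a toy inversion list. This sketch is an assumption-laden
// simplification: InversionListSketch is a hypothetical class, it stores
// boundaries in an ArrayList rather than a grown int[], and it finds the
// position with a linear scan instead of findCodePoint's search.

```java
import java.util.ArrayList;
import java.util.List;

final class InversionListSketch {
    static final int HIGH = 0x110000;
    static final int MAX_VALUE = 0x10FFFF;

    // [start_0, limit_0, start_1, limit_1, ..., HIGH]; limits are exclusive.
    final List<Integer> list = new ArrayList<>();

    InversionListSketch() {
        list.add(HIGH); // the empty set
    }

    /** Smallest i such that c < list[i]; odd means c is in the set. */
    int find(int c) {
        int i = 0;
        while (c >= list.get(i)) {
            i++;
        }
        return i;
    }

    boolean contains(int c) {
        return (find(c) & 1) != 0;
    }

    void add(int c) {
        if (c < 0 || c > MAX_VALUE) {
            throw new IllegalArgumentException("Invalid code point " + c);
        }
        int i = find(c);
        if ((i & 1) != 0) {
            return; // already in the set
        }
        if (c == list.get(i) - 1) {
            // c sits just before the start of the next range
            list.set(i, c);
            if (c == MAX_VALUE) {
                list.add(HIGH); // the sentinel was consumed, add a new one
            }
            if (i > 0 && c == list.get(i - 1)) {
                // previous limit == new start: merge the two ranges
                list.remove(i);
                list.remove(i - 1);
            }
        } else if (i > 0 && c == list.get(i - 1)) {
            // c sits just after the end of the previous range
            list.set(i - 1, c + 1);
        } else {
            // isolated character: insert [c, c+1) at position i
            list.add(i, c + 1);
            list.add(i, c);
        }
    }

    public static void main(String[] args) {
        InversionListSketch s = new InversionListSketch();
        s.add('a');
        s.add('c');
        s.add('b');
        System.out.println(s.list);
    }
}
```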
90ce3da70b43 Initial load duke parents: diff changeset	647
90ce3da70b43 Initial load duke parents: diff changeset	648	/**
90ce3da70b43 Initial load duke parents: diff changeset	649	* Adds the specified multicharacter to this set if it is not already
90ce3da70b43 Initial load duke parents: diff changeset	650	* present. If this set already contains the multicharacter,
90ce3da70b43 Initial load duke parents: diff changeset	651	* the call leaves this set unchanged.
90ce3da70b43 Initial load duke parents: diff changeset	652	* Thus "ch" => {"ch"}
90ce3da70b43 Initial load duke parents: diff changeset	653	* <br><b>Warning: you cannot add an empty string ("") to a UnicodeSet.</b>
90ce3da70b43 Initial load duke parents: diff changeset	654	* @param s the source string
90ce3da70b43 Initial load duke parents: diff changeset	655	* @return this object, for chaining
90ce3da70b43 Initial load duke parents: diff changeset	656	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	657	*/
90ce3da70b43 Initial load duke parents: diff changeset	658	public final UnicodeSet add(String s) {
90ce3da70b43 Initial load duke parents: diff changeset	659	int cp = getSingleCP(s);
90ce3da70b43 Initial load duke parents: diff changeset	660	if (cp < 0) {
90ce3da70b43 Initial load duke parents: diff changeset	661	strings.add(s);
90ce3da70b43 Initial load duke parents: diff changeset	662	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	663	} else {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	664	add_unchecked(cp, cp);
2 90ce3da70b43 Initial load duke parents: diff changeset	665	}
90ce3da70b43 Initial load duke parents: diff changeset	666	return this;
90ce3da70b43 Initial load duke parents: diff changeset	667	}
90ce3da70b43 Initial load duke parents: diff changeset	668
90ce3da70b43 Initial load duke parents: diff changeset	669	/**
90ce3da70b43 Initial load duke parents: diff changeset	670	* @param s the string to test
90ce3da70b43 Initial load duke parents: diff changeset	671	* @return the code point if the string consists of exactly one code point,
90ce3da70b43 Initial load duke parents: diff changeset	672	* otherwise -1.
90ce3da70b43 Initial load duke parents: diff changeset	673	*/
90ce3da70b43 Initial load duke parents: diff changeset	674	private static int getSingleCP(String s) {
90ce3da70b43 Initial load duke parents: diff changeset	675	if (s.length() < 1) {
90ce3da70b43 Initial load duke parents: diff changeset	676	throw new IllegalArgumentException("Can't use zero-length strings in UnicodeSet");
90ce3da70b43 Initial load duke parents: diff changeset	677	}
90ce3da70b43 Initial load duke parents: diff changeset	678	if (s.length() > 2) return -1;
90ce3da70b43 Initial load duke parents: diff changeset	679	if (s.length() == 1) return s.charAt(0);
90ce3da70b43 Initial load duke parents: diff changeset	680
90ce3da70b43 Initial load duke parents: diff changeset	681	// at this point, len = 2
90ce3da70b43 Initial load duke parents: diff changeset	682	int cp = UTF16.charAt(s, 0);
90ce3da70b43 Initial load duke parents: diff changeset	683	if (cp > 0xFFFF) { // is surrogate pair
90ce3da70b43 Initial load duke parents: diff changeset	684	return cp;
90ce3da70b43 Initial load duke parents: diff changeset	685	}
90ce3da70b43 Initial load duke parents: diff changeset	686	return -1;
90ce3da70b43 Initial load duke parents: diff changeset	687	}
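// For comparison, the same single-code-point test can be written against
// the standard library. This is a sketch only: SingleCodePointSketch and
// singleCodePoint are hypothetical names, and String.codePointAt plus
// Character.charCount are assumed as substitutes for the internal UTF16
// helper.

```java
final class SingleCodePointSketch {
    /** Returns the code point if s holds exactly one, otherwise -1. */
    static int singleCodePoint(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("Can't use zero-length strings");
        }
        int cp = s.codePointAt(0);
        // The string is a single code point when that code point spans all of it.
        return (s.length() == Character.charCount(cp)) ? cp : -1;
    }

    public static void main(String[] args) {
        System.out.println(singleCodePoint("a"));
        System.out.println(singleCodePoint("ch"));
    }
}
```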
90ce3da70b43 Initial load duke parents: diff changeset	688
90ce3da70b43 Initial load duke parents: diff changeset	689	/**
90ce3da70b43 Initial load duke parents: diff changeset	690	* Complements the specified range in this set. Any character in
90ce3da70b43 Initial load duke parents: diff changeset	691	* the range will be removed if it is in this set, or will be
90ce3da70b43 Initial load duke parents: diff changeset	692	* added if it is not in this set. If <code>start > end</code>
90ce3da70b43 Initial load duke parents: diff changeset	693	* then an empty range is complemented, leaving the set unchanged.
90ce3da70b43 Initial load duke parents: diff changeset	694	*
90ce3da70b43 Initial load duke parents: diff changeset	695	* @param start first character, inclusive, of range to be complemented
90ce3da70b43 Initial load duke parents: diff changeset	696	* in this set.
90ce3da70b43 Initial load duke parents: diff changeset	697	* @param end last character, inclusive, of range to be complemented
90ce3da70b43 Initial load duke parents: diff changeset	698	* in this set.
90ce3da70b43 Initial load duke parents: diff changeset	699	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	700	*/
90ce3da70b43 Initial load duke parents: diff changeset	701	public UnicodeSet complement(int start, int end) {
90ce3da70b43 Initial load duke parents: diff changeset	702	if (start < MIN_VALUE || start > MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	703	throw new IllegalArgumentException("Invalid code point U+" + Utility.hex(start, 6));
90ce3da70b43 Initial load duke parents: diff changeset	704	}
90ce3da70b43 Initial load duke parents: diff changeset	705	if (end < MIN_VALUE || end > MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	706	throw new IllegalArgumentException("Invalid code point U+" + Utility.hex(end, 6));
90ce3da70b43 Initial load duke parents: diff changeset	707	}
90ce3da70b43 Initial load duke parents: diff changeset	708	if (start <= end) {
90ce3da70b43 Initial load duke parents: diff changeset	709	xor(range(start, end), 2, 0);
90ce3da70b43 Initial load duke parents: diff changeset	710	}
90ce3da70b43 Initial load duke parents: diff changeset	711	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	712	return this;
90ce3da70b43 Initial load duke parents: diff changeset	713	}
90ce3da70b43 Initial load duke parents: diff changeset	714
90ce3da70b43 Initial load duke parents: diff changeset	715	/**
90ce3da70b43 Initial load duke parents: diff changeset	716	* This is equivalent to
90ce3da70b43 Initial load duke parents: diff changeset	717	* <code>complement(MIN_VALUE, MAX_VALUE)</code>.
90ce3da70b43 Initial load duke parents: diff changeset	718	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	719	*/
90ce3da70b43 Initial load duke parents: diff changeset	720	public UnicodeSet complement() {
90ce3da70b43 Initial load duke parents: diff changeset	721	if (list[0] == LOW) {
90ce3da70b43 Initial load duke parents: diff changeset	722	System.arraycopy(list, 1, list, 0, len-1);
90ce3da70b43 Initial load duke parents: diff changeset	723	--len;
90ce3da70b43 Initial load duke parents: diff changeset	724	} else {
90ce3da70b43 Initial load duke parents: diff changeset	725	ensureCapacity(len+1);
90ce3da70b43 Initial load duke parents: diff changeset	726	System.arraycopy(list, 0, list, 1, len);
90ce3da70b43 Initial load duke parents: diff changeset	727	list[0] = LOW;
90ce3da70b43 Initial load duke parents: diff changeset	728	++len;
90ce3da70b43 Initial load duke parents: diff changeset	729	}
90ce3da70b43 Initial load duke parents: diff changeset	730	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	731	return this;
90ce3da70b43 Initial load duke parents: diff changeset	732	}
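The effect of `complement()` on the inversion list can be sketched on a plain `int[]`: if the list starts with LOW the leading boundary is dropped, otherwise LOW is prepended. This is a minimal sketch that returns a new array rather than shifting in place; the class name is hypothetical, and LOW and HIGH are assumed to be 0 and 0x110000.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of complementing an inversion list.
final class ComplementSketch {
    static final int LOW = 0x000000;   // assumed lowest code point
    static final int HIGH = 0x110000;  // assumed sentinel, one past U+10FFFF

    /** Returns the complement of an inversion list that ends with HIGH. */
    static int[] complement(int[] list) {
        if (list[0] == LOW) {
            // Dropping the leading LOW toggles membership of every code point.
            return Arrays.copyOfRange(list, 1, list.length);
        }
        // Otherwise prepend LOW, which has the same toggling effect.
        int[] out = new int[list.length + 1];
        out[0] = LOW;
        System.arraycopy(list, 0, out, 1, list.length);
        return out;
    }
}
```

Complementing the empty set `{HIGH}` gives `{LOW, HIGH}` (all code points), and applying the operation twice returns the original list.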
90ce3da70b43 Initial load duke parents: diff changeset	733
90ce3da70b43 Initial load duke parents: diff changeset	734	/**
90ce3da70b43 Initial load duke parents: diff changeset	735	* Returns true if this set contains the given character.
90ce3da70b43 Initial load duke parents: diff changeset	736	* @param c character to be checked for containment
90ce3da70b43 Initial load duke parents: diff changeset	737	* @return true if this set contains <code>c</code>
90ce3da70b43 Initial load duke parents: diff changeset	738	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	739	*/
90ce3da70b43 Initial load duke parents: diff changeset	740	public boolean contains(int c) {
90ce3da70b43 Initial load duke parents: diff changeset	741	if (c < MIN_VALUE || c > MAX_VALUE) {
90ce3da70b43 Initial load duke parents: diff changeset	742	throw new IllegalArgumentException("Invalid code point U+" + Utility.hex(c, 6));
90ce3da70b43 Initial load duke parents: diff changeset	743	}
90ce3da70b43 Initial load duke parents: diff changeset	744
90ce3da70b43 Initial load duke parents: diff changeset	745	/*
90ce3da70b43 Initial load duke parents: diff changeset	746	// Set i to the index of the first list item greater than c
90ce3da70b43 Initial load duke parents: diff changeset	747	// We know we will terminate without length test!
90ce3da70b43 Initial load duke parents: diff changeset	748	int i = -1;
90ce3da70b43 Initial load duke parents: diff changeset	749	while (true) {
90ce3da70b43 Initial load duke parents: diff changeset	750	if (c < list[++i]) break;
90ce3da70b43 Initial load duke parents: diff changeset	751	}
90ce3da70b43 Initial load duke parents: diff changeset	752	*/
90ce3da70b43 Initial load duke parents: diff changeset	753
90ce3da70b43 Initial load duke parents: diff changeset	754	int i = findCodePoint(c);
90ce3da70b43 Initial load duke parents: diff changeset	755
90ce3da70b43 Initial load duke parents: diff changeset	756	return ((i & 1) != 0); // return true if odd
90ce3da70b43 Initial load duke parents: diff changeset	757	}
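The parity test in `contains` relies on the inversion-list layout: even indices hold range starts and odd indices hold exclusive range ends, so a code point is in the set exactly when the first boundary greater than it sits at an odd index. The sketch below illustrates this with the linear scan from the commented-out code; the class name and the stand-alone `int[]` parameter are assumptions made for illustration.

```java
// Hypothetical stand-alone sketch of the parity-based membership test.
final class InversionListContainsSketch {
    static final int HIGH = 0x110000; // assumed sentinel, one past U+10FFFF

    /** Smallest i such that c < list[i]; linear version of the search. */
    static int find(int[] list, int c) {
        int i = 0;
        // Terminates without a length test because the list ends with
        // HIGH, which is greater than every legal code point.
        while (c >= list[i]) {
            i++;
        }
        return i;
    }

    /** True if the inversion list contains c. */
    static boolean contains(int[] list, int c) {
        if (c < 0 || c >= HIGH) {
            throw new IllegalArgumentException("Invalid code point " + c);
        }
        return (find(list, c) & 1) != 0; // odd index means inside a range
    }
}
```

With the list `{0x61, 0x7B, HIGH}`, which encodes `[a-z]`, the search returns 1 for 'a' through 'z' and an even index for everything else.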
90ce3da70b43 Initial load duke parents: diff changeset	758
90ce3da70b43 Initial load duke parents: diff changeset	759	/**
90ce3da70b43 Initial load duke parents: diff changeset	760	* Returns the smallest value i such that c < list[i]. Caller
90ce3da70b43 Initial load duke parents: diff changeset	761	* must ensure that c is a legal value or this method will enter
90ce3da70b43 Initial load duke parents: diff changeset	762	* an infinite loop. This method performs a binary search.
90ce3da70b43 Initial load duke parents: diff changeset	763	* @param c a character in the range MIN_VALUE..MAX_VALUE
90ce3da70b43 Initial load duke parents: diff changeset	764	* inclusive
90ce3da70b43 Initial load duke parents: diff changeset	765	* @return the smallest integer i in the range 0..len-1,
90ce3da70b43 Initial load duke parents: diff changeset	766	* inclusive, such that c < list[i]
90ce3da70b43 Initial load duke parents: diff changeset	767	*/
90ce3da70b43 Initial load duke parents: diff changeset	768	private final int findCodePoint(int c) {
90ce3da70b43 Initial load duke parents: diff changeset	769	/* Examples:
90ce3da70b43 Initial load duke parents: diff changeset	770	findCodePoint(c)
90ce3da70b43 Initial load duke parents: diff changeset	771	set list[] c=0 1 3 4 7 8
90ce3da70b43 Initial load duke parents: diff changeset	772	=== ============== ===========
90ce3da70b43 Initial load duke parents: diff changeset	773	[] [110000] 0 0 0 0 0 0
90ce3da70b43 Initial load duke parents: diff changeset	774	[\u0000-\u0003] [0, 4, 110000] 1 1 1 2 2 2
90ce3da70b43 Initial load duke parents: diff changeset	775	[\u0004-\u0007] [4, 8, 110000] 0 0 0 1 1 2
90ce3da70b43 Initial load duke parents: diff changeset	776	[:all:] [0, 110000] 1 1 1 1 1 1
90ce3da70b43 Initial load duke parents: diff changeset	777	*/
90ce3da70b43 Initial load duke parents: diff changeset	778
90ce3da70b43 Initial load duke parents: diff changeset	779	// Return the smallest i such that c < list[i]. Assume
90ce3da70b43 Initial load duke parents: diff changeset	780	// list[len - 1] == HIGH and that c is legal (0..HIGH-1).
90ce3da70b43 Initial load duke parents: diff changeset	781	if (c < list[0]) return 0;
90ce3da70b43 Initial load duke parents: diff changeset	782	// High runner test. c is often after the last range, so an
90ce3da70b43 Initial load duke parents: diff changeset	783	// initial check for this condition pays off.
90ce3da70b43 Initial load duke parents: diff changeset	784	if (len >= 2 && c >= list[len-2]) return len-1;
90ce3da70b43 Initial load duke parents: diff changeset	785	int lo = 0;
90ce3da70b43 Initial load duke parents: diff changeset	786	int hi = len - 1;
90ce3da70b43 Initial load duke parents: diff changeset	787	// invariant: c >= list[lo]
90ce3da70b43 Initial load duke parents: diff changeset	788	// invariant: c < list[hi]
90ce3da70b43 Initial load duke parents: diff changeset	789	for (;;) {
90ce3da70b43 Initial load duke parents: diff changeset	790	int i = (lo + hi) >>> 1;
90ce3da70b43 Initial load duke parents: diff changeset	791	if (i == lo) return hi;
90ce3da70b43 Initial load duke parents: diff changeset	792	if (c < list[i]) {
90ce3da70b43 Initial load duke parents: diff changeset	793	hi = i;
90ce3da70b43 Initial load duke parents: diff changeset	794	} else {
90ce3da70b43 Initial load duke parents: diff changeset	795	lo = i;
90ce3da70b43 Initial load duke parents: diff changeset	796	}
90ce3da70b43 Initial load duke parents: diff changeset	797	}
90ce3da70b43 Initial load duke parents: diff changeset	798	}
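The binary search can be exercised in isolation against the example table in the comment above. The following is a sketch under the assumption that the list and its length are passed in as parameters rather than read from fields; the class name is hypothetical.

```java
// Hypothetical stand-alone sketch of the boundary search.
final class FindCodePointSketch {
    static final int HIGH = 0x110000; // assumed sentinel, one past U+10FFFF

    /**
     * Returns the smallest i in 0..len-1 such that c < list[i].
     * Assumes list[len - 1] == HIGH and 0 <= c < HIGH.
     */
    static int findCodePoint(int[] list, int len, int c) {
        if (c < list[0]) return 0;
        // Shortcut for values at or beyond the start of the last boundary.
        if (len >= 2 && c >= list[len - 2]) return len - 1;
        int lo = 0;       // invariant: c >= list[lo]
        int hi = len - 1; // invariant: c <  list[hi]
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (c < list[mid]) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return hi;
    }
}
```

For the list `{4, 8, HIGH}` this returns 0 for c = 0, 1 and 3, returns 1 for c = 4 and 7, and returns 2 for c = 8, matching the third row of the table.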
90ce3da70b43 Initial load duke parents: diff changeset	799
90ce3da70b43 Initial load duke parents: diff changeset	800	/**
90ce3da70b43 Initial load duke parents: diff changeset	801	* Adds all of the elements in the specified set to this set if
90ce3da70b43 Initial load duke parents: diff changeset	802	* they're not already present. This operation effectively
90ce3da70b43 Initial load duke parents: diff changeset	803	* modifies this set so that its value is the <i>union</i> of the two
90ce3da70b43 Initial load duke parents: diff changeset	804	* sets. The behavior of this operation is unspecified if the specified
90ce3da70b43 Initial load duke parents: diff changeset	805	* collection is modified while the operation is in progress.
90ce3da70b43 Initial load duke parents: diff changeset	806	*
90ce3da70b43 Initial load duke parents: diff changeset	807	* @param c set whose elements are to be added to this set.
90ce3da70b43 Initial load duke parents: diff changeset	808	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	809	*/
90ce3da70b43 Initial load duke parents: diff changeset	810	public UnicodeSet addAll(UnicodeSet c) {
90ce3da70b43 Initial load duke parents: diff changeset	811	add(c.list, c.len, 0);
90ce3da70b43 Initial load duke parents: diff changeset	812	strings.addAll(c.strings);
90ce3da70b43 Initial load duke parents: diff changeset	813	return this;
90ce3da70b43 Initial load duke parents: diff changeset	814	}
90ce3da70b43 Initial load duke parents: diff changeset	815
90ce3da70b43 Initial load duke parents: diff changeset	816	/**
90ce3da70b43 Initial load duke parents: diff changeset	817	* Retains only the elements in this set that are contained in the
90ce3da70b43 Initial load duke parents: diff changeset	818	* specified set. In other words, removes from this set all of
90ce3da70b43 Initial load duke parents: diff changeset	819	* its elements that are not contained in the specified set. This
90ce3da70b43 Initial load duke parents: diff changeset	820	* operation effectively modifies this set so that its value is
90ce3da70b43 Initial load duke parents: diff changeset	821	* the <i>intersection</i> of the two sets.
90ce3da70b43 Initial load duke parents: diff changeset	822	*
90ce3da70b43 Initial load duke parents: diff changeset	823	* @param c set that defines which elements this set will retain.
90ce3da70b43 Initial load duke parents: diff changeset	824	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	825	*/
90ce3da70b43 Initial load duke parents: diff changeset	826	public UnicodeSet retainAll(UnicodeSet c) {
90ce3da70b43 Initial load duke parents: diff changeset	827	retain(c.list, c.len, 0);
90ce3da70b43 Initial load duke parents: diff changeset	828	strings.retainAll(c.strings);
90ce3da70b43 Initial load duke parents: diff changeset	829	return this;
90ce3da70b43 Initial load duke parents: diff changeset	830	}
90ce3da70b43 Initial load duke parents: diff changeset	831
90ce3da70b43 Initial load duke parents: diff changeset	832	/**
90ce3da70b43 Initial load duke parents: diff changeset	833	* Removes from this set all of its elements that are contained in the
90ce3da70b43 Initial load duke parents: diff changeset	834	* specified set. This operation effectively modifies this
90ce3da70b43 Initial load duke parents: diff changeset	835	* set so that its value is the <i>asymmetric set difference</i> of
90ce3da70b43 Initial load duke parents: diff changeset	836	* the two sets.
90ce3da70b43 Initial load duke parents: diff changeset	837	*
90ce3da70b43 Initial load duke parents: diff changeset	838	* @param c set that defines which elements will be removed from
90ce3da70b43 Initial load duke parents: diff changeset	839	* this set.
90ce3da70b43 Initial load duke parents: diff changeset	840	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	841	*/
90ce3da70b43 Initial load duke parents: diff changeset	842	public UnicodeSet removeAll(UnicodeSet c) {
90ce3da70b43 Initial load duke parents: diff changeset	843	retain(c.list, c.len, 2);
90ce3da70b43 Initial load duke parents: diff changeset	844	strings.removeAll(c.strings);
90ce3da70b43 Initial load duke parents: diff changeset	845	return this;
90ce3da70b43 Initial load duke parents: diff changeset	846	}
90ce3da70b43 Initial load duke parents: diff changeset	847
90ce3da70b43 Initial load duke parents: diff changeset	848	/**
90ce3da70b43 Initial load duke parents: diff changeset	849	* Removes all of the elements from this set. This set will be
90ce3da70b43 Initial load duke parents: diff changeset	850	* empty after this call returns.
90ce3da70b43 Initial load duke parents: diff changeset	851	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	852	*/
90ce3da70b43 Initial load duke parents: diff changeset	853	public UnicodeSet clear() {
90ce3da70b43 Initial load duke parents: diff changeset	854	list[0] = HIGH;
90ce3da70b43 Initial load duke parents: diff changeset	855	len = 1;
90ce3da70b43 Initial load duke parents: diff changeset	856	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	857	strings.clear();
90ce3da70b43 Initial load duke parents: diff changeset	858	return this;
90ce3da70b43 Initial load duke parents: diff changeset	859	}
90ce3da70b43 Initial load duke parents: diff changeset	860
90ce3da70b43 Initial load duke parents: diff changeset	861	/**
90ce3da70b43 Initial load duke parents: diff changeset	862	* Iteration method that returns the number of ranges contained in
90ce3da70b43 Initial load duke parents: diff changeset	863	* this set.
90ce3da70b43 Initial load duke parents: diff changeset	864	* @see #getRangeStart
90ce3da70b43 Initial load duke parents: diff changeset	865	* @see #getRangeEnd
90ce3da70b43 Initial load duke parents: diff changeset	866	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	867	*/
90ce3da70b43 Initial load duke parents: diff changeset	868	public int getRangeCount() {
90ce3da70b43 Initial load duke parents: diff changeset	869	return len/2;
90ce3da70b43 Initial load duke parents: diff changeset	870	}
90ce3da70b43 Initial load duke parents: diff changeset	871
90ce3da70b43 Initial load duke parents: diff changeset	872	/**
90ce3da70b43 Initial load duke parents: diff changeset	873	* Iteration method that returns the first character in the
90ce3da70b43 Initial load duke parents: diff changeset	874	* specified range of this set.
90ce3da70b43 Initial load duke parents: diff changeset	875	* @exception ArrayIndexOutOfBoundsException if index is outside
90ce3da70b43 Initial load duke parents: diff changeset	876	* the range <code>0..getRangeCount()-1</code>
90ce3da70b43 Initial load duke parents: diff changeset	877	* @see #getRangeCount
90ce3da70b43 Initial load duke parents: diff changeset	878	* @see #getRangeEnd
90ce3da70b43 Initial load duke parents: diff changeset	879	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	880	*/
90ce3da70b43 Initial load duke parents: diff changeset	881	public int getRangeStart(int index) {
90ce3da70b43 Initial load duke parents: diff changeset	882	return list[index*2];
90ce3da70b43 Initial load duke parents: diff changeset	883	}
90ce3da70b43 Initial load duke parents: diff changeset	884
90ce3da70b43 Initial load duke parents: diff changeset	885	/**
90ce3da70b43 Initial load duke parents: diff changeset	886	* Iteration method that returns the last character in the
90ce3da70b43 Initial load duke parents: diff changeset	887	* specified range of this set.
90ce3da70b43 Initial load duke parents: diff changeset	888	* @exception ArrayIndexOutOfBoundsException if index is outside
90ce3da70b43 Initial load duke parents: diff changeset	889	* the range <code>0..getRangeCount()-1</code>
90ce3da70b43 Initial load duke parents: diff changeset	890	* @see #getRangeStart
90ce3da70b43 Initial load duke parents: diff changeset	891	* @see #getRangeCount
90ce3da70b43 Initial load duke parents: diff changeset	892	* @stable ICU 2.0
90ce3da70b43 Initial load duke parents: diff changeset	893	*/
90ce3da70b43 Initial load duke parents: diff changeset	894	public int getRangeEnd(int index) {
90ce3da70b43 Initial load duke parents: diff changeset	895	return (list[index*2 + 1] - 1);
90ce3da70b43 Initial load duke parents: diff changeset	896	}
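The three iteration methods are meant to be used together: ask for the range count, then read each range's inclusive start and end. The sketch below mirrors their arithmetic on a plain `int[]`; the class name and the `describe` helper are hypothetical and exist only to illustrate the loop.

```java
// Hypothetical stand-alone sketch of range iteration over an inversion list.
final class RangeIterationSketch {
    static int getRangeCount(int len) {
        return len / 2; // the trailing sentinel never completes a pair
    }

    static int getRangeStart(int[] list, int index) {
        return list[index * 2];
    }

    static int getRangeEnd(int[] list, int index) {
        return list[index * 2 + 1] - 1; // stored ends are exclusive
    }

    /** Formats every range as U+XXXX..U+XXXX, separated by ", ". */
    static String describe(int[] list, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < getRangeCount(len); i++) {
            if (i > 0) sb.append(", ");
            sb.append(String.format("U+%04X..U+%04X",
                    getRangeStart(list, i), getRangeEnd(list, i)));
        }
        return sb.toString();
    }
}
```

For the list `{0x61, 0x7B, 0x100, 0x200, 0x110000}` with length 5, `describe` produces `U+0061..U+007A, U+0100..U+01FF`.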
90ce3da70b43 Initial load duke parents: diff changeset	897
90ce3da70b43 Initial load duke parents: diff changeset	898	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	899	// Implementation: Pattern parsing
90ce3da70b43 Initial load duke parents: diff changeset	900	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	901
90ce3da70b43 Initial load duke parents: diff changeset	902	/**
90ce3da70b43 Initial load duke parents: diff changeset	903	* Parses the given pattern, starting at the given position. The character
90ce3da70b43 Initial load duke parents: diff changeset	904	* at pattern.charAt(pos.getIndex()) must be '[', or the parse fails.
90ce3da70b43 Initial load duke parents: diff changeset	905	* Parsing continues until the corresponding closing ']'. If a syntax error
90ce3da70b43 Initial load duke parents: diff changeset	906	* is encountered between the opening and closing bracket, the parse fails.
90ce3da70b43 Initial load duke parents: diff changeset	907	* Upon return from a successful parse, the ParsePosition is updated to
90ce3da70b43 Initial load duke parents: diff changeset	908	* point to the character following the closing ']', and an inversion
90ce3da70b43 Initial load duke parents: diff changeset	909	* list for the parsed pattern is returned. This method
90ce3da70b43 Initial load duke parents: diff changeset	910	* calls itself recursively to parse embedded subpatterns.
90ce3da70b43 Initial load duke parents: diff changeset	911	*
90ce3da70b43 Initial load duke parents: diff changeset	912	* @param pattern the string containing the pattern to be parsed. The
90ce3da70b43 Initial load duke parents: diff changeset	913	* portion of the string from pos.getIndex(), which must be a '[', to the
90ce3da70b43 Initial load duke parents: diff changeset	914	* corresponding closing ']', is parsed.
90ce3da70b43 Initial load duke parents: diff changeset	915	* @param pos upon entry, the position at which to begin parsing. The
90ce3da70b43 Initial load duke parents: diff changeset	916	* character at pattern.charAt(pos.getIndex()) must be a '['. Upon return
90ce3da70b43 Initial load duke parents: diff changeset	917	* from a successful parse, pos.getIndex() is either the index of the character after the
90ce3da70b43 Initial load duke parents: diff changeset	918	* closing ']' of the parsed pattern, or pattern.length() if the closing ']'
90ce3da70b43 Initial load duke parents: diff changeset	919	* is the last character of the pattern string.
90ce3da70b43 Initial load duke parents: diff changeset	920	* @return an inversion list for the parsed substring
90ce3da70b43 Initial load duke parents: diff changeset	921	* of <code>pattern</code>
90ce3da70b43 Initial load duke parents: diff changeset	922	* @exception java.lang.IllegalArgumentException if the parse fails.
90ce3da70b43 Initial load duke parents: diff changeset	923	*/
90ce3da70b43 Initial load duke parents: diff changeset	924	UnicodeSet applyPattern(String pattern,
90ce3da70b43 Initial load duke parents: diff changeset	925	ParsePosition pos,
90ce3da70b43 Initial load duke parents: diff changeset	926	SymbolTable symbols,
90ce3da70b43 Initial load duke parents: diff changeset	927	int options) {
90ce3da70b43 Initial load duke parents: diff changeset	928
90ce3da70b43 Initial load duke parents: diff changeset	929	// Need to build the pattern in a temporary string because
90ce3da70b43 Initial load duke parents: diff changeset	930	// _applyPattern calls add() etc., which set pat to null.
90ce3da70b43 Initial load duke parents: diff changeset	931	boolean parsePositionWasNull = pos == null;
90ce3da70b43 Initial load duke parents: diff changeset	932	if (parsePositionWasNull) {
90ce3da70b43 Initial load duke parents: diff changeset	933	pos = new ParsePosition(0);
90ce3da70b43 Initial load duke parents: diff changeset	934	}
90ce3da70b43 Initial load duke parents: diff changeset	935
90ce3da70b43 Initial load duke parents: diff changeset	936	StringBuffer rebuiltPat = new StringBuffer();
90ce3da70b43 Initial load duke parents: diff changeset	937	RuleCharacterIterator chars =
90ce3da70b43 Initial load duke parents: diff changeset	938	new RuleCharacterIterator(pattern, symbols, pos);
90ce3da70b43 Initial load duke parents: diff changeset	939	applyPattern(chars, symbols, rebuiltPat, options);
90ce3da70b43 Initial load duke parents: diff changeset	940	if (chars.inVariable()) {
90ce3da70b43 Initial load duke parents: diff changeset	941	syntaxError(chars, "Extra chars in variable value");
90ce3da70b43 Initial load duke parents: diff changeset	942	}
90ce3da70b43 Initial load duke parents: diff changeset	943	pat = rebuiltPat.toString();
90ce3da70b43 Initial load duke parents: diff changeset	944	if (parsePositionWasNull) {
90ce3da70b43 Initial load duke parents: diff changeset	945	int i = pos.getIndex();
90ce3da70b43 Initial load duke parents: diff changeset	946
90ce3da70b43 Initial load duke parents: diff changeset	947	// Skip over trailing whitespace
90ce3da70b43 Initial load duke parents: diff changeset	948	if ((options & IGNORE_SPACE) != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	949	i = Utility.skipWhitespace(pattern, i);
90ce3da70b43 Initial load duke parents: diff changeset	950	}
90ce3da70b43 Initial load duke parents: diff changeset	951
90ce3da70b43 Initial load duke parents: diff changeset	952	if (i != pattern.length()) {
90ce3da70b43 Initial load duke parents: diff changeset	953	throw new IllegalArgumentException("Parse of \"" + pattern +
90ce3da70b43 Initial load duke parents: diff changeset	954	"\" failed at " + i);
90ce3da70b43 Initial load duke parents: diff changeset	955	}
90ce3da70b43 Initial load duke parents: diff changeset	956	}
90ce3da70b43 Initial load duke parents: diff changeset	957	return this;
90ce3da70b43 Initial load duke parents: diff changeset	958	}
90ce3da70b43 Initial load duke parents: diff changeset	959
90ce3da70b43 Initial load duke parents: diff changeset	960	/**
90ce3da70b43 Initial load duke parents: diff changeset	961	* Parse the pattern from the given RuleCharacterIterator. The
90ce3da70b43 Initial load duke parents: diff changeset	962	* iterator is advanced over the parsed pattern.
90ce3da70b43 Initial load duke parents: diff changeset	963	* @param chars iterator over the pattern characters. Upon return
90ce3da70b43 Initial load duke parents: diff changeset	964	* it will be advanced to the first character after the parsed
90ce3da70b43 Initial load duke parents: diff changeset	965	* pattern, or the end of the iteration if all characters are
90ce3da70b43 Initial load duke parents: diff changeset	966	* parsed.
90ce3da70b43 Initial load duke parents: diff changeset	967	* @param symbols symbol table to use to parse and dereference
90ce3da70b43 Initial load duke parents: diff changeset	968	* variables, or null if none.
90ce3da70b43 Initial load duke parents: diff changeset	969	* @param rebuiltPat the pattern that was parsed, rebuilt or
90ce3da70b43 Initial load duke parents: diff changeset	970	* copied from the input pattern, as appropriate.
90ce3da70b43 Initial load duke parents: diff changeset	971	* @param options a bit mask of zero or more of the following:
90ce3da70b43 Initial load duke parents: diff changeset	972	* IGNORE_SPACE, CASE.
90ce3da70b43 Initial load duke parents: diff changeset	973	*/
90ce3da70b43 Initial load duke parents: diff changeset	974	void applyPattern(RuleCharacterIterator chars, SymbolTable symbols,
90ce3da70b43 Initial load duke parents: diff changeset	975	StringBuffer rebuiltPat, int options) {
90ce3da70b43 Initial load duke parents: diff changeset	976	// Syntax characters: [ ] ^ - & { }
90ce3da70b43 Initial load duke parents: diff changeset	977
90ce3da70b43 Initial load duke parents: diff changeset	978	// Recognized special forms for chars, sets: c-c s-s s&s
90ce3da70b43 Initial load duke parents: diff changeset	979
90ce3da70b43 Initial load duke parents: diff changeset	980	int opts = RuleCharacterIterator.PARSE_VARIABLES |
90ce3da70b43 Initial load duke parents: diff changeset	981	RuleCharacterIterator.PARSE_ESCAPES;
90ce3da70b43 Initial load duke parents: diff changeset	982	if ((options & IGNORE_SPACE) != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	983	opts |= RuleCharacterIterator.SKIP_WHITESPACE;
90ce3da70b43 Initial load duke parents: diff changeset	984	}
90ce3da70b43 Initial load duke parents: diff changeset	985
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	986	StringBuffer patBuf = new StringBuffer(), buf = null;
2 90ce3da70b43 Initial load duke parents: diff changeset	987	boolean usePat = false;
90ce3da70b43 Initial load duke parents: diff changeset	988	UnicodeSet scratch = null;
90ce3da70b43 Initial load duke parents: diff changeset	989	Object backup = null;
90ce3da70b43 Initial load duke parents: diff changeset	990
90ce3da70b43 Initial load duke parents: diff changeset	991	// mode: 0=before [, 1=between [...], 2=after ]
90ce3da70b43 Initial load duke parents: diff changeset	992	// lastItem: 0=none, 1=char, 2=set
90ce3da70b43 Initial load duke parents: diff changeset	993	int lastItem = 0, lastChar = 0, mode = 0;
90ce3da70b43 Initial load duke parents: diff changeset	994	char op = 0;
90ce3da70b43 Initial load duke parents: diff changeset	995
90ce3da70b43 Initial load duke parents: diff changeset	996	boolean invert = false;
90ce3da70b43 Initial load duke parents: diff changeset	997
90ce3da70b43 Initial load duke parents: diff changeset	998	clear();
90ce3da70b43 Initial load duke parents: diff changeset	999
90ce3da70b43 Initial load duke parents: diff changeset	1000	while (mode != 2 && !chars.atEnd()) {
90ce3da70b43 Initial load duke parents: diff changeset	1001	if (false) {
90ce3da70b43 Initial load duke parents: diff changeset	1002	// Debugging assertion
90ce3da70b43 Initial load duke parents: diff changeset	1003	if (!((lastItem == 0 && op == 0) ||
90ce3da70b43 Initial load duke parents: diff changeset	1004	(lastItem == 1 && (op == 0 || op == '-')) ||
90ce3da70b43 Initial load duke parents: diff changeset	1005	(lastItem == 2 && (op == 0 || op == '-' || op == '&')))) {
90ce3da70b43 Initial load duke parents: diff changeset	1006	throw new IllegalArgumentException();
90ce3da70b43 Initial load duke parents: diff changeset	1007	}
90ce3da70b43 Initial load duke parents: diff changeset	1008	}
90ce3da70b43 Initial load duke parents: diff changeset	1009
90ce3da70b43 Initial load duke parents: diff changeset	1010	int c = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1011	boolean literal = false;
90ce3da70b43 Initial load duke parents: diff changeset	1012	UnicodeSet nested = null;
90ce3da70b43 Initial load duke parents: diff changeset	1013
90ce3da70b43 Initial load duke parents: diff changeset	1014	// -------- Check for property pattern
90ce3da70b43 Initial load duke parents: diff changeset	1015
90ce3da70b43 Initial load duke parents: diff changeset	1016	// setMode: 0=none, 1=unicodeset, 2=propertypat, 3=preparsed
90ce3da70b43 Initial load duke parents: diff changeset	1017	int setMode = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1018	if (resemblesPropertyPattern(chars, opts)) {
90ce3da70b43 Initial load duke parents: diff changeset	1019	setMode = 2;
90ce3da70b43 Initial load duke parents: diff changeset	1020	}
90ce3da70b43 Initial load duke parents: diff changeset	1021
90ce3da70b43 Initial load duke parents: diff changeset	1022	// -------- Parse '[' of opening delimiter OR nested set.
90ce3da70b43 Initial load duke parents: diff changeset	1023	// If there is a nested set, use `setMode' to define how
90ce3da70b43 Initial load duke parents: diff changeset	1024	// the set should be parsed. If the '[' is part of the
90ce3da70b43 Initial load duke parents: diff changeset	1025	// opening delimiter for this pattern, parse special
90ce3da70b43 Initial load duke parents: diff changeset	1026	// strings "[", "[^", "[-", and "[^-". Check for stand-in
90ce3da70b43 Initial load duke parents: diff changeset	1027	// characters representing a nested set in the symbol
90ce3da70b43 Initial load duke parents: diff changeset	1028	// table.
90ce3da70b43 Initial load duke parents: diff changeset	1029
90ce3da70b43 Initial load duke parents: diff changeset	1030	else {
90ce3da70b43 Initial load duke parents: diff changeset	1031	// Prepare to backup if necessary
90ce3da70b43 Initial load duke parents: diff changeset	1032	backup = chars.getPos(backup);
90ce3da70b43 Initial load duke parents: diff changeset	1033	c = chars.next(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1034	literal = chars.isEscaped();
90ce3da70b43 Initial load duke parents: diff changeset	1035
90ce3da70b43 Initial load duke parents: diff changeset	1036	if (c == '[' && !literal) {
90ce3da70b43 Initial load duke parents: diff changeset	1037	if (mode == 1) {
90ce3da70b43 Initial load duke parents: diff changeset	1038	chars.setPos(backup); // backup
90ce3da70b43 Initial load duke parents: diff changeset	1039	setMode = 1;
90ce3da70b43 Initial load duke parents: diff changeset	1040	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1041	// Handle opening '[' delimiter
90ce3da70b43 Initial load duke parents: diff changeset	1042	mode = 1;
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1043	patBuf.append('[');
2 90ce3da70b43 Initial load duke parents: diff changeset	1044	backup = chars.getPos(backup); // prepare to backup
90ce3da70b43 Initial load duke parents: diff changeset	1045	c = chars.next(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1046	literal = chars.isEscaped();
90ce3da70b43 Initial load duke parents: diff changeset	1047	if (c == '^' && !literal) {
90ce3da70b43 Initial load duke parents: diff changeset	1048	invert = true;
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1049	patBuf.append('^');
2 90ce3da70b43 Initial load duke parents: diff changeset	1050	backup = chars.getPos(backup); // prepare to backup
90ce3da70b43 Initial load duke parents: diff changeset	1051	c = chars.next(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1052	literal = chars.isEscaped();
90ce3da70b43 Initial load duke parents: diff changeset	1053	}
90ce3da70b43 Initial load duke parents: diff changeset	1054	// Fall through to handle special leading '-';
90ce3da70b43 Initial load duke parents: diff changeset	1055	// otherwise restart loop for nested [], \p{}, etc.
90ce3da70b43 Initial load duke parents: diff changeset	1056	if (c == '-') {
90ce3da70b43 Initial load duke parents: diff changeset	1057	literal = true;
90ce3da70b43 Initial load duke parents: diff changeset	1058	// Fall through to handle literal '-' below
90ce3da70b43 Initial load duke parents: diff changeset	1059	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1060	chars.setPos(backup); // backup
90ce3da70b43 Initial load duke parents: diff changeset	1061	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1062	}
90ce3da70b43 Initial load duke parents: diff changeset	1063	}
90ce3da70b43 Initial load duke parents: diff changeset	1064	} else if (symbols != null) {
90ce3da70b43 Initial load duke parents: diff changeset	1065	UnicodeMatcher m = symbols.lookupMatcher(c); // may be null
90ce3da70b43 Initial load duke parents: diff changeset	1066	if (m != null) {
90ce3da70b43 Initial load duke parents: diff changeset	1067	try {
90ce3da70b43 Initial load duke parents: diff changeset	1068	nested = (UnicodeSet) m;
90ce3da70b43 Initial load duke parents: diff changeset	1069	setMode = 3;
90ce3da70b43 Initial load duke parents: diff changeset	1070	} catch (ClassCastException e) {
90ce3da70b43 Initial load duke parents: diff changeset	1071	syntaxError(chars, "Syntax error");
90ce3da70b43 Initial load duke parents: diff changeset	1072	}
90ce3da70b43 Initial load duke parents: diff changeset	1073	}
90ce3da70b43 Initial load duke parents: diff changeset	1074	}
90ce3da70b43 Initial load duke parents: diff changeset	1075	}
90ce3da70b43 Initial load duke parents: diff changeset	1076
90ce3da70b43 Initial load duke parents: diff changeset	1077	// -------- Handle a nested set. This either is inline in
90ce3da70b43 Initial load duke parents: diff changeset	1078	// the pattern or represented by a stand-in that has
90ce3da70b43 Initial load duke parents: diff changeset	1079	// previously been parsed and was looked up in the symbol
90ce3da70b43 Initial load duke parents: diff changeset	1080	// table.
90ce3da70b43 Initial load duke parents: diff changeset	1081
90ce3da70b43 Initial load duke parents: diff changeset	1082	if (setMode != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1083	if (lastItem == 1) {
90ce3da70b43 Initial load duke parents: diff changeset	1084	if (op != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1085	syntaxError(chars, "Char expected after operator");
90ce3da70b43 Initial load duke parents: diff changeset	1086	}
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1087	add_unchecked(lastChar, lastChar);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1088	_appendToPat(patBuf, lastChar, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1089	lastItem = op = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1090	}
90ce3da70b43 Initial load duke parents: diff changeset	1091
90ce3da70b43 Initial load duke parents: diff changeset	1092	if (op == '-' || op == '&') {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1093	patBuf.append(op);
2 90ce3da70b43 Initial load duke parents: diff changeset	1094	}
90ce3da70b43 Initial load duke parents: diff changeset	1095
90ce3da70b43 Initial load duke parents: diff changeset	1096	if (nested == null) {
90ce3da70b43 Initial load duke parents: diff changeset	1097	if (scratch == null) scratch = new UnicodeSet();
90ce3da70b43 Initial load duke parents: diff changeset	1098	nested = scratch;
90ce3da70b43 Initial load duke parents: diff changeset	1099	}
90ce3da70b43 Initial load duke parents: diff changeset	1100	switch (setMode) {
90ce3da70b43 Initial load duke parents: diff changeset	1101	case 1:
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1102	nested.applyPattern(chars, symbols, patBuf, options);
2 90ce3da70b43 Initial load duke parents: diff changeset	1103	break;
90ce3da70b43 Initial load duke parents: diff changeset	1104	case 2:
90ce3da70b43 Initial load duke parents: diff changeset	1105	chars.skipIgnored(opts);
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1106	nested.applyPropertyPattern(chars, patBuf, symbols);
2 90ce3da70b43 Initial load duke parents: diff changeset	1107	break;
90ce3da70b43 Initial load duke parents: diff changeset	1108	case 3: // `nested' already parsed
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1109	nested._toPattern(patBuf, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1110	break;
90ce3da70b43 Initial load duke parents: diff changeset	1111	}
90ce3da70b43 Initial load duke parents: diff changeset	1112
90ce3da70b43 Initial load duke parents: diff changeset	1113	usePat = true;
90ce3da70b43 Initial load duke parents: diff changeset	1114
90ce3da70b43 Initial load duke parents: diff changeset	1115	if (mode == 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1116	// Entire pattern is a category; leave parse loop
90ce3da70b43 Initial load duke parents: diff changeset	1117	set(nested);
90ce3da70b43 Initial load duke parents: diff changeset	1118	mode = 2;
90ce3da70b43 Initial load duke parents: diff changeset	1119	break;
90ce3da70b43 Initial load duke parents: diff changeset	1120	}
90ce3da70b43 Initial load duke parents: diff changeset	1121
90ce3da70b43 Initial load duke parents: diff changeset	1122	switch (op) {
90ce3da70b43 Initial load duke parents: diff changeset	1123	case '-':
90ce3da70b43 Initial load duke parents: diff changeset	1124	removeAll(nested);
90ce3da70b43 Initial load duke parents: diff changeset	1125	break;
90ce3da70b43 Initial load duke parents: diff changeset	1126	case '&':
90ce3da70b43 Initial load duke parents: diff changeset	1127	retainAll(nested);
90ce3da70b43 Initial load duke parents: diff changeset	1128	break;
90ce3da70b43 Initial load duke parents: diff changeset	1129	case 0:
90ce3da70b43 Initial load duke parents: diff changeset	1130	addAll(nested);
90ce3da70b43 Initial load duke parents: diff changeset	1131	break;
90ce3da70b43 Initial load duke parents: diff changeset	1132	}
90ce3da70b43 Initial load duke parents: diff changeset	1133
90ce3da70b43 Initial load duke parents: diff changeset	1134	op = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1135	lastItem = 2;
90ce3da70b43 Initial load duke parents: diff changeset	1136
90ce3da70b43 Initial load duke parents: diff changeset	1137	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1138	}
90ce3da70b43 Initial load duke parents: diff changeset	1139
90ce3da70b43 Initial load duke parents: diff changeset	1140	if (mode == 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1141	syntaxError(chars, "Missing '['");
90ce3da70b43 Initial load duke parents: diff changeset	1142	}
90ce3da70b43 Initial load duke parents: diff changeset	1143
90ce3da70b43 Initial load duke parents: diff changeset	1144	// -------- Parse special (syntax) characters. If the
90ce3da70b43 Initial load duke parents: diff changeset	1145	// current character is not special, or if it is escaped,
90ce3da70b43 Initial load duke parents: diff changeset	1146	// then fall through and handle it below.
90ce3da70b43 Initial load duke parents: diff changeset	1147
90ce3da70b43 Initial load duke parents: diff changeset	1148	if (!literal) {
90ce3da70b43 Initial load duke parents: diff changeset	1149	switch (c) {
90ce3da70b43 Initial load duke parents: diff changeset	1150	case ']':
90ce3da70b43 Initial load duke parents: diff changeset	1151	if (lastItem == 1) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1152	add_unchecked(lastChar, lastChar);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1153	_appendToPat(patBuf, lastChar, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1154	}
90ce3da70b43 Initial load duke parents: diff changeset	1155	// Treat final trailing '-' as a literal
90ce3da70b43 Initial load duke parents: diff changeset	1156	if (op == '-') {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1157	add_unchecked(op, op);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1158	patBuf.append(op);
2 90ce3da70b43 Initial load duke parents: diff changeset	1159	} else if (op == '&') {
90ce3da70b43 Initial load duke parents: diff changeset	1160	syntaxError(chars, "Trailing '&'");
90ce3da70b43 Initial load duke parents: diff changeset	1161	}
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1162	patBuf.append(']');
2 90ce3da70b43 Initial load duke parents: diff changeset	1163	mode = 2;
90ce3da70b43 Initial load duke parents: diff changeset	1164	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1165	case '-':
90ce3da70b43 Initial load duke parents: diff changeset	1166	if (op == 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1167	if (lastItem != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1168	op = (char) c;
90ce3da70b43 Initial load duke parents: diff changeset	1169	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1170	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1171	// Treat final trailing '-' as a literal
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1172	add_unchecked(c, c);
2 90ce3da70b43 Initial load duke parents: diff changeset	1173	c = chars.next(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1174	literal = chars.isEscaped();
90ce3da70b43 Initial load duke parents: diff changeset	1175	if (c == ']' && !literal) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1176	patBuf.append("-]");
2 90ce3da70b43 Initial load duke parents: diff changeset	1177	mode = 2;
90ce3da70b43 Initial load duke parents: diff changeset	1178	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1179	}
90ce3da70b43 Initial load duke parents: diff changeset	1180	}
90ce3da70b43 Initial load duke parents: diff changeset	1181	}
90ce3da70b43 Initial load duke parents: diff changeset	1182	syntaxError(chars, "'-' not after char or set");
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	1183	break;
2 90ce3da70b43 Initial load duke parents: diff changeset	1184	case '&':
90ce3da70b43 Initial load duke parents: diff changeset	1185	if (lastItem == 2 && op == 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1186	op = (char) c;
90ce3da70b43 Initial load duke parents: diff changeset	1187	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1188	}
90ce3da70b43 Initial load duke parents: diff changeset	1189	syntaxError(chars, "'&' not after set");
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	1190	break;
2 90ce3da70b43 Initial load duke parents: diff changeset	1191	case '^':
90ce3da70b43 Initial load duke parents: diff changeset	1192	syntaxError(chars, "'^' not after '['");
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	1193	break;
2 90ce3da70b43 Initial load duke parents: diff changeset	1194	case '{':
90ce3da70b43 Initial load duke parents: diff changeset	1195	if (op != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1196	syntaxError(chars, "Missing operand after operator");
90ce3da70b43 Initial load duke parents: diff changeset	1197	}
90ce3da70b43 Initial load duke parents: diff changeset	1198	if (lastItem == 1) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1199	add_unchecked(lastChar, lastChar);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1200	_appendToPat(patBuf, lastChar, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1201	}
90ce3da70b43 Initial load duke parents: diff changeset	1202	lastItem = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1203	if (buf == null) {
90ce3da70b43 Initial load duke parents: diff changeset	1204	buf = new StringBuffer();
90ce3da70b43 Initial load duke parents: diff changeset	1205	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1206	buf.setLength(0);
90ce3da70b43 Initial load duke parents: diff changeset	1207	}
90ce3da70b43 Initial load duke parents: diff changeset	1208	boolean ok = false;
90ce3da70b43 Initial load duke parents: diff changeset	1209	while (!chars.atEnd()) {
90ce3da70b43 Initial load duke parents: diff changeset	1210	c = chars.next(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1211	literal = chars.isEscaped();
90ce3da70b43 Initial load duke parents: diff changeset	1212	if (c == '}' && !literal) {
90ce3da70b43 Initial load duke parents: diff changeset	1213	ok = true;
90ce3da70b43 Initial load duke parents: diff changeset	1214	break;
90ce3da70b43 Initial load duke parents: diff changeset	1215	}
90ce3da70b43 Initial load duke parents: diff changeset	1216	UTF16.append(buf, c);
90ce3da70b43 Initial load duke parents: diff changeset	1217	}
90ce3da70b43 Initial load duke parents: diff changeset	1218	if (buf.length() < 1 || !ok) {
90ce3da70b43 Initial load duke parents: diff changeset	1219	syntaxError(chars, "Invalid multicharacter string");
90ce3da70b43 Initial load duke parents: diff changeset	1220	}
90ce3da70b43 Initial load duke parents: diff changeset	1221	// We have new string. Add it to set and continue;
90ce3da70b43 Initial load duke parents: diff changeset	1222	// we don't need to drop through to the further
90ce3da70b43 Initial load duke parents: diff changeset	1223	// processing
90ce3da70b43 Initial load duke parents: diff changeset	1224	add(buf.toString());
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1225	patBuf.append('{');
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1226	_appendToPat(patBuf, buf.toString(), false);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1227	patBuf.append('}');
2 90ce3da70b43 Initial load duke parents: diff changeset	1228	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1229	case SymbolTable.SYMBOL_REF:
90ce3da70b43 Initial load duke parents: diff changeset	1230	//          symbols    nosymbols
90ce3da70b43 Initial load duke parents: diff changeset	1231	// [a-$]    error      error (ambiguous)
90ce3da70b43 Initial load duke parents: diff changeset	1232	// [a$]     anchor     anchor
90ce3da70b43 Initial load duke parents: diff changeset	1233	// [a-$x]   var "x"*   literal '$'
90ce3da70b43 Initial load duke parents: diff changeset	1234	// [a-$.]   error      literal '$'
90ce3da70b43 Initial load duke parents: diff changeset	1235	// *We won't get here in the case of var "x"
90ce3da70b43 Initial load duke parents: diff changeset	1236	backup = chars.getPos(backup);
90ce3da70b43 Initial load duke parents: diff changeset	1237	c = chars.next(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1238	literal = chars.isEscaped();
90ce3da70b43 Initial load duke parents: diff changeset	1239	boolean anchor = (c == ']' && !literal);
90ce3da70b43 Initial load duke parents: diff changeset	1240	if (symbols == null && !anchor) {
90ce3da70b43 Initial load duke parents: diff changeset	1241	c = SymbolTable.SYMBOL_REF;
90ce3da70b43 Initial load duke parents: diff changeset	1242	chars.setPos(backup);
90ce3da70b43 Initial load duke parents: diff changeset	1243	break; // literal '$'
90ce3da70b43 Initial load duke parents: diff changeset	1244	}
90ce3da70b43 Initial load duke parents: diff changeset	1245	if (anchor && op == 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1246	if (lastItem == 1) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1247	add_unchecked(lastChar, lastChar);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1248	_appendToPat(patBuf, lastChar, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1249	}
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1250	add_unchecked(UnicodeMatcher.ETHER);
2 90ce3da70b43 Initial load duke parents: diff changeset	1251	usePat = true;
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1252	patBuf.append(SymbolTable.SYMBOL_REF).append(']');
2 90ce3da70b43 Initial load duke parents: diff changeset	1253	mode = 2;
90ce3da70b43 Initial load duke parents: diff changeset	1254	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1255	}
90ce3da70b43 Initial load duke parents: diff changeset	1256	syntaxError(chars, "Unquoted '$'");
11136 f0f53bbe5bd1 7116914: Miscellaneous warnings (sun.text) peytoia parents: 5506 diff changeset	1257	break;
2 90ce3da70b43 Initial load duke parents: diff changeset	1258	default:
90ce3da70b43 Initial load duke parents: diff changeset	1259	break;
90ce3da70b43 Initial load duke parents: diff changeset	1260	}
90ce3da70b43 Initial load duke parents: diff changeset	1261	}
90ce3da70b43 Initial load duke parents: diff changeset	1262
90ce3da70b43 Initial load duke parents: diff changeset	1263	// -------- Parse literal characters. This includes both
90ce3da70b43 Initial load duke parents: diff changeset	1264	// escaped chars ("\u4E01") and non-syntax characters
90ce3da70b43 Initial load duke parents: diff changeset	1265	// ("a").
90ce3da70b43 Initial load duke parents: diff changeset	1266
90ce3da70b43 Initial load duke parents: diff changeset	1267	switch (lastItem) {
90ce3da70b43 Initial load duke parents: diff changeset	1268	case 0:
90ce3da70b43 Initial load duke parents: diff changeset	1269	lastItem = 1;
90ce3da70b43 Initial load duke parents: diff changeset	1270	lastChar = c;
90ce3da70b43 Initial load duke parents: diff changeset	1271	break;
90ce3da70b43 Initial load duke parents: diff changeset	1272	case 1:
90ce3da70b43 Initial load duke parents: diff changeset	1273	if (op == '-') {
90ce3da70b43 Initial load duke parents: diff changeset	1274	if (lastChar >= c) {
90ce3da70b43 Initial load duke parents: diff changeset	1275	// Don't allow redundant (a-a) or empty (b-a) ranges;
90ce3da70b43 Initial load duke parents: diff changeset	1276	// these are most likely typos.
90ce3da70b43 Initial load duke parents: diff changeset	1277	syntaxError(chars, "Invalid range");
90ce3da70b43 Initial load duke parents: diff changeset	1278	}
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1279	add_unchecked(lastChar, c);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1280	_appendToPat(patBuf, lastChar, false);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1281	patBuf.append(op);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1282	_appendToPat(patBuf, c, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1283	lastItem = op = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1284	} else {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1285	add_unchecked(lastChar, lastChar);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1286	_appendToPat(patBuf, lastChar, false);
2 90ce3da70b43 Initial load duke parents: diff changeset	1287	lastChar = c;
90ce3da70b43 Initial load duke parents: diff changeset	1288	}
90ce3da70b43 Initial load duke parents: diff changeset	1289	break;
90ce3da70b43 Initial load duke parents: diff changeset	1290	case 2:
90ce3da70b43 Initial load duke parents: diff changeset	1291	if (op != 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1292	syntaxError(chars, "Set expected after operator");
90ce3da70b43 Initial load duke parents: diff changeset	1293	}
90ce3da70b43 Initial load duke parents: diff changeset	1294	lastChar = c;
90ce3da70b43 Initial load duke parents: diff changeset	1295	lastItem = 1;
90ce3da70b43 Initial load duke parents: diff changeset	1296	break;
90ce3da70b43 Initial load duke parents: diff changeset	1297	}
90ce3da70b43 Initial load duke parents: diff changeset	1298	}
90ce3da70b43 Initial load duke parents: diff changeset	1299
90ce3da70b43 Initial load duke parents: diff changeset	1300	if (mode != 2) {
90ce3da70b43 Initial load duke parents: diff changeset	1301	syntaxError(chars, "Missing ']'");
90ce3da70b43 Initial load duke parents: diff changeset	1302	}
90ce3da70b43 Initial load duke parents: diff changeset	1303
90ce3da70b43 Initial load duke parents: diff changeset	1304	chars.skipIgnored(opts);
90ce3da70b43 Initial load duke parents: diff changeset	1305
90ce3da70b43 Initial load duke parents: diff changeset	1306	if (invert) {
90ce3da70b43 Initial load duke parents: diff changeset	1307	complement();
90ce3da70b43 Initial load duke parents: diff changeset	1308	}
90ce3da70b43 Initial load duke parents: diff changeset	1309
90ce3da70b43 Initial load duke parents: diff changeset	1310	// Use the rebuilt pattern (pat) only if necessary. Prefer the
90ce3da70b43 Initial load duke parents: diff changeset	1311	// generated pattern.
90ce3da70b43 Initial load duke parents: diff changeset	1312	if (usePat) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1313	rebuiltPat.append(patBuf.toString());
2 90ce3da70b43 Initial load duke parents: diff changeset	1314	} else {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1315	_generatePattern(rebuiltPat, false, true);
2 90ce3da70b43 Initial load duke parents: diff changeset	1316	}
90ce3da70b43 Initial load duke parents: diff changeset	1317	}
90ce3da70b43 Initial load duke parents: diff changeset	1318
90ce3da70b43 Initial load duke parents: diff changeset	1319	private static void syntaxError(RuleCharacterIterator chars, String msg) {
90ce3da70b43 Initial load duke parents: diff changeset	1320	throw new IllegalArgumentException("Error: " + msg + " at \"" +
90ce3da70b43 Initial load duke parents: diff changeset	1321	Utility.escape(chars.toString()) +
90ce3da70b43 Initial load duke parents: diff changeset	1322	'"');
90ce3da70b43 Initial load duke parents: diff changeset	1323	}
90ce3da70b43 Initial load duke parents: diff changeset	1324
90ce3da70b43 Initial load duke parents: diff changeset	1325	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1326	// Implementation: Utility methods
90ce3da70b43 Initial load duke parents: diff changeset	1327	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1328
90ce3da70b43 Initial load duke parents: diff changeset	1329	private void ensureCapacity(int newLen) {
90ce3da70b43 Initial load duke parents: diff changeset	1330	if (newLen <= list.length) return;
90ce3da70b43 Initial load duke parents: diff changeset	1331	int[] temp = new int[newLen + GROW_EXTRA];
90ce3da70b43 Initial load duke parents: diff changeset	1332	System.arraycopy(list, 0, temp, 0, len);
90ce3da70b43 Initial load duke parents: diff changeset	1333	list = temp;
90ce3da70b43 Initial load duke parents: diff changeset	1334	}
90ce3da70b43 Initial load duke parents: diff changeset	1335
90ce3da70b43 Initial load duke parents: diff changeset	1336	private void ensureBufferCapacity(int newLen) {
90ce3da70b43 Initial load duke parents: diff changeset	1337	if (buffer != null && newLen <= buffer.length) return;
90ce3da70b43 Initial load duke parents: diff changeset	1338	buffer = new int[newLen + GROW_EXTRA];
90ce3da70b43 Initial load duke parents: diff changeset	1339	}
90ce3da70b43 Initial load duke parents: diff changeset	1340
90ce3da70b43 Initial load duke parents: diff changeset	1341	/**
90ce3da70b43 Initial load duke parents: diff changeset	1342	* Assumes start <= end.
90ce3da70b43 Initial load duke parents: diff changeset	1343	*/
90ce3da70b43 Initial load duke parents: diff changeset	1344	private int[] range(int start, int end) {
90ce3da70b43 Initial load duke parents: diff changeset	1345	if (rangeList == null) {
90ce3da70b43 Initial load duke parents: diff changeset	1346	rangeList = new int[] { start, end+1, HIGH };
90ce3da70b43 Initial load duke parents: diff changeset	1347	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1348	rangeList[0] = start;
90ce3da70b43 Initial load duke parents: diff changeset	1349	rangeList[1] = end+1;
90ce3da70b43 Initial load duke parents: diff changeset	1350	}
90ce3da70b43 Initial load duke parents: diff changeset	1351	return rangeList;
90ce3da70b43 Initial load duke parents: diff changeset	1352	}
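// Illustration only, not part of UnicodeSet: a minimal sketch of the
// inversion-list layout that range() produces, { start, end+1, HIGH }.
// The class name InversionListSketch, the contains() helper, and the
// value HIGH = 0x110000 are assumptions made for this example.

```java
class InversionListSketch {
    // Assumed sentinel: one past the largest Unicode code point.
    static final int HIGH = 0x110000;

    // Same layout as range(): the set starts at 'start' and stops
    // just before 'end + 1'; HIGH terminates the list.
    static int[] range(int start, int end) {
        return new int[] { start, end + 1, HIGH };
    }

    // Count the boundaries that are <= c. An odd count means c lies
    // inside a range, an even count means it lies outside.
    static boolean contains(int[] list, int c) {
        int i = 0;
        while (c >= list[i]) {
            i++; // terminates because HIGH exceeds every code point
        }
        return (i & 1) == 1;
    }

    public static void main(String[] args) {
        int[] az = range('a', 'z');
        System.out.println(contains(az, 'm'));
        System.out.println(contains(az, 'A'));
    }
}
```

// The linear scan keeps the sketch short; a binary search over the
// boundaries would be the usual choice for long lists.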
90ce3da70b43 Initial load duke parents: diff changeset	1353
90ce3da70b43 Initial load duke parents: diff changeset	1354	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1355	// Implementation: Fundamental operations
90ce3da70b43 Initial load duke parents: diff changeset	1356	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1357
90ce3da70b43 Initial load duke parents: diff changeset	1358	// polarity = 0, 3 is normal: x xor y
90ce3da70b43 Initial load duke parents: diff changeset	1359	// polarity = 1, 2: x xor ~y == x === y
90ce3da70b43 Initial load duke parents: diff changeset	1360
90ce3da70b43 Initial load duke parents: diff changeset	1361	private UnicodeSet xor(int[] other, int otherLen, int polarity) {
90ce3da70b43 Initial load duke parents: diff changeset	1362	ensureBufferCapacity(len + otherLen);
90ce3da70b43 Initial load duke parents: diff changeset	1363	int i = 0, j = 0, k = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1364	int a = list[i++];
90ce3da70b43 Initial load duke parents: diff changeset	1365	int b;
90ce3da70b43 Initial load duke parents: diff changeset	1366	if (polarity == 1 || polarity == 2) {
90ce3da70b43 Initial load duke parents: diff changeset	1367	b = LOW;
90ce3da70b43 Initial load duke parents: diff changeset	1368	if (other[j] == LOW) { // skip base if already LOW
90ce3da70b43 Initial load duke parents: diff changeset	1369	++j;
90ce3da70b43 Initial load duke parents: diff changeset	1370	b = other[j];
90ce3da70b43 Initial load duke parents: diff changeset	1371	}
90ce3da70b43 Initial load duke parents: diff changeset	1372	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1373	b = other[j++];
90ce3da70b43 Initial load duke parents: diff changeset	1374	}
90ce3da70b43 Initial load duke parents: diff changeset	1375	// simplest of all the routines
90ce3da70b43 Initial load duke parents: diff changeset	1376	// sort the values, discarding identicals!
90ce3da70b43 Initial load duke parents: diff changeset	1377	while (true) {
90ce3da70b43 Initial load duke parents: diff changeset	1378	if (a < b) {
90ce3da70b43 Initial load duke parents: diff changeset	1379	buffer[k++] = a;
90ce3da70b43 Initial load duke parents: diff changeset	1380	a = list[i++];
90ce3da70b43 Initial load duke parents: diff changeset	1381	} else if (b < a) {
90ce3da70b43 Initial load duke parents: diff changeset	1382	buffer[k++] = b;
90ce3da70b43 Initial load duke parents: diff changeset	1383	b = other[j++];
90ce3da70b43 Initial load duke parents: diff changeset	1384	} else if (a != HIGH) { // at this point, a == b
90ce3da70b43 Initial load duke parents: diff changeset	1385	// discard both values!
90ce3da70b43 Initial load duke parents: diff changeset	1386	a = list[i++];
90ce3da70b43 Initial load duke parents: diff changeset	1387	b = other[j++];
90ce3da70b43 Initial load duke parents: diff changeset	1388	} else { // DONE!
90ce3da70b43 Initial load duke parents: diff changeset	1389	buffer[k++] = HIGH;
90ce3da70b43 Initial load duke parents: diff changeset	1390	len = k;
90ce3da70b43 Initial load duke parents: diff changeset	1391	break;
90ce3da70b43 Initial load duke parents: diff changeset	1392	}
90ce3da70b43 Initial load duke parents: diff changeset	1393	}
90ce3da70b43 Initial load duke parents: diff changeset	1394	// swap list and buffer
90ce3da70b43 Initial load duke parents: diff changeset	1395	int[] temp = list;
90ce3da70b43 Initial load duke parents: diff changeset	1396	list = buffer;
90ce3da70b43 Initial load duke parents: diff changeset	1397	buffer = temp;
90ce3da70b43 Initial load duke parents: diff changeset	1398	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	1399	return this;
90ce3da70b43 Initial load duke parents: diff changeset	1400	}
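// Illustration only, not part of UnicodeSet: a standalone sketch of
// the polarity-0 case of xor() above. Both inputs are sorted boundary
// lists ending in HIGH; the merge keeps every boundary that appears
// in exactly one list and drops boundaries that appear in both.
// The class name XorSketch and HIGH = 0x110000 are assumptions made
// for this example, and the inverted polarities are not covered.

```java
import java.util.Arrays;

class XorSketch {
    // Assumed sentinel: one past the largest Unicode code point.
    static final int HIGH = 0x110000;

    static int[] xor(int[] x, int[] y) {
        // The result never needs more than x.length + y.length slots.
        int[] out = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        int a = x[i++];
        int b = y[j++];
        while (true) {
            if (a < b) {            // boundary only in x: keep it
                out[k++] = a;
                a = x[i++];
            } else if (b < a) {     // boundary only in y: keep it
                out[k++] = b;
                b = y[j++];
            } else if (a != HIGH) { // same boundary in both: discard
                a = x[i++];
                b = y[j++];
            } else {                // both lists exhausted
                out[k++] = HIGH;
                break;
            }
        }
        return Arrays.copyOf(out, k);
    }

    public static void main(String[] args) {
        int[] ad = { 'a', 'e', HIGH }; // [a-d]
        int[] cf = { 'c', 'g', HIGH }; // [c-f]
        // Boundaries a, c, e, g, HIGH correspond to [a-b] and [e-f].
        System.out.println(Arrays.toString(xor(ad, cf)));
    }
}
```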
90ce3da70b43 Initial load duke parents: diff changeset	1401
90ce3da70b43 Initial load duke parents: diff changeset	1402	// polarity = 0 is normal: x union y
90ce3da70b43 Initial load duke parents: diff changeset	1403	// polarity = 2: x union ~y
90ce3da70b43 Initial load duke parents: diff changeset	1404	// polarity = 1: ~x union y
90ce3da70b43 Initial load duke parents: diff changeset	1405	// polarity = 3: ~x union ~y
90ce3da70b43 Initial load duke parents: diff changeset	1406
90ce3da70b43 Initial load duke parents: diff changeset	1407	private UnicodeSet add(int[] other, int otherLen, int polarity) {
90ce3da70b43 Initial load duke parents: diff changeset	1408	ensureBufferCapacity(len + otherLen);
90ce3da70b43 Initial load duke parents: diff changeset	1409	int i = 0, j = 0, k = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1410	int a = list[i++];
90ce3da70b43 Initial load duke parents: diff changeset	1411	int b = other[j++];
90ce3da70b43 Initial load duke parents: diff changeset	1412	// change from xor is that we have to check overlapping pairs
90ce3da70b43 Initial load duke parents: diff changeset	1413	// polarity bit 1 means a is second, bit 2 means b is.
90ce3da70b43 Initial load duke parents: diff changeset	1414	main:
90ce3da70b43 Initial load duke parents: diff changeset	1415	while (true) {
90ce3da70b43 Initial load duke parents: diff changeset	1416	switch (polarity) {
90ce3da70b43 Initial load duke parents: diff changeset	1417	case 0: // both first; take lower if unequal
90ce3da70b43 Initial load duke parents: diff changeset	1418	if (a < b) { // take a
90ce3da70b43 Initial load duke parents: diff changeset	1419	// Back up over overlapping ranges in buffer[]
90ce3da70b43 Initial load duke parents: diff changeset	1420	if (k > 0 && a <= buffer[k-1]) {
90ce3da70b43 Initial load duke parents: diff changeset	1421	// Pick latter end value in buffer[] vs. list[]
90ce3da70b43 Initial load duke parents: diff changeset	1422	a = max(list[i], buffer[--k]);
90ce3da70b43 Initial load duke parents: diff changeset	1423	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1424	// No overlap
90ce3da70b43 Initial load duke parents: diff changeset	1425	buffer[k++] = a;
90ce3da70b43 Initial load duke parents: diff changeset	1426	a = list[i];
90ce3da70b43 Initial load duke parents: diff changeset	1427	}
90ce3da70b43 Initial load duke parents: diff changeset	1428	i++; // Common if/else code factored out
90ce3da70b43 Initial load duke parents: diff changeset	1429	polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1430	} else if (b < a) { // take b
90ce3da70b43 Initial load duke parents: diff changeset	1431	if (k > 0 && b <= buffer[k-1]) {
90ce3da70b43 Initial load duke parents: diff changeset	1432	b = max(other[j], buffer[--k]);
90ce3da70b43 Initial load duke parents: diff changeset	1433	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1434	buffer[k++] = b;
90ce3da70b43 Initial load duke parents: diff changeset	1435	b = other[j];
90ce3da70b43 Initial load duke parents: diff changeset	1436	}
90ce3da70b43 Initial load duke parents: diff changeset	1437	j++;
90ce3da70b43 Initial load duke parents: diff changeset	1438	polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1439	} else { // a == b, take a, drop b
90ce3da70b43 Initial load duke parents: diff changeset	1440	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1441	// This is symmetrical; it doesn't matter if
90ce3da70b43 Initial load duke parents: diff changeset	1442	// we backtrack with a or b. - liu
90ce3da70b43 Initial load duke parents: diff changeset	1443	if (k > 0 && a <= buffer[k-1]) {
90ce3da70b43 Initial load duke parents: diff changeset	1444	a = max(list[i], buffer[--k]);
90ce3da70b43 Initial load duke parents: diff changeset	1445	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1446	// No overlap
90ce3da70b43 Initial load duke parents: diff changeset	1447	buffer[k++] = a;
90ce3da70b43 Initial load duke parents: diff changeset	1448	a = list[i];
90ce3da70b43 Initial load duke parents: diff changeset	1449	}
90ce3da70b43 Initial load duke parents: diff changeset	1450	i++;
90ce3da70b43 Initial load duke parents: diff changeset	1451	polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1452	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1453	}
90ce3da70b43 Initial load duke parents: diff changeset	1454	break;
90ce3da70b43 Initial load duke parents: diff changeset	1455	case 3: // both second; take higher if unequal, and drop other
90ce3da70b43 Initial load duke parents: diff changeset	1456	if (b <= a) { // take a
90ce3da70b43 Initial load duke parents: diff changeset	1457	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1458	buffer[k++] = a;
90ce3da70b43 Initial load duke parents: diff changeset	1459	} else { // take b
90ce3da70b43 Initial load duke parents: diff changeset	1460	if (b == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1461	buffer[k++] = b;
90ce3da70b43 Initial load duke parents: diff changeset	1462	}
90ce3da70b43 Initial load duke parents: diff changeset	1463	a = list[i++]; polarity ^= 1; // factored common code
90ce3da70b43 Initial load duke parents: diff changeset	1464	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1465	break;
90ce3da70b43 Initial load duke parents: diff changeset	1466	case 1: // a second, b first; if b < a, overlap
90ce3da70b43 Initial load duke parents: diff changeset	1467	if (a < b) { // no overlap, take a
90ce3da70b43 Initial load duke parents: diff changeset	1468	buffer[k++] = a; a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1469	} else if (b < a) { // OVERLAP, drop b
90ce3da70b43 Initial load duke parents: diff changeset	1470	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1471	} else { // a == b, drop both!
90ce3da70b43 Initial load duke parents: diff changeset	1472	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1473	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1474	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1475	}
90ce3da70b43 Initial load duke parents: diff changeset	1476	break;
90ce3da70b43 Initial load duke parents: diff changeset	1477	case 2: // a first, b second; if a < b, overlap
90ce3da70b43 Initial load duke parents: diff changeset	1478	if (b < a) { // no overlap, take b
90ce3da70b43 Initial load duke parents: diff changeset	1479	buffer[k++] = b; b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1480	} else if (a < b) { // OVERLAP, drop a
90ce3da70b43 Initial load duke parents: diff changeset	1481	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1482	} else { // a == b, drop both!
90ce3da70b43 Initial load duke parents: diff changeset	1483	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1484	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1485	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1486	}
90ce3da70b43 Initial load duke parents: diff changeset	1487	break;
90ce3da70b43 Initial load duke parents: diff changeset	1488	}
90ce3da70b43 Initial load duke parents: diff changeset	1489	}
90ce3da70b43 Initial load duke parents: diff changeset	1490	buffer[k++] = HIGH; // terminate
90ce3da70b43 Initial load duke parents: diff changeset	1491	len = k;
90ce3da70b43 Initial load duke parents: diff changeset	1492	// swap list and buffer
90ce3da70b43 Initial load duke parents: diff changeset	1493	int[] temp = list;
90ce3da70b43 Initial load duke parents: diff changeset	1494	list = buffer;
90ce3da70b43 Initial load duke parents: diff changeset	1495	buffer = temp;
90ce3da70b43 Initial load duke parents: diff changeset	1496	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	1497	return this;
90ce3da70b43 Initial load duke parents: diff changeset	1498	}
90ce3da70b43 Initial load duke parents: diff changeset	1499
90ce3da70b43 Initial load duke parents: diff changeset	1500	// polarity = 0 is normal: x intersect y
90ce3da70b43 Initial load duke parents: diff changeset	1501	// polarity = 2: x intersect ~y == set-minus
90ce3da70b43 Initial load duke parents: diff changeset	1502	// polarity = 1: ~x intersect y
90ce3da70b43 Initial load duke parents: diff changeset	1503	// polarity = 3: ~x intersect ~y
90ce3da70b43 Initial load duke parents: diff changeset	1504
90ce3da70b43 Initial load duke parents: diff changeset	1505	private UnicodeSet retain(int[] other, int otherLen, int polarity) {
90ce3da70b43 Initial load duke parents: diff changeset	1506	ensureBufferCapacity(len + otherLen);
90ce3da70b43 Initial load duke parents: diff changeset	1507	int i = 0, j = 0, k = 0;
90ce3da70b43 Initial load duke parents: diff changeset	1508	int a = list[i++];
90ce3da70b43 Initial load duke parents: diff changeset	1509	int b = other[j++];
90ce3da70b43 Initial load duke parents: diff changeset	1510	// change from xor is that we have to check overlapping pairs
90ce3da70b43 Initial load duke parents: diff changeset	1511	// polarity bit 1 means a is second, bit 2 means b is.
90ce3da70b43 Initial load duke parents: diff changeset	1512	main:
90ce3da70b43 Initial load duke parents: diff changeset	1513	while (true) {
90ce3da70b43 Initial load duke parents: diff changeset	1514	switch (polarity) {
90ce3da70b43 Initial load duke parents: diff changeset	1515	case 0: // both first; drop the smaller
90ce3da70b43 Initial load duke parents: diff changeset	1516	if (a < b) { // drop a
90ce3da70b43 Initial load duke parents: diff changeset	1517	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1518	} else if (b < a) { // drop b
90ce3da70b43 Initial load duke parents: diff changeset	1519	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1520	} else { // a == b, take one, drop other
90ce3da70b43 Initial load duke parents: diff changeset	1521	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1522	buffer[k++] = a; a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1523	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1524	}
90ce3da70b43 Initial load duke parents: diff changeset	1525	break;
90ce3da70b43 Initial load duke parents: diff changeset	1526	case 3: // both second; take lower if unequal
90ce3da70b43 Initial load duke parents: diff changeset	1527	if (a < b) { // take a
90ce3da70b43 Initial load duke parents: diff changeset	1528	buffer[k++] = a; a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1529	} else if (b < a) { // take b
90ce3da70b43 Initial load duke parents: diff changeset	1530	buffer[k++] = b; b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1531	} else { // a == b, take one, drop other
90ce3da70b43 Initial load duke parents: diff changeset	1532	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1533	buffer[k++] = a; a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1534	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1535	}
90ce3da70b43 Initial load duke parents: diff changeset	1536	break;
90ce3da70b43 Initial load duke parents: diff changeset	1537	case 1: // a second, b first;
90ce3da70b43 Initial load duke parents: diff changeset	1538	if (a < b) { // NO OVERLAP, drop a
90ce3da70b43 Initial load duke parents: diff changeset	1539	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1540	} else if (b < a) { // OVERLAP, take b
90ce3da70b43 Initial load duke parents: diff changeset	1541	buffer[k++] = b; b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1542	} else { // a == b, drop both!
90ce3da70b43 Initial load duke parents: diff changeset	1543	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1544	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1545	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1546	}
90ce3da70b43 Initial load duke parents: diff changeset	1547	break;
90ce3da70b43 Initial load duke parents: diff changeset	1548	case 2: // a first, b second; if a < b, overlap
90ce3da70b43 Initial load duke parents: diff changeset	1549	if (b < a) { // no overlap, drop b
90ce3da70b43 Initial load duke parents: diff changeset	1550	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1551	} else if (a < b) { // OVERLAP, take a
90ce3da70b43 Initial load duke parents: diff changeset	1552	buffer[k++] = a; a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1553	} else { // a == b, drop both!
90ce3da70b43 Initial load duke parents: diff changeset	1554	if (a == HIGH) break main;
90ce3da70b43 Initial load duke parents: diff changeset	1555	a = list[i++]; polarity ^= 1;
90ce3da70b43 Initial load duke parents: diff changeset	1556	b = other[j++]; polarity ^= 2;
90ce3da70b43 Initial load duke parents: diff changeset	1557	}
90ce3da70b43 Initial load duke parents: diff changeset	1558	break;
90ce3da70b43 Initial load duke parents: diff changeset	1559	}
90ce3da70b43 Initial load duke parents: diff changeset	1560	}
90ce3da70b43 Initial load duke parents: diff changeset	1561	buffer[k++] = HIGH; // terminate
90ce3da70b43 Initial load duke parents: diff changeset	1562	len = k;
90ce3da70b43 Initial load duke parents: diff changeset	1563	// swap list and buffer
90ce3da70b43 Initial load duke parents: diff changeset	1564	int[] temp = list;
90ce3da70b43 Initial load duke parents: diff changeset	1565	list = buffer;
90ce3da70b43 Initial load duke parents: diff changeset	1566	buffer = temp;
90ce3da70b43 Initial load duke parents: diff changeset	1567	pat = null;
90ce3da70b43 Initial load duke parents: diff changeset	1568	return this;
90ce3da70b43 Initial load duke parents: diff changeset	1569	}
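The merge loops above operate on inversion lists: sorted arrays of range boundaries terminated by a sentinel. The following is a minimal sketch of the plain intersection case (polarity 0) only, written with explicit in/out flags rather than the source's polarity switch. The class name `InversionListDemo` and the method `intersect` are hypothetical and not part of `UnicodeSet`.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of sun.text.normalizer.UnicodeSet.
final class InversionListDemo {
    // Sentinel one past the last code point, mirroring HIGH in the source.
    static final int HIGH = 0x110000;

    /**
     * Intersects two inversion lists. Each list holds alternating
     * "range starts" / "range ends (exclusive)" boundaries in ascending
     * order and is terminated by HIGH.
     */
    static int[] intersect(int[] x, int[] y) {
        int[] out = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        boolean inX = false, inY = false, inOut = false;
        while (true) {
            int a = x[i], b = y[j];
            int next = Math.min(a, b);
            if (next == HIGH) break;          // both lists exhausted
            if (a == next) { inX = !inX; i++; } // crossing a boundary of x
            if (b == next) { inY = !inY; j++; } // crossing a boundary of y
            boolean now = inX && inY;
            if (now != inOut) {               // membership of the result flips here
                out[k++] = next;
                inOut = now;
            }
        }
        out[k++] = HIGH;                      // terminate, as the source does
        return Arrays.copyOf(out, k);
    }
}
```

For example, intersecting [A-Z], stored as `{0x41, 0x5B, HIGH}`, with the range U+004D..U+006F, stored as `{0x4D, 0x70, HIGH}`, yields `{0x4D, 0x5B, HIGH}`, that is [M-Z]. The source reaches the same result in one pass but encodes the two in/out flags in the bits of `polarity`.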
90ce3da70b43 Initial load duke parents: diff changeset	1570
90ce3da70b43 Initial load duke parents: diff changeset	1571	private static final int max(int a, int b) {
90ce3da70b43 Initial load duke parents: diff changeset	1572	return (a > b) ? a : b;
90ce3da70b43 Initial load duke parents: diff changeset	1573	}
90ce3da70b43 Initial load duke parents: diff changeset	1574
90ce3da70b43 Initial load duke parents: diff changeset	1575	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1576	// Generic filter-based scanning code
90ce3da70b43 Initial load duke parents: diff changeset	1577	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1578
90ce3da70b43 Initial load duke parents: diff changeset	1579	private static interface Filter {
90ce3da70b43 Initial load duke parents: diff changeset	1580	boolean contains(int codePoint);
90ce3da70b43 Initial load duke parents: diff changeset	1581	}
90ce3da70b43 Initial load duke parents: diff changeset	1582
90ce3da70b43 Initial load duke parents: diff changeset	1583	// VersionInfo for unassigned characters
90ce3da70b43 Initial load duke parents: diff changeset	1584	static final VersionInfo NO_VERSION = VersionInfo.getInstance(0, 0, 0, 0);
90ce3da70b43 Initial load duke parents: diff changeset	1585
90ce3da70b43 Initial load duke parents: diff changeset	1586	private static class VersionFilter implements Filter {
90ce3da70b43 Initial load duke parents: diff changeset	1587	VersionInfo version;
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1588
2 90ce3da70b43 Initial load duke parents: diff changeset	1589	VersionFilter(VersionInfo version) { this.version = version; }
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1590
2 90ce3da70b43 Initial load duke parents: diff changeset	1591	public boolean contains(int ch) {
90ce3da70b43 Initial load duke parents: diff changeset	1592	VersionInfo v = UCharacter.getAge(ch);
90ce3da70b43 Initial load duke parents: diff changeset	1593	// Reference comparison ok; VersionInfo caches and reuses
90ce3da70b43 Initial load duke parents: diff changeset	1594	// unique objects.
90ce3da70b43 Initial load duke parents: diff changeset	1595	return v != NO_VERSION &&
90ce3da70b43 Initial load duke parents: diff changeset	1596	v.compareTo(version) <= 0;
90ce3da70b43 Initial load duke parents: diff changeset	1597	}
90ce3da70b43 Initial load duke parents: diff changeset	1598	}
90ce3da70b43 Initial load duke parents: diff changeset	1599
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1600	private static synchronized UnicodeSet getInclusions(int src) {
2 90ce3da70b43 Initial load duke parents: diff changeset	1601	if (INCLUSIONS == null) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1602	INCLUSIONS = new UnicodeSet[UCharacterProperty.SRC_COUNT];
2 90ce3da70b43 Initial load duke parents: diff changeset	1603	}
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1604	if(INCLUSIONS[src] == null) {
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1605	UnicodeSet incl = new UnicodeSet();
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1606	switch(src) {
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1607	case UCharacterProperty.SRC_PROPSVEC:
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1608	UCharacterProperty.getInstance().upropsvec_addPropertyStarts(incl);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1609	break;
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1610	default:
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1611	throw new IllegalStateException("UnicodeSet.getInclusions(unknown src "+src+")");
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1612	}
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1613	INCLUSIONS[src] = incl;
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1614	}
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1615	return INCLUSIONS[src];
2 90ce3da70b43 Initial load duke parents: diff changeset	1616	}
90ce3da70b43 Initial load duke parents: diff changeset	1617
90ce3da70b43 Initial load duke parents: diff changeset	1618	/**
90ce3da70b43 Initial load duke parents: diff changeset	1619	* Generic filter-based scanning code for UCD property UnicodeSets.
90ce3da70b43 Initial load duke parents: diff changeset	1620	*/
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1621	private UnicodeSet applyFilter(Filter filter, int src) {
2 90ce3da70b43 Initial load duke parents: diff changeset	1622	// Walk through all Unicode characters, noting the start
90ce3da70b43 Initial load duke parents: diff changeset	1623	// and end of each range for which filter.contains(c) is
90ce3da70b43 Initial load duke parents: diff changeset	1624	// true. Add each range to a set.
90ce3da70b43 Initial load duke parents: diff changeset	1625	//
90ce3da70b43 Initial load duke parents: diff changeset	1626	// To improve performance, use the INCLUSIONS set, which
90ce3da70b43 Initial load duke parents: diff changeset	1627	// encodes information about character ranges that are known
90ce3da70b43 Initial load duke parents: diff changeset	1628	// to have identical properties, such as the CJK Ideographs
90ce3da70b43 Initial load duke parents: diff changeset	1629	// from U+4E00 to U+9FA5. INCLUSIONS omits all characters of
90ce3da70b43 Initial load duke parents: diff changeset	1630	// such ranges except the first, so each range is probed once.
90ce3da70b43 Initial load duke parents: diff changeset	1631	//
90ce3da70b43 Initial load duke parents: diff changeset	1632	// TODO Where possible, instead of scanning over code points,
90ce3da70b43 Initial load duke parents: diff changeset	1633	// use internal property data to initialize UnicodeSets for
90ce3da70b43 Initial load duke parents: diff changeset	1634	// those properties. Scanning code points is slow.
90ce3da70b43 Initial load duke parents: diff changeset	1635
90ce3da70b43 Initial load duke parents: diff changeset	1636	clear();
90ce3da70b43 Initial load duke parents: diff changeset	1637
90ce3da70b43 Initial load duke parents: diff changeset	1638	int startHasProperty = -1;
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1639	UnicodeSet inclusions = getInclusions(src);
2 90ce3da70b43 Initial load duke parents: diff changeset	1640	int limitRange = inclusions.getRangeCount();
90ce3da70b43 Initial load duke parents: diff changeset	1641
90ce3da70b43 Initial load duke parents: diff changeset	1642	for (int j=0; j<limitRange; ++j) {
90ce3da70b43 Initial load duke parents: diff changeset	1643	// get current range
90ce3da70b43 Initial load duke parents: diff changeset	1644	int start = inclusions.getRangeStart(j);
90ce3da70b43 Initial load duke parents: diff changeset	1645	int end = inclusions.getRangeEnd(j);
90ce3da70b43 Initial load duke parents: diff changeset	1646
90ce3da70b43 Initial load duke parents: diff changeset	1647	// for all the code points in the range, process
90ce3da70b43 Initial load duke parents: diff changeset	1648	for (int ch = start; ch <= end; ++ch) {
90ce3da70b43 Initial load duke parents: diff changeset	1649	// only add to the unicodeset on inflection points --
90ce3da70b43 Initial load duke parents: diff changeset	1650	// where the hasProperty value changes to false
90ce3da70b43 Initial load duke parents: diff changeset	1651	if (filter.contains(ch)) {
90ce3da70b43 Initial load duke parents: diff changeset	1652	if (startHasProperty < 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1653	startHasProperty = ch;
90ce3da70b43 Initial load duke parents: diff changeset	1654	}
90ce3da70b43 Initial load duke parents: diff changeset	1655	} else if (startHasProperty >= 0) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1656	add_unchecked(startHasProperty, ch-1);
2 90ce3da70b43 Initial load duke parents: diff changeset	1657	startHasProperty = -1;
90ce3da70b43 Initial load duke parents: diff changeset	1658	}
90ce3da70b43 Initial load duke parents: diff changeset	1659	}
90ce3da70b43 Initial load duke parents: diff changeset	1660	}
90ce3da70b43 Initial load duke parents: diff changeset	1661	if (startHasProperty >= 0) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1662	add_unchecked(startHasProperty, 0x10FFFF);
2 90ce3da70b43 Initial load duke parents: diff changeset	1663	}
90ce3da70b43 Initial load duke parents: diff changeset	1664
90ce3da70b43 Initial load duke parents: diff changeset	1665	return this;
90ce3da70b43 Initial load duke parents: diff changeset	1666	}
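The scan in applyFilter can be sketched on its own, outside UnicodeSet. The version below is an illustration under assumptions: `FilterScanDemo`, `scan`, and the `int[][]` representation of the inclusion ranges are hypothetical, and `java.util.function.IntPredicate` stands in for the private `Filter` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration; not part of sun.text.normalizer.UnicodeSet.
final class FilterScanDemo {
    /**
     * Collects maximal [start, end] ranges of code points accepted by
     * filter, probing only the code points listed in inclusionRanges.
     * Code points absent from inclusionRanges are assumed to behave
     * like the nearest probed code point before them.
     */
    static List<int[]> scan(IntPredicate filter, int[][] inclusionRanges) {
        List<int[]> result = new ArrayList<>();
        int startHasProperty = -1;            // -1 means "not inside a run"
        for (int[] range : inclusionRanges) {
            for (int ch = range[0]; ch <= range[1]; ++ch) {
                if (filter.test(ch)) {
                    if (startHasProperty < 0) {
                        startHasProperty = ch; // a run begins here
                    }
                } else if (startHasProperty >= 0) {
                    // the run ended just before ch
                    result.add(new int[] {startHasProperty, ch - 1});
                    startHasProperty = -1;
                }
            }
        }
        if (startHasProperty >= 0) {
            // a run still open at the end extends to the last code point
            result.add(new int[] {startHasProperty, 0x10FFFF});
        }
        return result;
    }
}
```

With inclusion ranges `{0, 0x100}`, `{0x4E00, 0x4E00}`, and `{0x9FA6, 0x9FA6}` and a filter that accepts U+4E00..U+9FA5, the scan returns the single range [0x4E00, 0x9FA5] after only 259 filter calls, rather than one call per code point in the block.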
90ce3da70b43 Initial load duke parents: diff changeset	1667
90ce3da70b43 Initial load duke parents: diff changeset	1668	/**
90ce3da70b43 Initial load duke parents: diff changeset	1669	* Remove leading and trailing rule white space and compress
90ce3da70b43 Initial load duke parents: diff changeset	1670	* internal rule white space to a single space character.
90ce3da70b43 Initial load duke parents: diff changeset	1671	*
90ce3da70b43 Initial load duke parents: diff changeset	1672	* @see UCharacterProperty#isRuleWhiteSpace
90ce3da70b43 Initial load duke parents: diff changeset	1673	*/
90ce3da70b43 Initial load duke parents: diff changeset	1674	private static String mungeCharName(String source) {
90ce3da70b43 Initial load duke parents: diff changeset	1675	StringBuffer buf = new StringBuffer();
90ce3da70b43 Initial load duke parents: diff changeset	1676	for (int i=0; i<source.length(); ) {
90ce3da70b43 Initial load duke parents: diff changeset	1677	int ch = UTF16.charAt(source, i);
90ce3da70b43 Initial load duke parents: diff changeset	1678	i += UTF16.getCharCount(ch);
90ce3da70b43 Initial load duke parents: diff changeset	1679	if (UCharacterProperty.isRuleWhiteSpace(ch)) {
90ce3da70b43 Initial load duke parents: diff changeset	1680	if (buf.length() == 0 ||
90ce3da70b43 Initial load duke parents: diff changeset	1681	buf.charAt(buf.length() - 1) == ' ') {
90ce3da70b43 Initial load duke parents: diff changeset	1682	continue;
90ce3da70b43 Initial load duke parents: diff changeset	1683	}
90ce3da70b43 Initial load duke parents: diff changeset	1684	ch = ' '; // convert to ' '
90ce3da70b43 Initial load duke parents: diff changeset	1685	}
90ce3da70b43 Initial load duke parents: diff changeset	1686	UTF16.append(buf, ch);
90ce3da70b43 Initial load duke parents: diff changeset	1687	}
90ce3da70b43 Initial load duke parents: diff changeset	1688	if (buf.length() != 0 &&
90ce3da70b43 Initial load duke parents: diff changeset	1689	buf.charAt(buf.length() - 1) == ' ') {
90ce3da70b43 Initial load duke parents: diff changeset	1690	buf.setLength(buf.length() - 1);
90ce3da70b43 Initial load duke parents: diff changeset	1691	}
90ce3da70b43 Initial load duke parents: diff changeset	1692	return buf.toString();
90ce3da70b43 Initial load duke parents: diff changeset	1693	}
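The same trimming and collapsing logic can be tried with standard library calls only. This is a sketch under an assumption: `Character.isWhitespace` is used as a stand-in for `UCharacterProperty.isRuleWhiteSpace`, and the two do not classify exactly the same code points. The names `MungeDemo` and `munge` are hypothetical.

```java
// Hypothetical illustration; not part of sun.text.normalizer.UnicodeSet.
final class MungeDemo {
    /**
     * Removes leading and trailing white space and collapses each
     * internal run of white space to a single ' '.
     */
    static String munge(String source) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < source.length(); ) {
            int ch = source.codePointAt(i);
            i += Character.charCount(ch);     // step over surrogate pairs as one unit
            if (Character.isWhitespace(ch)) { // stand-in for isRuleWhiteSpace
                if (buf.length() == 0 || buf.charAt(buf.length() - 1) == ' ') {
                    continue;                 // skip leading or repeated white space
                }
                ch = ' ';                     // normalize to a plain space
            }
            buf.appendCodePoint(ch);
        }
        if (buf.length() != 0 && buf.charAt(buf.length() - 1) == ' ') {
            buf.setLength(buf.length() - 1);  // drop the one trailing space
        }
        return buf.toString();
    }
}
```

Under these assumptions, `munge("  3.2 \t ")` returns `"3.2"`, which is the kind of normalization the Age property value needs before it is parsed as a version.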
90ce3da70b43 Initial load duke parents: diff changeset	1694
90ce3da70b43 Initial load duke parents: diff changeset	1695	/**
90ce3da70b43 Initial load duke parents: diff changeset	1696	* Modifies this set to contain those code points which have the
90ce3da70b43 Initial load duke parents: diff changeset	1697	* given value for the given property. Prior contents of this
90ce3da70b43 Initial load duke parents: diff changeset	1698	* set are lost.
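90ce3da70b43 Initial load duke parents: diff changeset	1699	* @param propertyAlias the property name; only "Age" is supported here
90ce3da70b43 Initial load duke parents: diff changeset	1700	* @param valueAlias the property value, e.g. a version for "Age"; must not be empty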
90ce3da70b43 Initial load duke parents: diff changeset	1701	* @param symbols if not null, then symbols are first called to see if a property
90ce3da70b43 Initial load duke parents: diff changeset	1702	* is available. If true, then everything else is skipped.
90ce3da70b43 Initial load duke parents: diff changeset	1703	* @return this set
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1704	* @stable ICU 3.2
2 90ce3da70b43 Initial load duke parents: diff changeset	1705	*/
90ce3da70b43 Initial load duke parents: diff changeset	1706	public UnicodeSet applyPropertyAlias(String propertyAlias,
90ce3da70b43 Initial load duke parents: diff changeset	1707	String valueAlias, SymbolTable symbols) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1708	if (valueAlias.length() > 0) {
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1709	if (propertyAlias.equals("Age")) {
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1710	// Must munge name, since
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1711	// VersionInfo.getInstance() does not do
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1712	// 'loose' matching.
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1713	VersionInfo version = VersionInfo.getInstance(mungeCharName(valueAlias));
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1714	applyFilter(new VersionFilter(version), UCharacterProperty.SRC_PROPSVEC);
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1715	return this;
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1716	}
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1717	}
903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1718	throw new IllegalArgumentException("Unsupported property: " + propertyAlias);
2 90ce3da70b43 Initial load duke parents: diff changeset	1719	}
90ce3da70b43 Initial load duke parents: diff changeset	1720
90ce3da70b43 Initial load duke parents: diff changeset	1721	/**
90ce3da70b43 Initial load duke parents: diff changeset	1722	* Return true if the given iterator appears to point at a
90ce3da70b43 Initial load duke parents: diff changeset	1723	* property pattern. Regardless of the result, return with the
90ce3da70b43 Initial load duke parents: diff changeset	1724	* iterator unchanged.
90ce3da70b43 Initial load duke parents: diff changeset	1725	* @param chars iterator over the pattern characters. Upon return
90ce3da70b43 Initial load duke parents: diff changeset	1726	* it will be unchanged.
90ce3da70b43 Initial load duke parents: diff changeset	1727	* @param iterOpts RuleCharacterIterator options
90ce3da70b43 Initial load duke parents: diff changeset	1728	*/
90ce3da70b43 Initial load duke parents: diff changeset	1729	private static boolean resemblesPropertyPattern(RuleCharacterIterator chars,
90ce3da70b43 Initial load duke parents: diff changeset	1730	int iterOpts) {
90ce3da70b43 Initial load duke parents: diff changeset	1731	boolean result = false;
90ce3da70b43 Initial load duke parents: diff changeset	1732	iterOpts &= ~RuleCharacterIterator.PARSE_ESCAPES;
90ce3da70b43 Initial load duke parents: diff changeset	1733	Object pos = chars.getPos(null);
90ce3da70b43 Initial load duke parents: diff changeset	1734	int c = chars.next(iterOpts);
90ce3da70b43 Initial load duke parents: diff changeset	1735	if (c == '[' || c == '\\') {
90ce3da70b43 Initial load duke parents: diff changeset	1736	int d = chars.next(iterOpts & ~RuleCharacterIterator.SKIP_WHITESPACE);
90ce3da70b43 Initial load duke parents: diff changeset	1737	result = (c == '[') ? (d == ':') :
90ce3da70b43 Initial load duke parents: diff changeset	1738	(d == 'N' || d == 'p' || d == 'P');
90ce3da70b43 Initial load duke parents: diff changeset	1739	}
90ce3da70b43 Initial load duke parents: diff changeset	1740	chars.setPos(pos);
90ce3da70b43 Initial load duke parents: diff changeset	1741	return result;
90ce3da70b43 Initial load duke parents: diff changeset	1742	}
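The two-character lookahead in resemblesPropertyPattern can be shown on a plain String. This is a simplified sketch: it ignores the iterator options (white-space skipping and escape parsing) that the real method passes to `RuleCharacterIterator`, and the names `PropertyPatternPeek` and `resembles` are hypothetical.

```java
// Hypothetical illustration; not part of sun.text.normalizer.UnicodeSet.
final class PropertyPatternPeek {
    /**
     * Returns true if the text at pos starts with "[:" or with a
     * backslash followed by 'N', 'p', or 'P'. Nothing is consumed.
     */
    static boolean resembles(String pattern, int pos) {
        if (pos < 0 || pos + 1 >= pattern.length()) {
            return false;                     // need two characters to decide
        }
        char c = pattern.charAt(pos);
        char d = pattern.charAt(pos + 1);
        if (c == '[') {
            return d == ':';                  // POSIX-style [:...:]
        }
        if (c == '\\') {
            return d == 'N' || d == 'p' || d == 'P'; // \N{...}, \p{...}, \P{...}
        }
        return false;
    }
}
```

Because the sketch reads by index, there is no position to restore; the real method achieves the same effect by saving the iterator position with getPos and resetting it with setPos before returning.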
90ce3da70b43 Initial load duke parents: diff changeset	1743
90ce3da70b43 Initial load duke parents: diff changeset	1744	/**
90ce3da70b43 Initial load duke parents: diff changeset	1745	* Parse the given property pattern at the given parse position.
90ce3da70b43 Initial load duke parents: diff changeset	1746	* @param symbols TODO
90ce3da70b43 Initial load duke parents: diff changeset	1747	*/
90ce3da70b43 Initial load duke parents: diff changeset	1748	private UnicodeSet applyPropertyPattern(String pattern, ParsePosition ppos, SymbolTable symbols) {
90ce3da70b43 Initial load duke parents: diff changeset	1749	int pos = ppos.getIndex();
90ce3da70b43 Initial load duke parents: diff changeset	1750
90ce3da70b43 Initial load duke parents: diff changeset	1751	// On entry, ppos should point to one of the following locations:
90ce3da70b43 Initial load duke parents: diff changeset	1752
90ce3da70b43 Initial load duke parents: diff changeset	1753	// Minimum length is 5 characters, e.g. \p{L}
90ce3da70b43 Initial load duke parents: diff changeset	1754	if ((pos+5) > pattern.length()) {
90ce3da70b43 Initial load duke parents: diff changeset	1755	return null;
90ce3da70b43 Initial load duke parents: diff changeset	1756	}
90ce3da70b43 Initial load duke parents: diff changeset	1757
90ce3da70b43 Initial load duke parents: diff changeset	1758	boolean posix = false; // true for [:pat:], false for \p{pat} \P{pat} \N{pat}
90ce3da70b43 Initial load duke parents: diff changeset	1759	boolean isName = false; // true for \N{pat}, o/w false
90ce3da70b43 Initial load duke parents: diff changeset	1760	boolean invert = false;
90ce3da70b43 Initial load duke parents: diff changeset	1761
90ce3da70b43 Initial load duke parents: diff changeset	1762	// Look for an opening [:, [:^, \p, \P, or \N
90ce3da70b43 Initial load duke parents: diff changeset	1763	if (pattern.regionMatches(pos, "[:", 0, 2)) {
90ce3da70b43 Initial load duke parents: diff changeset	1764	posix = true;
90ce3da70b43 Initial load duke parents: diff changeset	1765	pos = Utility.skipWhitespace(pattern, pos+2);
90ce3da70b43 Initial load duke parents: diff changeset	1766	if (pos < pattern.length() && pattern.charAt(pos) == '^') {
90ce3da70b43 Initial load duke parents: diff changeset	1767	++pos;
90ce3da70b43 Initial load duke parents: diff changeset	1768	invert = true;
90ce3da70b43 Initial load duke parents: diff changeset	1769	}
90ce3da70b43 Initial load duke parents: diff changeset	1770	} else if (pattern.regionMatches(true, pos, "\\p", 0, 2) ||
90ce3da70b43 Initial load duke parents: diff changeset	1771	pattern.regionMatches(pos, "\\N", 0, 2)) {
90ce3da70b43 Initial load duke parents: diff changeset	1772	char c = pattern.charAt(pos+1);
90ce3da70b43 Initial load duke parents: diff changeset	1773	invert = (c == 'P');
90ce3da70b43 Initial load duke parents: diff changeset	1774	isName = (c == 'N');
90ce3da70b43 Initial load duke parents: diff changeset	1775	pos = Utility.skipWhitespace(pattern, pos+2);
90ce3da70b43 Initial load duke parents: diff changeset	1776	if (pos == pattern.length() || pattern.charAt(pos++) != '{') {
90ce3da70b43 Initial load duke parents: diff changeset	1777	// Syntax error; "\p", "\P", or "\N" not followed by "{"
90ce3da70b43 Initial load duke parents: diff changeset	1778	return null;
90ce3da70b43 Initial load duke parents: diff changeset	1779	}
90ce3da70b43 Initial load duke parents: diff changeset	1780	} else {
90ce3da70b43 Initial load duke parents: diff changeset	1781	// Open delimiter not seen
90ce3da70b43 Initial load duke parents: diff changeset	1782	return null;
90ce3da70b43 Initial load duke parents: diff changeset	1783	}
90ce3da70b43 Initial load duke parents: diff changeset	1784
90ce3da70b43 Initial load duke parents: diff changeset	1785	// Look for the matching close delimiter, either :] or }
90ce3da70b43 Initial load duke parents: diff changeset	1786	int close = pattern.indexOf(posix ? ":]" : "}", pos);
90ce3da70b43 Initial load duke parents: diff changeset	1787	if (close < 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1788	// Syntax error; close delimiter missing
90ce3da70b43 Initial load duke parents: diff changeset	1789	return null;
90ce3da70b43 Initial load duke parents: diff changeset	1790	}
90ce3da70b43 Initial load duke parents: diff changeset	1791
90ce3da70b43 Initial load duke parents: diff changeset	1792	// Look for an '=' sign. If this is present, we will parse a
90ce3da70b43 Initial load duke parents: diff changeset	1793	// medium \p{gc=Cf} or long \p{GeneralCategory=Format}
90ce3da70b43 Initial load duke parents: diff changeset	1794	// pattern.
90ce3da70b43 Initial load duke parents: diff changeset	1795	int equals = pattern.indexOf('=', pos);
90ce3da70b43 Initial load duke parents: diff changeset	1796	String propName, valueName;
90ce3da70b43 Initial load duke parents: diff changeset	1797	if (equals >= 0 && equals < close && !isName) {
90ce3da70b43 Initial load duke parents: diff changeset	1798	// Equals seen; parse medium/long pattern
90ce3da70b43 Initial load duke parents: diff changeset	1799	propName = pattern.substring(pos, equals);
90ce3da70b43 Initial load duke parents: diff changeset	1800	valueName = pattern.substring(equals+1, close);
90ce3da70b43 Initial load duke parents: diff changeset	1801	}
90ce3da70b43 Initial load duke parents: diff changeset	1802
90ce3da70b43 Initial load duke parents: diff changeset	1803	else {
90ce3da70b43 Initial load duke parents: diff changeset	1804	// No usable '=' seen, or this is a \N{name} pattern: take the whole span up to the close delimiter as the property name
90ce3da70b43 Initial load duke parents: diff changeset	1805	propName = pattern.substring(pos, close);
90ce3da70b43 Initial load duke parents: diff changeset	1806	valueName = "";
90ce3da70b43 Initial load duke parents: diff changeset	1807
90ce3da70b43 Initial load duke parents: diff changeset	1808	// Handle \N{name}
90ce3da70b43 Initial load duke parents: diff changeset	1809	if (isName) {
90ce3da70b43 Initial load duke parents: diff changeset	1810	// This is a little inefficient since it means we have to
90ce3da70b43 Initial load duke parents: diff changeset	1811	// parse "na" back to UProperty.NAME even though we already
90ce3da70b43 Initial load duke parents: diff changeset	1812	// know it's UProperty.NAME. If we refactor the API to
90ce3da70b43 Initial load duke parents: diff changeset	1813	// support args of (int, String) then we can remove
90ce3da70b43 Initial load duke parents: diff changeset	1814	// "na" and make this a little more efficient.
90ce3da70b43 Initial load duke parents: diff changeset	1815	valueName = propName;
90ce3da70b43 Initial load duke parents: diff changeset	1816	propName = "na";
90ce3da70b43 Initial load duke parents: diff changeset	1817	}
90ce3da70b43 Initial load duke parents: diff changeset	1818	}
90ce3da70b43 Initial load duke parents: diff changeset	1819
90ce3da70b43 Initial load duke parents: diff changeset	1820	applyPropertyAlias(propName, valueName, symbols);
90ce3da70b43 Initial load duke parents: diff changeset	1821
90ce3da70b43 Initial load duke parents: diff changeset	1822	if (invert) {
90ce3da70b43 Initial load duke parents: diff changeset	1823	complement();
90ce3da70b43 Initial load duke parents: diff changeset	1824	}
90ce3da70b43 Initial load duke parents: diff changeset	1825
90ce3da70b43 Initial load duke parents: diff changeset	1826	// Move to the limit position after the close delimiter
90ce3da70b43 Initial load duke parents: diff changeset	1827	ppos.setIndex(close + (posix ? 2 : 1));
90ce3da70b43 Initial load duke parents: diff changeset	1828
90ce3da70b43 Initial load duke parents: diff changeset	1829	return this;
90ce3da70b43 Initial load duke parents: diff changeset	1830	}
90ce3da70b43 Initial load duke parents: diff changeset	1831
90ce3da70b43 Initial load duke parents: diff changeset	1832	/**
90ce3da70b43 Initial load duke parents: diff changeset	1833	* Parse a property pattern.
90ce3da70b43 Initial load duke parents: diff changeset	1834	* @param chars iterator over the pattern characters. Upon return
90ce3da70b43 Initial load duke parents: diff changeset	1835	* it will be advanced to the first character after the parsed
90ce3da70b43 Initial load duke parents: diff changeset	1836	* pattern, or the end of the iteration if all characters are
90ce3da70b43 Initial load duke parents: diff changeset	1837	* parsed.
90ce3da70b43 Initial load duke parents: diff changeset	1838	* @param rebuiltPat the pattern that was parsed, rebuilt or
90ce3da70b43 Initial load duke parents: diff changeset	1839	* copied from the input pattern, as appropriate.
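90ce3da70b43 Initial load duke parents: diff changeset	1840	* @param symbols symbol table forwarded unchanged to applyPropertyPattern(String, ParsePosition, SymbolTable), which passes it on to applyPropertyAlias()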
90ce3da70b43 Initial load duke parents: diff changeset	1841	*/
90ce3da70b43 Initial load duke parents: diff changeset	1842	private void applyPropertyPattern(RuleCharacterIterator chars,
90ce3da70b43 Initial load duke parents: diff changeset	1843	StringBuffer rebuiltPat, SymbolTable symbols) {
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1844	String patStr = chars.lookahead();
2 90ce3da70b43 Initial load duke parents: diff changeset	1845	ParsePosition pos = new ParsePosition(0);
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1846	applyPropertyPattern(patStr, pos, symbols);
2 90ce3da70b43 Initial load duke parents: diff changeset	1847	if (pos.getIndex() == 0) {
90ce3da70b43 Initial load duke parents: diff changeset	1848	syntaxError(chars, "Invalid property pattern");
90ce3da70b43 Initial load duke parents: diff changeset	1849	}
90ce3da70b43 Initial load duke parents: diff changeset	1850	chars.jumpahead(pos.getIndex());
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1851	rebuiltPat.append(patStr.substring(0, pos.getIndex()));
2 90ce3da70b43 Initial load duke parents: diff changeset	1852	}
90ce3da70b43 Initial load duke parents: diff changeset	1853
90ce3da70b43 Initial load duke parents: diff changeset	1854	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1855	// Case folding API
90ce3da70b43 Initial load duke parents: diff changeset	1856	//----------------------------------------------------------------
90ce3da70b43 Initial load duke parents: diff changeset	1857
90ce3da70b43 Initial load duke parents: diff changeset	1858	/**
90ce3da70b43 Initial load duke parents: diff changeset	1859	* Bitmask for constructor and applyPattern() indicating that
90ce3da70b43 Initial load duke parents: diff changeset	1860	* white space should be ignored. If set, ignore characters for
90ce3da70b43 Initial load duke parents: diff changeset	1861	* which UCharacterProperty.isRuleWhiteSpace() returns true,
90ce3da70b43 Initial load duke parents: diff changeset	1862	* unless they are quoted or escaped. This may be ORed together
90ce3da70b43 Initial load duke parents: diff changeset	1863	* with other selectors.
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1864	* @stable ICU 3.8
2 90ce3da70b43 Initial load duke parents: diff changeset	1865	*/
90ce3da70b43 Initial load duke parents: diff changeset	1866	public static final int IGNORE_SPACE = 1;
90ce3da70b43 Initial load duke parents: diff changeset	1867
90ce3da70b43 Initial load duke parents: diff changeset	1868	}
2497 903fd9d785ef 6404304: RFE: Unicode 5.1 support peytoia parents: 2 diff changeset	1869
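The delimiter and '=' handling in applyPropertyPattern(String, ParsePosition, SymbolTable) can be tried in isolation. The following is a minimal sketch, not part of UnicodeSet: the class name PropertyPatternSplitDemo and the method split() are hypothetical, and it assumes that pos already points at the first character after the open delimiter and that the posix and isName flags have been determined by the caller.

```java
import java.util.Arrays;

// Hypothetical stand-alone demo; not part of sun.text.normalizer.UnicodeSet.
final class PropertyPatternSplitDemo {

    /**
     * Splits the text between an open delimiter and its close delimiter
     * into {propName, valueName}, mirroring the branch structure of the
     * listing above. Returns null if the close delimiter is missing.
     *
     * @param pattern full pattern text, e.g. "\p{gc=Cf}" or "[:Lu:]"
     * @param pos     index of the first character after the open delimiter (assumed)
     * @param posix   true for "[:...:]" syntax, false for "\p{...}" or "\N{...}"
     * @param isName  true for "\N{...}", where '=' is not treated as a separator
     */
    static String[] split(String pattern, int pos, boolean posix, boolean isName) {
        // Look for the matching close delimiter, either ":]" or "}".
        int close = pattern.indexOf(posix ? ":]" : "}", pos);
        if (close < 0) {
            return null; // syntax error: close delimiter missing
        }

        int equals = pattern.indexOf('=', pos);
        String propName;
        String valueName;
        if (equals >= 0 && equals < close && !isName) {
            // Medium or long form: property=value.
            propName = pattern.substring(pos, equals);
            valueName = pattern.substring(equals + 1, close);
        } else {
            // Short form, or \N{name}.
            propName = pattern.substring(pos, close);
            valueName = "";
            if (isName) {
                // \N{name} is rewritten as the "na" (name) property.
                valueName = propName;
                propName = "na";
            }
        }
        return new String[] { propName, valueName };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("\\p{gc=Cf}", 3, false, false)));
        System.out.println(Arrays.toString(split("[:Lu:]", 2, true, false)));
        System.out.println(Arrays.toString(split("\\N{DIGIT ONE}", 3, false, true)));
    }
}
```

In the real method the resulting pair is handed to applyPropertyAlias(), and the parse position is then moved past the close delimiter (two characters for ":]", one for "}").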

author	peytoia
	Tue, 06 Dec 2011 08:39:02 +0900
changeset 11136	f0f53bbe5bd1
parent 5506	202f599c92aa
child 14342	8435a30053c1
permissions	-rw-r--r--