1 *java.util.regex.Pattern* *Pattern* A compiled representation of a regular expre
3 public final class Pattern
4 extends |java.lang.Object|
5 implements |java.io.Serializable|
7 |java.util.regex.Pattern_Description|
8 |java.util.regex.Pattern_Fields|
9 |java.util.regex.Pattern_Constructors|
10 |java.util.regex.Pattern_Methods|
12 ================================================================================
14 *java.util.regex.Pattern_Fields*
15 |int_java.util.regex.Pattern.CANON_EQ|
16 |int_java.util.regex.Pattern.CASE_INSENSITIVE|
17 |int_java.util.regex.Pattern.COMMENTS|
18 |int_java.util.regex.Pattern.DOTALL|
19 |int_java.util.regex.Pattern.LITERAL|
20 |int_java.util.regex.Pattern.MULTILINE|
21 |int_java.util.regex.Pattern.UNICODE_CASE|
22 |int_java.util.regex.Pattern.UNIX_LINES|
24 *java.util.regex.Pattern_Methods*
25 |java.util.regex.Pattern.compile(String)|Compiles the given regular expression
26 |java.util.regex.Pattern.compile(String,int)|Compiles the given regular express
27 |java.util.regex.Pattern.flags()|Returns this pattern's match flags.
28 |java.util.regex.Pattern.matcher(CharSequence)|Creates a matcher that will matc
29 |java.util.regex.Pattern.matches(String,CharSequence)|Compiles the given regula
30 |java.util.regex.Pattern.pattern()|Returns the regular expression from which th
31 |java.util.regex.Pattern.quote(String)|Returns a literal pattern String for the
32 |java.util.regex.Pattern.split(CharSequence)|Splits the given input sequence ar
33 |java.util.regex.Pattern.split(CharSequence,int)|Splits the given input sequenc
34 |java.util.regex.Pattern.toString()|Returns the string representation of this p
36 *java.util.regex.Pattern_Description*
38 A compiled representation of a regular expression.
40 A regular expression, specified as a string, must first be compiled into an
41 instance of this class. The resulting pattern can then be used to create a
(|java.util.regex.Matcher|) object that can match arbitrary character
sequences (|java.lang.CharSequence|) against the regular expression. All
44 of the state involved in performing a match resides in the matcher, so many
45 matchers can share the same pattern.
47 A typical invocation sequence is thus
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
55 A matches(|java.util.regex.Pattern|) method is defined by this class as a
56 convenience for when a regular expression is used just once. This method
57 compiles an expression and matches an input sequence against it in a single
58 invocation. The statement
62 boolean b = Pattern.matches("a*b", "aaaaab");
64 is equivalent to the three statements above, though for repeated matches it is
65 less efficient since it does not allow the compiled pattern to be reused.
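The reuse described above can be sketched as follows. The class and method names (PatternReuseSketch, countMatches) are illustrative and not part of the API; the sketch compiles "a*b" once and creates a new matcher for each input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternReuseSketch {
    // Compiled once; Pattern instances are immutable and can be shared.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts the inputs that match the whole pattern, using a fresh Matcher per input.
    static int countMatches(List<String> inputs) {
        int count = 0;
        for (String input : inputs) {
            if (A_STAR_B.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("aaaaab", "b", "abc");
        System.out.println(countMatches(inputs));
        // One-off check: this convenience call compiles the expression on every invocation.
        System.out.println(Pattern.matches("a*b", "aaaaab"));
    }
}
```

For a single check the static Pattern.matches call is shorter; the shared constant is worthwhile only when the same expression is applied repeatedly.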
67 Instances of this class are immutable and are safe for use by multiple
concurrent threads. Instances of the (|java.util.regex.Matcher|) class are not
safe for such use.
71 Summary of regular-expression constructs
Characters

x        The character x
\\       The backslash character
\0n      The character with octal value 0n (0 <= n <= 7)
\0nn     The character with octal value 0nn (0 <= n <= 7)
\0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh     The character with hexadecimal value 0xhh
\uhhhh   The character with hexadecimal value 0xhhhh
\t       The tab character ('\u0009')
\n       The newline (line feed) character ('\u000A')
\r       The carriage-return character ('\u000D')
\f       The form-feed character ('\u000C')
\a       The alert (bell) character ('\u0007')
\e       The escape character ('\u001B')
\cx      The control character corresponding to x
Character classes

[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
96 Predefined character classes
.     Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
103 POSIX character classes (US-ASCII only)
\p{Lower}    A lower-case alphabetic character: [a-z]
\p{Upper}    An upper-case alphabetic character: [A-Z]
\p{ASCII}    All ASCII: [\x00-\x7F]
\p{Alpha}    An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}    A decimal digit: [0-9]
\p{Alnum}    An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}    Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}    A visible character: [\p{Alnum}\p{Punct}]
\p{Print}    A printable character: [\p{Graph}\x20]
\p{Blank}    A space or a tab: [ \t]
\p{Cntrl}    A control character: [\x00-\x1F\x7F]
\p{XDigit}   A hexadecimal digit: [0-9a-fA-F]
\p{Space}    A whitespace character: [ \t\n\x0B\f\r]
116 java.lang.Character classes (simple java character type)
118 \p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
119 \p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
120 \p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
121 \p{javaMirrored} Equivalent to java.lang.Character.isMirrored()
123 Classes for Unicode blocks and categories
\p{InGreek}           A character in the Greek block (simple block)
\p{Lu}                An uppercase letter (simple category)
\p{Sc}                A currency symbol
\P{InGreek}           Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]    Any letter except an uppercase letter (subtraction)
Boundary matchers

^     The beginning of a line
$     The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input but for the final terminator, if any
\z    The end of the input
Greedy quantifiers

X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times
142 Reluctant quantifiers
X??       X, once or not at all
X*?       X, zero or more times
X+?       X, one or more times
X{n}?     X, exactly n times
X{n,}?    X, at least n times
X{n,m}?   X, at least n but not more than m times
148 Possessive quantifiers
X?+       X, once or not at all
X*+       X, zero or more times
X++       X, one or more times
X{n}+     X, exactly n times
X{n,}+    X, at least n times
X{n,m}+   X, at least n but not more than m times
Logical operators

XY     X followed by Y
X|Y    Either X or Y
(X)    X, as a capturing group

Back references

\n     Whatever the nth capturing group matched

Quotation

\      Nothing, but quotes the following character
\Q     Nothing, but quotes all characters until \E
\E     Nothing, but ends quoting started by \Q
167 Special constructs (non-capturing)
(?:X)                X, as a non-capturing group
(?idmsux-idmsux)     Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X)   X, as a non-capturing group with the given flags
                     i d m s u x on - off
(?=X)                X, via zero-width positive lookahead
(?!X)                X, via zero-width negative lookahead
(?<=X)               X, via zero-width positive lookbehind
(?<!X)               X, via zero-width negative lookbehind
(?>X)                X, as an independent, non-capturing group
180 Backslashes, escapes, and quoting
182 The backslash character ('\') serves to introduce escaped constructs, as
183 defined in the table above, as well as to quote characters that otherwise would
184 be interpreted as unescaped constructs. Thus the expression \\ matches a single
185 backslash and \{ matches a left brace.
187 It is an error to use a backslash prior to any alphabetic character that does
188 not denote an escaped construct; these are reserved for future extensions to
189 the regular-expression language. A backslash may be used prior to a
non-alphabetic character regardless of whether that character is part of an
unescaped construct.
193 Backslashes within string literals in Java source code are interpreted as
194 required by the Java Language Specification as either Unicode escapes or other
195 character escapes. It is therefore necessary to double backslashes in string
196 literals that represent regular expressions to protect them from interpretation
by the Java bytecode compiler. The string literal "\b", for example, matches a
single backspace character when interpreted as a regular expression, while
"\\b" matches a word boundary. The string literal "\(hello\)" is illegal and
leads to a compile-time error; in order to match the string (hello) the string
literal "\\(hello\\)" must be used.
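A minimal sketch of the doubling rule, assuming nothing beyond the standard library. The class and method names are hypothetical; each method shows how one Java string literal reaches the regular-expression parser.

```java
import java.util.regex.Pattern;

class BackslashSketch {
    // "\b" is a Java string holding one backspace character (U+0008),
    // so the pattern matches a literal backspace.
    static boolean matchesBackspace(String input) {
        return Pattern.matches("\b", input);
    }

    // "\\b" reaches the regex parser as \b, the word-boundary construct.
    static boolean containsWord(String word, String input) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(input).find();
    }

    // "\\(hello\\)" reaches the parser as \(hello\) and matches the literal text (hello).
    static boolean isParenthesizedHello(String input) {
        return Pattern.matches("\\(hello\\)", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesBackspace("\b"));
        System.out.println(containsWord("cat", "a cat sat"));
        System.out.println(containsWord("cat", "concat"));
        System.out.println(isParenthesizedHello("(hello)"));
    }
}
```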
205 Character classes may appear within other character classes, and may be
composed by the union operator (implicit) and the intersection operator (&&).
The union operator denotes a class that contains every character that is
208 in at least one of its operand classes. The intersection operator denotes a
209 class that contains every character that is in both of its operand classes.
The precedence of character-class operators is as follows, from highest to
lowest:

1    Literal escape    \x
2    Grouping          [...]
3    Range             a-z
4    Union             [a-e][i-u]
5    Intersection      [a-z&&[aeiou]]
Note that a different set of metacharacters is in effect inside a character
class than outside a character class. For instance, the regular expression .
loses its special meaning inside a character class, while the expression -
becomes a range-forming metacharacter.
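The union, intersection, and subtraction forms can be checked with a small helper. This is a sketch only; CharClassSketch and matchesOne are illustrative names, and each call tests a single character against one class expression.

```java
import java.util.regex.Pattern;

class CharClassSketch {
    // Returns true when the whole input (here a single character) matches the class.
    static boolean matchesOne(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p.
        System.out.println(matchesOne("[a-d[m-p]]", "n"));
        // Intersection: only d, e, or f.
        System.out.println(matchesOne("[a-z&&[def]]", "e"));
        // Subtraction via intersection with a negated class.
        System.out.println(matchesOne("[a-z&&[^bc]]", "b"));
        // Inside a class the dot is an ordinary character.
        System.out.println(matchesOne("[.]", "a"));
    }
}
```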
224 A line terminator is a one- or two-character sequence that marks the end of a
line of the input character sequence. The following are recognized as line
terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character
("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
If UNIX_LINES mode is activated, then the only line terminators recognized are
newline characters.

The regular expression . matches any character except a line terminator unless
the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and only
match at the beginning and the end, respectively, of the entire input sequence.
If MULTILINE mode is activated then ^ matches at the beginning of input and
after any line terminator except at the end of input. When in MULTILINE mode $
matches just before a line terminator or the end of the input sequence.
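The effect of these modes can be sketched with two helpers. The names LineTerminatorSketch, countLines, and dotMatches are assumptions for illustration; the first counts matches of ^\w+$ under a given flag set, the second tests whether . accepts a given character between "a" and "b".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineTerminatorSketch {
    // Counts the matches of ^\w+$ in the input under the given flags.
    static int countLines(String input, int flags) {
        Matcher m = Pattern.compile("^\\w+$", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Tests whether . matches the given separator between 'a' and 'b'.
    static boolean dotMatches(String separator, int flags) {
        return Pattern.compile("a.b", flags).matcher("a" + separator + "b").matches();
    }

    public static void main(String[] args) {
        String input = "one\ntwo\r\nthree";
        System.out.println(countLines(input, 0));
        System.out.println(countLines(input, Pattern.MULTILINE));
        System.out.println(countLines(input, Pattern.MULTILINE | Pattern.UNIX_LINES));
        System.out.println(dotMatches("\n", Pattern.DOTALL));
        System.out.println(dotMatches("\r", Pattern.UNIX_LINES));
    }
}
```

With both MULTILINE and UNIX_LINES set, the "\r" before the second newline is no longer treated as part of a terminator, so the middle line stops matching \w+$.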
258 Capturing groups are numbered by counting their opening parentheses from left
to right. In the expression ((A)(B(C))), for example, there are four such
groups:

1    ((A)(B(C)))
2    (A)
3    (B(C))
4    (C)
264 Group zero always stands for the entire expression.
266 Capturing groups are so named because, during a match, each subsequence of the
267 input sequence that matches such a group is saved. The captured subsequence may
268 be used later in the expression, via a back reference, and may also be
269 retrieved from the matcher once the match operation is complete.
271 The captured input associated with a group is always the subsequence that the
272 group most recently matched. If a group is evaluated a second time because of
273 quantification then its previously-captured value, if any, will be retained if
274 the second evaluation fails. Matching the string "aba" against the expression
275 (a(b)?)+, for example, leaves group two set to "b". All captured input is
276 discarded at the beginning of each match.
278 Groups beginning with (? are pure, non-capturing groups that do not capture
279 text and do not count towards the group total.
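The numbering and retention rules above can be observed directly from a matcher. The following sketch uses a hypothetical helper, GroupSketch.groups, which returns group zero followed by every numbered group of a full match.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {
    // Returns groups 0..groupCount() of a full match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses, left to right.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // Group two keeps "b" although the second iteration did not match it.
        System.out.println(Arrays.toString(groups("(a(b)?)+", "aba")));
        // (?:...) does not count towards the group total.
        System.out.println(Arrays.toString(groups("(?:a)(b)", "ab")));
        // \1 refers back to what group one captured.
        System.out.println(Arrays.toString(groups("(\\w)\\1", "ll")));
    }
}
```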
283 This class is in conformance with Level 1 of Unicode Technical Standard #18:
284 Unicode Regular Expression Guidelines, plus RL2.1 Canonical Equivalents.
Unicode escape sequences such as \u2014 in Java source code are processed as
described in §3.3 of the Java Language Specification. Such escape sequences are
also implemented directly by the regular-expression parser so that Unicode
escapes can be used in expressions that are read from files or from the
keyboard. Thus the strings "\u2014" and "\\u2014", while not equal, compile into
the same pattern, which matches the character with hexadecimal value 0x2014.
293 Unicode blocks and categories are written with the \p and \P constructs as in
294 Perl. \p{prop} matches if the input has the property prop, while \P{prop} does
295 not match if the input has that property. Blocks are specified with the prefix
296 In, as in InMongolian. Categories may be specified with the optional prefix Is:
297 Both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and
298 categories can be used both inside and outside of a character class.
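A short sketch of the \p and \P constructs follows. The class and method names are illustrative; non-ASCII test characters are written as Java Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

class UnicodePropertySketch {
    // Returns true when the single-character input matches the property expression.
    static boolean matchesOne(String regex, String ch) {
        return Pattern.matches(regex, ch);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";     // Latin small letter e with acute
        String greekAlpha = "\u03b1"; // Greek small letter alpha

        // Category with and without the optional Is prefix.
        System.out.println(matchesOne("\\p{L}", eAcute));
        System.out.println(matchesOne("\\p{IsL}", eAcute));
        // Block, and its negation.
        System.out.println(matchesOne("\\p{InGreek}", greekAlpha));
        System.out.println(matchesOne("\\P{InGreek}", greekAlpha));
        // Properties may be combined inside a character class.
        System.out.println(matchesOne("[\\p{L}&&[^\\p{Lu}]]", "a"));
        // The regex parser itself understands the six-character escape for U+2014.
        System.out.println(matchesOne("\\u2014", "\u2014"));
    }
}
```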
The supported categories are those of The Unicode Standard in the version
specified by the Character (|java.lang.Character|) class. The category names
are those defined in the Standard, both normative and informative. The block
names supported by Pattern are the valid block names accepted and defined by
UnicodeBlock.forName (|java.lang.Character.UnicodeBlock|).
308 Categories that behave like the java.lang.Character boolean ismethodname
309 methods (except for the deprecated ones) are available through the same
310 \p{prop} syntax where the specified property has the name javamethodname.
314 The Pattern engine performs traditional NFA-based matching with ordered
315 alternation as occurs in Perl 5.
317 Perl constructs not supported by this class:
321 The conditional constructs (?{X}) and (?(condition)X|Y),
323 The embedded code constructs (?{code}) and (??{code}),
325 The embedded comment syntax (?#comment), and
The preprocessing operations \l \u, \L, and \U.
331 Constructs supported by this class but not by Perl:
335 Possessive quantifiers, which greedily match as much as they can and do not
336 back off, even when doing so would allow the overall match to succeed.
338 Character-class union and intersection as described above.
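The difference between greedy, reluctant, and possessive matching can be sketched as below. QuantifierSketch and its methods are hypothetical helpers; the inputs are chosen so that only backing off lets the greedy form succeed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // True when the whole input matches the expression.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first subsequence found by the expression, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: .* takes everything, then backs off so that foo can match.
        System.out.println(matchesAll(".*foo", "xfoo"));
        // Possessive: .*+ takes everything and never backs off, so foo cannot match.
        System.out.println(matchesAll(".*+foo", "xfoo"));
        // Reluctant: .*? takes as little as possible.
        System.out.println(firstMatch(".*?foo", "xfooxxfoo"));
        System.out.println(firstMatch(".*foo", "xfooxxfoo"));
    }
}
```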
342 Notable differences from Perl:
346 In Perl, \1 through \9 are always interpreted as back references; a
347 backslash-escaped number greater than 9 is treated as a back reference if at
348 least that many subexpressions exist, otherwise it is interpreted, if possible,
349 as an octal escape. In this class octal escapes must always begin with a zero.
350 In this class, \1 through \9 are always interpreted as back references, and a
351 larger number is accepted as a back reference if at least that many
352 subexpressions exist at that point in the regular expression, otherwise the
353 parser will drop digits until the number is smaller or equal to the existing
354 number of groups or it is one digit.
356 Perl uses the g flag to request a match that resumes where the last match left
357 off. This functionality is provided implicitly by the
358 (|java.util.regex.Matcher|) class: Repeated invocations of the
359 find(|java.util.regex.Matcher|) method will resume where the last match left
360 off, unless the matcher is reset.
362 In Perl, embedded flags at the top level of an expression affect the whole
363 expression. In this class, embedded flags always take effect at the point at
364 which they appear, whether they are at the top level or within a group; in the
365 latter case, flags are restored at the end of the group just as in Perl.
367 Perl is forgiving about malformed matching constructs, as in the expression *a,
368 as well as dangling brackets, as in the expression abc], and treats them as
369 literals. This class also accepts dangling brackets but is strict about
370 dangling metacharacters like +, ? and *, and will throw a
371 (|java.util.regex.PatternSyntaxException|) if it encounters them.
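Two of the differences listed above lend themselves to a short sketch: repeated find() calls in place of Perl's g flag, and the strict treatment of dangling metacharacters. The helper names findAll and compiles are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PerlDifferenceSketch {
    // Collects every match; each find() call resumes where the previous match ended.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Reports whether the expression is accepted by Pattern.compile.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333"));
        System.out.println(compiles("abc]")); // dangling bracket is accepted
        System.out.println(compiles("*a"));   // dangling metacharacter is rejected
    }
}
```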
375 For a more precise description of the behavior of regular expression
constructs, please see Mastering Regular Expressions, 3rd Edition, Jeffrey E.
377 F. Friedl, O'Reilly and Associates, 2006.
381 *int_java.util.regex.Pattern.CANON_EQ*
383 Enables canonical equivalence.
385 When this flag is specified then two characters will be considered to match if,
and only if, their full canonical decompositions match. The expression
"a\u030A", for example, will match the string "\u00E5" when this flag is
specified. By default, matching does not take canonical equivalence into
account.
391 There is no embedded flag character for enabling canonical equivalence.
393 Specifying this flag may impose a performance penalty.
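The documented example can be reproduced with a small helper. This is a sketch; CanonEqSketch and matchesWith are illustrative names, and the two strings are the decomposed and precomposed forms of a-with-ring given in the description above.

```java
import java.util.regex.Pattern;

class CanonEqSketch {
    // Compiles the expression with the given flags and matches the whole input.
    static boolean matchesWith(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A"; // 'a' followed by a combining ring above
        String precomposed = "\u00E5"; // single code point, a with ring above

        // Without the flag the two spellings are different character sequences.
        System.out.println(matchesWith(decomposed, precomposed, 0));
        // With CANON_EQ they are treated as equivalent.
        System.out.println(matchesWith(decomposed, precomposed, Pattern.CANON_EQ));
    }
}
```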
396 *int_java.util.regex.Pattern.CASE_INSENSITIVE*
398 Enables case-insensitive matching.
400 By default, case-insensitive matching assumes that only characters in the
401 US-ASCII charset are being matched. Unicode-aware case-insensitive matching can
be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.

Case-insensitive matching can also be enabled via the embedded flag
expression (?i).
408 Specifying this flag may impose a slight performance penalty.
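A minimal sketch of the ASCII-only default and the effect of adding UNICODE_CASE. The class and method names are hypothetical; the non-ASCII letters are written as Unicode escapes (e with acute, lower and upper case).

```java
import java.util.regex.Pattern;

class CaseFlagSketch {
    // Compiles the expression with the given flags and matches the whole input.
    static boolean matchesWith(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // e with acute, lower case
        String upper = "\u00c9"; // E with acute, upper case

        // ASCII letters fold with CASE_INSENSITIVE alone, or with the embedded (?i).
        System.out.println(matchesWith("hello", "HELLO", Pattern.CASE_INSENSITIVE));
        System.out.println(matchesWith("(?i)hello", "HELLO", 0));
        // Non-ASCII letters need UNICODE_CASE as well.
        System.out.println(matchesWith(lower, upper, Pattern.CASE_INSENSITIVE));
        System.out.println(matchesWith(lower, upper,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
    }
}
```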
411 *int_java.util.regex.Pattern.COMMENTS*
413 Permits whitespace and comments in pattern.
415 In this mode, whitespace is ignored, and embedded comments starting with # are
416 ignored until the end of a line.
Comments mode can also be enabled via the embedded flag expression (?x).
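Comments mode is mostly useful for documenting a longer expression in place. The sketch below assumes a hypothetical year-month format; the class, constant, and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsFlagSketch {
    // Whitespace and #-comments inside the expression are ignored by the parser.
    private static final Pattern YEAR_MONTH = Pattern.compile(
            "(\\d{4})   # four-digit year\n"
          + "-          # separator\n"
          + "(\\d{2})   # two-digit month",
            Pattern.COMMENTS);

    // Returns the month part of a yyyy-mm string, or null if the input does not match.
    static String month(String input) {
        Matcher m = YEAR_MONTH.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(month("2024-05"));
        // Only whitespace in the pattern is ignored, not whitespace in the input.
        System.out.println(month("2024 - 05"));
    }
}
```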
421 *int_java.util.regex.Pattern.DOTALL*
Enables dotall mode.

In dotall mode, the expression . matches any character, including a line
terminator. By default this expression does not match line terminators.

Dotall mode can also be enabled via the embedded flag expression (?s). (The s
is a mnemonic for "single-line" mode, which is what this is called in Perl.)
432 *int_java.util.regex.Pattern.LITERAL*
434 Enables literal parsing of the pattern.
436 When this flag is specified then the input string that specifies the pattern is
437 treated as a sequence of literal characters. Metacharacters or escape sequences
438 in the input sequence will be given no special meaning.
440 The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching
441 when used in conjunction with this flag. The other flags become superfluous.
443 There is no embedded flag character for enabling literal parsing.
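A short sketch of literal parsing, with illustrative names. It compiles the text a.b both as a regular expression and as a literal, and shows that CASE_INSENSITIVE still applies in literal mode.

```java
import java.util.regex.Pattern;

class LiteralFlagSketch {
    // Compiles the text with the given flags and matches the whole input.
    static boolean matchesWith(String text, String input, int flags) {
        return Pattern.compile(text, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // As a regular expression, the dot matches any character.
        System.out.println(matchesWith("a.b", "axb", 0));
        // As a literal, the dot only matches a dot.
        System.out.println(matchesWith("a.b", "axb", Pattern.LITERAL));
        System.out.println(matchesWith("a.b", "a.b", Pattern.LITERAL));
        // CASE_INSENSITIVE keeps its effect alongside LITERAL.
        System.out.println(matchesWith("a.b", "A.B",
                Pattern.LITERAL | Pattern.CASE_INSENSITIVE));
    }
}
```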
446 *int_java.util.regex.Pattern.MULTILINE*
448 Enables multiline mode.
In multiline mode the expressions ^ and $ match just after or just before,
respectively, a line terminator or the end of the input sequence. By default
these expressions only match at the beginning and the end of the entire input
sequence.

Multiline mode can also be enabled via the embedded flag expression (?m).
458 *int_java.util.regex.Pattern.UNICODE_CASE*
460 Enables Unicode-aware case folding.
When this flag is specified then case-insensitive matching, when enabled by the
CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode
Standard. By default, case-insensitive matching assumes that only characters in
the US-ASCII charset are being matched.

Unicode-aware case folding can also be enabled via the embedded flag
expression (?u).
470 Specifying this flag may impose a performance penalty.
473 *int_java.util.regex.Pattern.UNIX_LINES*
475 Enables Unix lines mode.
In this mode, only the '\n' line terminator is recognized in the behavior of .,
^, and $.

Unix lines mode can also be enabled via the embedded flag expression (?d).
484 *java.util.regex.Pattern.compile(String)*
486 public static |java.util.regex.Pattern| compile(java.lang.String regex)
488 Compiles the given regular expression into a pattern.
491 regex - The expression to be compiled
493 *java.util.regex.Pattern.compile(String,int)*
public static |java.util.regex.Pattern| compile(
  java.lang.String regex,
  int flags)
499 Compiles the given regular expression into a pattern with the given flags.
502 regex - The expression to be compiled
flags - Match flags, a bit mask that may include CASE_INSENSITIVE, MULTILINE,
    DOTALL, UNICODE_CASE, CANON_EQ, UNIX_LINES, LITERAL and COMMENTS
507 *java.util.regex.Pattern.flags()*
public int flags()

Returns this pattern's match flags.
515 Returns: The match flags specified when this pattern was compiled
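Because the flags form a bit mask, they are combined with | when compiling and tested with & when read back. The following is a sketch with hypothetical names (FlagsSketch, compileLineOriented, hasFlag).

```java
import java.util.regex.Pattern;

class FlagsSketch {
    // Combines two flags into one bit mask when compiling.
    static Pattern compileLineOriented(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    }

    // Tests a single flag bit in the mask returned by flags().
    static boolean hasFlag(Pattern pattern, int flag) {
        return (pattern.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        Pattern p = compileLineOriented("^error:.*$");
        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE));
        System.out.println(hasFlag(p, Pattern.MULTILINE));
        System.out.println(hasFlag(p, Pattern.DOTALL));
    }
}
```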
517 *java.util.regex.Pattern.matcher(CharSequence)*
519 public |java.util.regex.Matcher| matcher(java.lang.CharSequence input)
521 Creates a matcher that will match the given input against this pattern.
524 input - The character sequence to be matched
526 Returns: A new matcher for this pattern
528 *java.util.regex.Pattern.matches(String,CharSequence)*
530 public static boolean matches(
531 java.lang.String regex,
532 java.lang.CharSequence input)
Compiles the given regular expression and attempts to match the given input
against it.
537 An invocation of this convenience method of the form
541 Pattern.matches(regex, input);
543 behaves in exactly the same way as the expression
547 Pattern.compile(regex).matcher(input).matches()
549 If a pattern is to be used multiple times, compiling it once and reusing it
550 will be more efficient than invoking this method each time.
553 regex - The expression to be compiled
554 input - The character sequence to be matched
556 *java.util.regex.Pattern.pattern()*
558 public |java.lang.String| pattern()
560 Returns the regular expression from which this pattern was compiled.
564 Returns: The source of this pattern
566 *java.util.regex.Pattern.quote(String)*
568 public static |java.lang.String| quote(java.lang.String s)
570 Returns a literal pattern String for the specified String.
572 This method produces a String that can be used to create a Pattern that would
573 match the string s as if it were a literal pattern. Metacharacters or escape
574 sequences in the input sequence will be given no special meaning.
577 s - The string to be literalized
579 Returns: A literal string replacement
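A typical use of quote is to embed untrusted or punctuation-heavy text in a larger expression. The sketch below assumes a hypothetical requirement, a literal prefix followed by digits; the names are illustrative.

```java
import java.util.regex.Pattern;

class QuoteSketch {
    // Builds a pattern that matches the prefix literally, followed by one or more digits.
    static Pattern prefixThenDigits(String literalPrefix) {
        return Pattern.compile(Pattern.quote(literalPrefix) + "\\d+");
    }

    public static void main(String[] args) {
        // The prefix contains metacharacters that would otherwise be interpreted.
        Pattern p = prefixThenDigits("id(*)");
        System.out.println(p.matcher("id(*)42").matches());
        System.out.println(p.matcher("idx42").matches());
    }
}
```

Compiling with the LITERAL flag is an alternative when the whole expression is literal; quote is the option when only part of it is.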
581 *java.util.regex.Pattern.split(CharSequence)*
583 public |java.lang.String|[] split(java.lang.CharSequence input)
585 Splits the given input sequence around matches of this pattern.
587 This method works as if by invoking the two-argument
588 split(|java.util.regex.Pattern|) method with the given input sequence and a
limit argument of zero. Trailing empty strings are therefore not included in
the resulting array.

The input "boo:and:foo", for example, yields the following results with these
expressions:

Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
598 input - The character sequence to be split
Returns: The array of strings computed by splitting the input around matches of this
    pattern
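The two rows of the table above can be reproduced with a small wrapper. This is a sketch; SplitSketch and splitAround are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSketch {
    // Splits the input around matches of the expression, dropping trailing empty strings.
    static String[] splitAround(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAround(":", "boo:and:foo")));
        // The empty string between the two adjacent 'o' matches is kept;
        // the empty strings after the final 'o' matches are discarded.
        System.out.println(Arrays.toString(splitAround("o", "boo:and:foo")));
    }
}
```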
603 *java.util.regex.Pattern.split(CharSequence,int)*
public |java.lang.String|[] split(
  java.lang.CharSequence input,
  int limit)
609 Splits the given input sequence around matches of this pattern.
611 The array returned by this method contains each substring of the input sequence
612 that is terminated by another subsequence that matches this pattern or is
613 terminated by the end of the input sequence. The substrings in the array are in
614 the order in which they occur in the input. If this pattern does not match any
615 subsequence of the input then the resulting array has just one element, namely
616 the input sequence in string form.
618 The limit parameter controls the number of times the pattern is applied and
619 therefore affects the length of the resulting array. If the limit n is greater
620 than zero then the pattern will be applied at most n-1 times, the array's
621 length will be no greater than n, and the array's last entry will contain all
622 input beyond the last matched delimiter. If n is non-positive then the pattern
623 will be applied as many times as possible and the array can have any length. If
624 n is zero then the pattern will be applied as many times as possible, the array
625 can have any length, and trailing empty strings will be discarded.
The input "boo:and:foo", for example, yields the following results with these
parameters:

Regex    Limit    Result
:        2        { "boo", "and:foo" }
:        5        { "boo", "and", "foo" }
:        -2       { "boo", "and", "foo" }
o        5        { "b", "", ":and:f", "", "" }
o        -2       { "b", "", ":and:f", "", "" }
o        0        { "b", "", ":and:f" }
635 input - The character sequence to be split
636 limit - The result threshold, as described above
Returns: The array of strings computed by splitting the input around matches of this
    pattern
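The three limit cases (positive, negative, zero) can be sketched with one wrapper. SplitLimitSketch and splitWithLimit are illustrative names; the inputs are those of the table above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitSketch {
    // Splits the input around matches of the expression, honoring the limit.
    static String[] splitWithLimit(String regex, String input, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String input = "boo:and:foo";
        // Positive limit: at most limit entries, the last holds the unsplit remainder.
        System.out.println(Arrays.toString(splitWithLimit(":", input, 2)));
        // Negative limit: split as often as possible, keep trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit("o", input, -2)));
        // Zero limit: split as often as possible, drop trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit("o", input, 0)));
    }
}
```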
641 *java.util.regex.Pattern.toString()*
643 public |java.lang.String| toString()
645 Returns the string representation of this pattern. This is the regular
646 expression from which this pattern was compiled.
650 Returns: The string representation of this pattern