Regex Intro

Category: Testing

http://regexlib.com

Special Characters - metacharacters

11 characters with special meanings:
    the opening square bracket [,
    the backslash \,
    the caret ^,
    the dollar sign $,
    the period or dot ., which matches any single character except a newline,
    the vertical bar or pipe symbol |,
    the question mark ?, which makes the preceding token optional, e.g. colou?r matches both colour and color,
    the asterisk or star *,
    the plus sign +, which matches the preceding token 1 or more times, e.g. tre+ matches tree in tree (e occurs twice) and tre in tread (e occurs once),
    the opening round bracket (
    and the closing round bracket ).

If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. If you want to match 1+1=2, the correct regex is 1\+1=2. Otherwise, the plus sign will have a special meaning.
Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like {1,3}.
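The escaping rule can be checked with a short sketch. It assumes Python's built-in re module, which is not named in these notes; other flavors may differ in small details, and the variable names are only illustrative.

```python
import re

text = "1+1=2"

# Unescaped, + is a quantifier: this pattern means "one or more 1s, then 1=2".
unescaped_match = re.search(r"1+1=2", text)

# Escaped, \+ stands for a literal plus sign.
escaped_match = re.search(r"1\+1=2", text)

# re.escape adds the backslashes when the literal text comes from data.
auto_escaped = re.escape(text)
auto_match = re.fullmatch(auto_escaped, text)

# A brace that does not form a quantifier such as {1,3} is taken literally.
brace_match = re.search(r"a{b", "xa{b")

print(unescaped_match)           # None
print(escaped_match.group(0))    # 1+1=2
print(brace_match.group(0))      # a{b
```

Note that the unescaped pattern is still a valid regex. It simply matches different text, such as 111=2, which is why a missing backslash tends to go unnoticed.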

Metacharacters Defined

MChar  Definition                                     Pattern   Sample Matches
^      Start of a string.                             ^abc      abc, abcdefg, abc123, ...
$      End of a string.                               abc$      abc, endsinabc, 123abc, ...
.      Any character (except \n newline)              a.c       abc, aac, acc, adc, aec, ...
|      Alternation.                                   bill|ted  ted, bill
{...}  Explicit quantifier notation.                  ab{2}c    abbc
[...]  Explicit set of characters to match.           a[bB]c    abc, aBc
(...)  Logical grouping of part of an expression.     (abc){2}  abcabc
*      0 or more of previous expression.              ab*c      ac, abc, abbc, abbbc, ...
+      1 or more of previous expression.              ab+c      abc, abbc, abbbc, ...
?      0 or 1 of previous expression; also forces     ab?c      ac, abc
       minimal matching when an expression might
       match several strings within a search string.
\      Preceding one of the above, it makes it a      a\sc      a c
       literal instead of a special character.
       Preceding a special matching character, see
       below.
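As a quick check of the table, the sketch below runs each pattern against one of its sample matches. It assumes Python's re module; the samples dictionary is only an illustrative name.

```python
import re

# One pattern from each table row, paired with one of its sample matches.
samples = {
    r"^abc": "abcdefg",
    r"abc$": "endsinabc",
    r"a.c": "aec",
    r"bill|ted": "ted",
    r"ab{2}c": "abbc",
    r"a[bB]c": "aBc",
    r"(abc){2}": "abcabc",
    r"ab*c": "ac",
    r"ab+c": "abbbc",
    r"ab?c": "abc",
    r"a\sc": "a c",
}

# re.search returns a match object when the pattern is found, otherwise None.
results = {pattern: bool(re.search(pattern, text)) for pattern, text in samples.items()}

for pattern, matched in results.items():
    print(pattern, matched)
```

Every entry should report True. A counter-example helps too: ab+c does not match ac, because + requires at least one b.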

Examples

\d will match a single digit from 0 to 9.

Optional Items

You can make several tokens optional by grouping them together using round brackets, and placing the question mark after the closing bracket. E.g.: Nov(ember)? will match Nov and November.
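A minimal sketch of the optional group, assuming Python's re module. finditer is used instead of findall because findall would return only the text captured by the group, not the whole match.

```python
import re

month = re.compile(r"Nov(ember)?")

# Collect the full text of every match in the sample sentence.
found = [m.group(0) for m in month.finditer("Nov 3 and November 4")]

print(found)    # ['Nov', 'November']
```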

Important Regex Concept: Greediness

With the question mark, I have introduced the first metacharacter that is greedy. The question mark gives the regex engine two choices: try to match the part the question mark applies to, or do not try to match it. The engine will always try to match that part first. Only if this causes the entire regular expression to fail will the engine try ignoring the part the question mark applies to.
The effect is that if you apply the regex Feb 23(rd)? to the string Today is Feb 23rd, 2003, the match will always be Feb 23rd and not Feb 23. You can make the question mark lazy (i.e. turn off the greediness) by putting a second question mark after the first, as in Feb 23(rd)??, which matches only Feb 23.
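The difference between the greedy and the lazy question mark can be seen side by side. This is a small sketch assuming Python's re module, which supports the ?? form.

```python
import re

text = "Today is Feb 23rd, 2003"

# Greedy: the engine first tries to include the optional (rd).
greedy = re.search(r"Feb 23(rd)?", text).group(0)

# Lazy: the engine first tries to skip the optional (rd).
lazy = re.search(r"Feb 23(rd)??", text).group(0)

print(greedy)   # Feb 23rd
print(lazy)     # Feb 23
```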

Character Classes or Character Sets

If you want to match an a OR an e, use [ae]. You could use this in gr[ae]y to match either gray or grey.
You can use a hyphen inside a character class to specify a range of characters. [0-9] matches a single digit between 0 and 9. You can use more than one range. [0-9a-fA-F] matches a single hexadecimal digit, case insensitively.
You can combine ranges and single characters. [0-9a-fxA-FX] matches a hexadecimal digit or the letter X.
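The character classes above can be tried out with a short sketch, assuming Python's re module. The test strings are made up for illustration.

```python
import re

# [ae] accepts exactly one character, either a or e.
gray = re.findall(r"gr[ae]y", "gray, grey, groy")

hex_digit = re.compile(r"[0-9a-fA-F]")
hex_or_x = re.compile(r"[0-9a-fxA-FX]")

# Test each character of "7bFgx" against both classes.
is_hex = [bool(hex_digit.fullmatch(c)) for c in "7bFgx"]
is_hex_or_x = [bool(hex_or_x.fullmatch(c)) for c in "7bFgx"]

print(gray)          # ['gray', 'grey']
print(is_hex)        # [True, True, True, False, False]
print(is_hex_or_x)   # [True, True, True, False, True]
```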

Negated Character Classes

Typing a caret after the opening square bracket will negate the character class, so [^ae] matches any single character other than a or e.
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u".
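The point about q[^u] is easy to get wrong, so here is a small sketch, assuming Python's re module. The sample words are illustrative.

```python
import re

pattern = re.compile(r"q[^u]")

# "Iraq" ends in q, so there is no character left for [^u] to match.
at_end = pattern.search("Iraq")

# Here the q is followed by a space, and the space is part of the match.
followed = pattern.search("Iraq is").group(0)

# The only q is followed by u, so the negated class rejects it.
with_u = pattern.search("question")

print(at_end)            # None
print(repr(followed))    # 'q '
print(with_u)            # None
```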

Metacharacters Inside Character Classes

Note that the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine.
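A brief sketch of both cases, assuming Python's re module: the usual metacharacters used unescaped inside a class, and the four special ones escaped so that they are matched literally.

```python
import re

# + and * need no backslash inside a character class.
operators = re.findall(r"[+*]", "2+3*4")

# ], - and \ are escaped here; ^ is literal because it is not the first character.
specials = re.findall(r"[\]\-^\\]", r"a]b-c^d\e")

print(operators)   # ['+', '*']
print(specials)    # [']', '-', '^', '\\']
```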

Shorthand Character Classes

Since certain character classes are used often, a series of shorthand character classes is available.
\d is short for [0-9].
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
\s stands for "whitespace character". Again, which characters this actually includes, depends on the regex flavor. In all flavors discussed in this tutorial, it includes [ \t\r\n]. That is: \s will match a space, a tab or a line break.
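The shorthands can be sketched as follows, assuming Python's re module. One flavor-specific detail to be aware of: for text strings in Python 3, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed.

```python
import re

digits = re.findall(r"\d", "a1b22")
words = re.findall(r"\w+", "snake_case, v2!")

# \s+ treats a run of spaces, tabs and line breaks as one separator.
fields = re.split(r"\s+", "one two\tthree\r\nfour")

# With re.ASCII, \w is limited to [A-Za-z0-9_].
unicode_word = re.fullmatch(r"\w", "\u00e9")
ascii_word = re.fullmatch(r"\w", "\u00e9", re.ASCII)

print(digits)   # ['1', '2', '2']
print(words)    # ['snake_case', 'v2']
print(fields)   # ['one', 'two', 'three', 'four']
```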

Non-Printable Characters

You can use special character sequences to put non-printable characters in your regular expression.
\t to match a tab character (ASCII 0x09),
\r for carriage return (0x0D) and
\n for line feed (0x0A).
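A small sketch of these sequences in use, assuming Python's re module. The sample strings are made up; the \x09 form relies on Python's support for hexadecimal escapes in patterns.

```python
import re

# Split a tab-separated record into its columns.
columns = re.split(r"\t", "id\tname\tcity")

# Accept both Windows (\r\n) and Unix (\n) line endings.
lines = re.split(r"\r\n|\n", "one\r\ntwo\nthree")

# \x09 is the hexadecimal spelling of the tab character.
tab_position = re.search(r"\x09", "a\tb").start()

print(columns)        # ['id', 'name', 'city']
print(lines)          # ['one', 'two', 'three']
print(tab_position)   # 1
```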

Lookahead and Lookbehind

Lookaround  Name                 What it Does
(?=foo)     Lookahead            Asserts that what immediately follows the current position in the string is foo
(?<=foo)    Lookbehind           Asserts that what immediately precedes the current position in the string is foo
(?!foo)     Negative Lookahead   Asserts that what immediately follows the current position in the string is not foo
(?<!foo)    Negative Lookbehind  Asserts that what immediately precedes the current position in the string is not foo
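The four lookarounds can be compared on one string. The sketch below assumes Python's re module; the helper starts and the sample text are illustrative names, not from the sources listed here.

```python
import re

text = "foobar barfoo"


def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


ahead = starts(r"bar(?=foo)")        # bar followed by foo
behind = starts(r"(?<=foo)bar")      # bar preceded by foo
not_ahead = starts(r"bar(?!foo)")    # bar not followed by foo
not_behind = starts(r"(?<!foo)bar")  # bar not preceded by foo

# Lookarounds only assert; the foo is never part of the matched text.
matched_text = re.search(r"bar(?=foo)", text).group(0)

print(ahead, behind, not_ahead, not_behind)   # [7] [3] [3] [7]
print(matched_text)                           # bar
```

The first bar sits at index 3 right after foo, and the second at index 7 right before foo, which is why each assertion picks out exactly one of them.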


my thanks to: http://www.regular-expressions.info/engine.html
good examples: http://www.regular-expressions.info/reference.html
generator: http://www.txt2re.com/index.php3?s=VA-MU-G-E-10&6&-19&5
cheat sheet: http://overapi.com/regex/
Learn regular expressions in about 55 minutes: http://qntm.org/files/re/re.html
Learn Regular Expressions (RegEx) with Ease
http://www.rexegg.com/regex-lookarounds.html