From charlesreid1

Revision as of 08:24, 28 April 2011

Intro

Regular expressions provide a way of searching for, and matching, text patterns. They allow for very fine control over pattern matching and are supported by many tools and programming languages, including Awk, Sed, Vim, Grep, PHP, and Perl.

Regular expressions are built from several groups of symbols:

  • Literal characters - these match themselves exactly
  • Special characters - characters with special meaning in regular expressions (e.g. brackets and parentheses)
  • Character sets - these match any one character from a set of characters
  • Anchors - these match positions in a line, such as the beginning or end, or the boundary between words
  • Repetition symbols - these specify how many times the preceding item may repeat
  • Grouping symbols - these provide a way to group terms
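The difference between literal and special characters can be seen in a minimal sketch using Python's standard `re` module. Python is an assumption here, chosen because its syntax is close to Perl's; the other tools listed above differ in details. Letters match themselves, while parentheses and the dot (which matches any single character) are special and need a backslash to be matched literally.

```python
import re

text = "f(x) = 3.14"

# Letters are literal: r"f" matches the "f" in the text.
literal_f = re.search(r"f", text) is not None
# Parentheses are special (they create a group), so r"f(x)" looks for "fx".
unescaped_parens = re.search(r"f(x)", text) is not None
# Escaping the parentheses with a backslash makes them literal.
escaped_parens = re.search(r"f\(x\)", text) is not None
# An unescaped dot matches any single character, so "3.14" also matches "3x14".
unescaped_dot = re.search(r"3.14", "3x14") is not None
# An escaped dot matches only a literal ".".
escaped_dot = re.search(r"3\.14", "3x14") is not None

print(literal_f, unescaped_parens, escaped_parens)  # True False True
print(unescaped_dot, escaped_dot)                   # True False
```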


Literal Characters

Special Characters

Character Sets

In regular expressions, a character set matches any one character from a group of characters, instead of one specific character. A set is written as a list of characters (or a range of characters, such as a-z) surrounded by square brackets.

Character sets can be inclusive, meaning they match any one of the listed characters, or exclusive, meaning they match any one character that is not listed.

Inclusive Sets

Inclusive sets are straightforward: just put the characters you're searching for in between brackets. For example,

[aeiou]

will match any single lowercase vowel. So searching for a pattern like:

gr[ae]y

will match gray or grey.
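A short Python sketch, assuming the standard `re` module, showing that a set matches exactly one character and that it is case-sensitive:

```python
import re

# [aeiou] matches exactly ONE character, and only a lowercase vowel.
vowels = re.findall(r"[aeiou]", "Regular Expressions")
print(vowels)  # ['e', 'u', 'a', 'e', 'i', 'o']  (the capital "E" is not matched)

# gr[ae]y matches "gr", then one "a" or "e", then "y".
words = ["gray", "grey", "graey", "groy"]
matching = [w for w in words if re.fullmatch(r"gr[ae]y", w)]
print(matching)  # ['gray', 'grey']
```

Note that "graey" is rejected: the set stands for a single position, so it cannot match both letters.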

Exclusive Sets

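If the first character after the opening bracket is a caret (^), the set is negated: it matches any single character that is not listed. For example,

[^aeiou]

will match any character that is not a lowercase vowel, including digits, spaces, and punctuation.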
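A minimal Python sketch of a negated set, assuming the standard `re` module, where a caret directly after the opening bracket negates the set:

```python
import re

# [^aeiou] matches any ONE character that is not a lowercase vowel
# (digits, spaces and capitals included).
non_vowels = re.findall(r"[^aeiou]", "a1 Be")
print(non_vowels)  # ['1', ' ', 'B']

# A negated set still consumes a character: q[^u] needs something after the "q".
q_not_u = re.findall(r"q[^u]", "Iraq qatar queen")
print(q_not_u)  # ['q ', 'qa']

# A "q" at the very end of the text has no following character, so nothing matches.
at_end = re.findall(r"q[^u]", "Iraq")
print(at_end)  # []
```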

Character Set Special Characters

Inside a character set, most characters lose their special meaning. The exceptions are the backslash, the hyphen, the caret, and the closing bracket; to match one of these literally, it is usually escaped with a backslash.

These characters don't always have to be escaped: placed in the right position, they are treated literally, which can make the regular expression more readable.

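Backslash: in Perl-style engines (Perl, PHP), the backslash always needs to be escaped, written as \\. In POSIX bracket expressions, by contrast, the standard treats a backslash inside brackets as an ordinary character.

Caret: the caret can go unescaped anywhere except right after the opening bracket, where it negates the set.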
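The backslash and caret rules can be checked with Python's `re` module, which is assumed here as a representative Perl-style engine (POSIX tools differ for the backslash). Raw strings (r"...") are used so that Python itself does not consume the backslashes before the regex engine sees them.

```python
import re

# Inside a set, a backslash must itself be escaped: [\\/] matches a backslash or "/".
backslashes = re.findall(r"[\\/]", r"C:\dir/file")
print(backslashes)  # ['\\', '/']

# A caret anywhere except right after "[" is an ordinary character:
# [a^] matches either "a" or "^".
carets = re.findall(r"[a^]", "2^3 and a")
print(carets)  # ['^', 'a', 'a']
```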

Closing bracket: the closing bracket can go right after the opening bracket or right after the negating caret. For example:

[^]a]

matches any character that is not a closing bracket or an "a".
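Python's `re` module follows this rule: its documentation notes that a literal ] can be placed at the beginning of a set (or escaped as \]). Not every engine agrees; JavaScript, for instance, reads [^] as "any character". A quick check in Python, assuming the standard `re` module:

```python
import re

# A "]" immediately after "[^" is literal, so [^]a] matches any one
# character other than "]" or "a".
kept = re.findall(r"[^]a]", "a]b]c")
print(kept)  # ['b', 'c']

# The same rule applies right after "[": []a] matches "]" or "a".
brackets = re.findall(r"[]a]", "a]b")
print(brackets)  # ['a', ']']
```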

Hyphen: this can go right after the opening bracket or right before the closing bracket. For example:

[-abcdefg]
[abcdefg-]

These will both match any of the first seven lowercase letters of the alphabet (a through g), or a hyphen.
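A Python sketch, assuming the standard `re` module, contrasting a hyphen that forms a range with a hyphen placed at either end of the set:

```python
import re

# A hyphen between two characters forms a range: [a-c] is the same as [abc].
range_match = re.findall(r"[a-c]", "a-b-c-d")
print(range_match)  # ['a', 'b', 'c']

# At the start or end of the set, the hyphen is literal.
leading = re.findall(r"[-abcdefg]", "a-z")
trailing = re.findall(r"[abcdefg-]", "a-z")
print(leading, trailing)  # ['a', '-'] ['a', '-']
```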

Character Classes

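Character classes can be used to match a pre-defined set of characters, e.g. all digits or all whitespace characters:

\d - matches [0-9]
\w - matches [A-Za-z0-9_]
\s - matches whitespace, including [ \t\r\n\f]

The uppercase forms match the opposite: \D matches any non-digit, \W any non-word character, and \S any non-whitespace character. The bracket equivalents above hold for ASCII text; Unicode-aware engines may also match non-ASCII digits, letters, and whitespace.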
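A Python sketch using the standard `re` module (an assumption; other engines vary). Python 3 string patterns are Unicode-aware by default, so the bracket equivalents are exact only with the `re.ASCII` flag; the last two searches show the difference using U+0663, an Arabic-Indic digit.

```python
import re

text = "Order 66 shipped\tto user_42"

digits = re.findall(r"\d", text)    # single digits
words = re.findall(r"\w+", text)    # runs of word characters
spaces = re.findall(r"\s", text)    # spaces and the tab
tokens = re.findall(r"\S+", text)   # runs of non-whitespace characters
print(digits)  # ['6', '6', '4', '2']
print(words)   # ['Order', '66', 'shipped', 'to', 'user_42']
print(spaces)  # [' ', ' ', '\t', ' ']

# By default \d also matches non-ASCII digits such as U+0663 ...
unicode_digits = re.findall(r"\d", "\u0663")            # one match
# ... while re.ASCII restricts \d to [0-9].
ascii_digits = re.findall(r"\d", "\u0663", re.ASCII)    # no match
```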


Anchors

Lines

Line anchors include beginning-of-line:

^

and end-of-line:

$
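Line-oriented tools such as grep and sed apply a pattern to one line at a time by default, so ^ and $ naturally mean the start and end of a line. When a whole multi-line string is searched at once, as in this Python sketch (standard `re` module assumed), the `re.MULTILINE` flag is needed to get the same per-line behaviour:

```python
import re

text = "cat\ndog\ncatalog"

# By default ^ and $ anchor to the start and end of the whole string.
starts = re.findall(r"^cat", text)
ends = re.findall(r"og$", text)
# With re.MULTILINE they anchor to the start and end of every line.
line_starts = re.findall(r"^cat", text, re.MULTILINE)
line_ends = re.findall(r"og$", text, re.MULTILINE)
whole_lines = re.findall(r"^\w+$", text, re.MULTILINE)

print(starts, ends)            # ['cat'] ['og']
print(line_starts, line_ends)  # ['cat', 'cat'] ['og', 'og']
print(whole_lines)             # ['cat', 'dog', 'catalog']
```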

Words

Repetition Symbols

Grouping Symbols