Regular Expressions

Readers are encouraged to use this app to get a feel for what is discussed in this wiki until they learn to program it all by themselves.

Regular expressions (abbreviated regex) are among the most useful tools in string processing. If you are fond of the search-and-replace tool in your favorite text editor or word processor, you will love them.

(xkcd comic)

Introduction

"Regular expression" was initially a term borrowed from automata theory in theoretical computer science. Broadly, it refers to patterns against which substrings need to be matched.

The comic should have already given you an idea of what regular expressions could be useful for. It should not be surprising that many programming languages, text processing tools, data validation tools and search engines make extensive use of them.

The key idea is that a regular expression is a pattern which matches a set of target strings.

For example, \w+@\w+\.(com|org|net|in) is a regex that matches email addresses ending in .com, .net, .org, or .in.

Concepts

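Regex syntax comes in several flavors that vary from language to language. Here, we will examine Perl regex, since most other flavors are variations on it.

Before we dive into the syntax, these are the things a pattern consists of: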

Literals: They are the simplest things to match. When they are there, we just match them. A literal could be something like a or 1.
Metacharacters: They do not mean what they look like. They usually refer to something else. For example, \d refers to any digit.
Vertical Bar: The | is a symbol of boolean OR. It gives an option to match any of the things it delimits.
Quantifiers: They specify how many times the pattern concerned needs to be matched.
Grouping and Capturing: Parentheses can be used to group parts of a regex or to capture parts for later use.

Syntax

Let us look at the metacharacters in detail.

Metacharacter	Description
`^`	Start of a string
`$`	End of a string
`\t`	Tab
`\n`	Newline
`\r`	Carriage return
`\s`	Any whitespace character
`\S`	Any non-whitespace character
`\d`	Any digit
`\D`	Any non-digit
`\w`	Any word character
`\W`	Any non-word character
`\b`	A word boundary
`\B`	A non-word boundary
`.`	Any single character, usually barring a newline
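
A few of these metacharacters are easier to appreciate in action. The following is a minimal sketch in Perl; the sample sentence is made up for illustration, and the g modifier (which returns every match in list context) is borrowed from a later part of this wiki.

```perl
use strict;
use warnings;

my $sentence = 'The cat sat in category 7';   # hypothetical sample text

# \b anchors the match at word boundaries, so "category" is not counted.
my @whole_words = $sentence =~ /\bcat\b/g;

# Without \b, the "cat" inside "category" matches as well.
my @anywhere = $sentence =~ /cat/g;

# \s matches the space, \d the digit, and $ the end of the string.
my ($digit) = $sentence =~ /\s(\d)$/;

print scalar(@whole_words), " ", scalar(@anywhere), " $digit\n";
```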

By the way, if you want to match a metacharacter literally, you need to use \ to escape it. For example, \. will match only the . character.
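
To see the difference escaping makes, here is a small Perl sketch. The helper names and sample strings are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

# Unescaped dot: any single character may sit between "3" and "14".
sub matches_unescaped { my ($s) = @_; return $s =~ /^3.14$/  ? 1 : 0; }

# Escaped dot: only a literal "." may sit between "3" and "14".
sub matches_escaped   { my ($s) = @_; return $s =~ /^3\.14$/ ? 1 : 0; }

for my $s ('3.14', '3514') {   # hypothetical sample strings
    print "$s  unescaped: ", matches_unescaped($s),
          "  escaped: ", matches_escaped($s), "\n";
}
```

The string 3514 slips through the unescaped pattern because the dot accepts the 5, while the escaped pattern rejects it.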

Now, let us look at constructs that offer more flexibility.

Expression	Meaning
`[abc]`	Matches any one of `a`, `b`, or `c`
`[^abc]`	Matches anything other than `a`, `b`, or `c`
`[a-d]`	Matches any one of the characters in the range `a-d`
`a*`	Matches `a` zero or more times
`a?`	Matches `a` zero or one time
`a+`	Matches `a` one or more times
`a|b`	Matches either `a` or `b`
`a{3}`	Matches exactly 3 of `a`
`a{3,}`	Matches 3 or more of `a`
`a{3,5}`	Matches 3, 4 or 5 of `a` (the range is inclusive)
`( )`	Captures everything inside the parentheses
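
The rows of this table can be checked with a short Perl sketch. The helper full_match is hypothetical: it anchors the pattern with ^ and $ so that the whole string has to match, and wraps it in (?: ), a grouping form that does not capture.

```perl
use strict;
use warnings;

# Returns 1 if the whole string matches the pattern, 0 otherwise.
# (?: ... ) groups without capturing, so ^ and $ apply to the entire pattern.
sub full_match {
    my ($string, $pattern) = @_;
    return $string =~ /^(?:$pattern)$/ ? 1 : 0;
}

print full_match('b',      '[abc]'),  "\n";   # 1: b is in the set
print full_match('d',      '[^abc]'), "\n";   # 1: d is outside the set
print full_match('',       'a*'),     "\n";   # 1: zero repetitions are allowed
print full_match('aa',     'a?'),     "\n";   # 0: at most one a
print full_match('aaaa',   'a{3,5}'), "\n";   # 1: four falls within 3 to 5
print full_match('aaaaaa', 'a{3,5}'), "\n";   # 0: six is too many
```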

We are now ready to explain why \w+@\w+\.(com|org|net|in) does what it claims.

Firstly, what should an email look like? That's right, it should have a structure like user@domain.extension.

The user and domain consist of letters, numbers, or underscores, with at least one such character each. So, we use \w+ for both. The \. matches the literal dot before the extension.

We restrict the extension to org, com, net, or in by using the |.
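
As a rough check of this reasoning, the pattern can be tried in Perl. This is only a sketch: the helper extension_of and the sample addresses are made up, the pattern is anchored with ^ and $ so that the whole string must match, and the @ is written as \@, which matches a literal @ and rules out any array interpolation by Perl.

```perl
use strict;
use warnings;

# The pattern from the text, anchored so the whole string must match.
my $email = qr/^\w+\@\w+\.(com|org|net|in)$/;

# Returns the captured extension, or undef when the address does not match.
sub extension_of {
    my ($address) = @_;
    return $address =~ $email ? $1 : undef;
}

for my $address ('alice@example.com', 'bob_99@site.in', 'carol@example.edu') {
    my $ext = extension_of($address);
    print "$address: ", (defined $ext ? "matches, extension $ext" : "no match"), "\n";
}
```

Since \w does not cover dots or hyphens, an address such as first.last@example.com is rejected by this pattern. It is a teaching example rather than a complete email validator.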

Regular Expressions in Action - Perl Implementation

Perl is the language most famous for its use of regular expressions, and for good reasons.

We use the =~ operator to bind a string to a match or a substitution, depending upon the context. The !~ operator reverses the sense of the match.

There are basically two regex operators in Perl:

Matching: m//
Substitution: s///

The purpose of the // is to enclose the regex. However, other delimiters like {}, "", etc. can be used instead.
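
Alternative delimiters are most helpful when the pattern itself contains slashes. Here is a small sketch, with a made-up path as the sample value:

```perl
use strict;
use warnings;

my $path = '/usr/local/bin';   # hypothetical sample value

# With / as the delimiter, every literal slash must be escaped.
my $with_slashes = ($path =~ m/^\/usr\/local/) ? 1 : 0;

# With {} as the delimiter, the slashes can be written as they are.
my $with_braces = ($path =~ m{^/usr/local}) ? 1 : 0;

# Substitution accepts alternative delimiters too; here /local is removed
# from a copy of the string, leaving $path itself untouched.
(my $copy = $path) =~ s{/local}{};

print "$with_slashes $with_braces $copy\n";
```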

Matching

To use the matching operator, we simply put the string on the left of =~ and the m// operator on the right.

The following sets $true to 1 if and only if $foo matches the regular expression foo:

    $true = ($foo =~ m/foo/);

It is not hard to see that the opposite case is !~:

    $false = ($foo !~ m/foo/);
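
To see the values these operators actually produce, here is a minimal sketch. The helper check_foo and the sample strings are assumptions for illustration. Note that in scalar context a failed match yields an empty string rather than 0.

```perl
use strict;
use warnings;

# Returns the results of =~ and !~ against the regex foo.
sub check_foo {
    my ($foo) = @_;
    my $true  = ($foo =~ m/foo/);   # 1 on a match, '' otherwise
    my $false = ($foo !~ m/foo/);   # the opposite of the above
    return ($true, $false);
}

for my $foo ('food', 'bar') {   # hypothetical sample strings
    my ($true, $false) = check_foo($foo);
    printf "%-4s  =~ gives '%s'  !~ gives '%s'\n", $foo, $true, $false;
}
```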

Capturing

As promised, the () can be used for capturing parts of a regex. When the patterns inside the parentheses match, the matched text goes into special variables like $1, $2, etc., in that order.

Here's how one would extract the hours, minutes, seconds from a time string:

    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
        $hours   = $1;
        $minutes = $2;
        $seconds = $3;
    }

In list context, the list ($1, $2, $3, ...) would be returned.

A simpler way to do the same would be:

    my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
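
The list-context form also behaves sensibly when nothing matches: the match returns an empty list. A short sketch, where the helper split_time and the sample inputs are hypothetical, and the pattern is additionally anchored with ^ and $:

```perl
use strict;
use warnings;

# Splits an hh:mm:ss string into its parts when called in list context;
# returns an empty list when the string does not match.
sub split_time {
    my ($time) = @_;
    return $time =~ m/^(\d+):(\d+):(\d+)$/;
}

my ($hours, $minutes, $seconds) = split_time('12:34:56');
print "$hours h $minutes m $seconds s\n";

my @parts = split_time('noon');
print "no match gives ", scalar(@parts), " parts\n";
```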

Substitution

This is our favorite search and replace feature. Almost the same syntax rules apply here, except that there is an extra clause between the second and third delimiters that tells us what to replace the match with.

Here is a self-explanatory piece of code:

    $x = "Time to feed the cat!";
    $x =~ s/cat/hacker/;                  # $x contains "Time to feed the hacker!"
    if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
        $more_insistent = 1;
    }
    $y = "'quoted words'";
    $y =~ s/^'(.*)'$/$1/;                 # strip single quotes,
                                          # $y contains "quoted words"

Modifiers

Modifiers can be appended to the end of a regex operation to modify its matching behavior.

Here is a list of some important modifiers:

Modifier	Description
`i`	Case-insensitive matching
`s`	Allows `.` to match newlines
`x`	Allows whitespace and comments inside the regex, for clarity
`g`	Globally find all matches

Here's how one might want to use the g modifier:

    $x = "I batted 4 for 4";
    $x =~ s/4/four/;     # doesn't do it all:
                         # $x contains "I batted four for 4"
    $x = "I batted 4 for 4";
    $x =~ s/4/four/g;    # does it all:
                         # $x contains "I batted four for four"
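
The other modifiers can be tried in the same way. The following sketch uses made-up sample strings and also shows that s///g returns the number of replacements it made:

```perl
use strict;
use warnings;

# i: case-insensitive matching.
my $case = ('HELLO world' =~ /hello/i) ? 1 : 0;

# s: lets . match a newline as well.
my $text = "first\nsecond";
my $without_s = ($text =~ /first.second/)  ? 1 : 0;
my $with_s    = ($text =~ /first.second/s) ? 1 : 0;

# x: whitespace and comments inside the pattern are ignored.
my ($hr, $min) = ('09:45' =~ /
    (\d\d)   # hours
    :        # literal colon
    (\d\d)   # minutes
/x);

# g: replaces every match; s///g returns how many replacements were made.
my $x = "I batted 4 for 4";
my $count = ($x =~ s/4/four/g);

print "$case $without_s $with_s $hr:$min $count $x\n";
```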
