2. Match operators =~ and !~
The =~ operator tests whether a match succeeds: $result = $var =~ /abc/; If the pattern is found in the string, a non-zero value (true) is returned; if it is not found, a false value is returned (the empty string, which counts as 0 in numeric context). The !~ operator returns the opposite result.
These two operators are suitable for conditional control, such as:
if ($question =~ /please/) {
    print ("Thank you for being polite!\n");
}
else {
    print ("That was not very polite!\n");
}
3. Special characters in the pattern
Perl supports a number of special characters in patterns, each with its own meaning.
1. Character +
+ means one or more of the preceding character. For example, /de+f/ matches def, deef, deeeeeef, etc. It matches as many of the same character as possible: in the string abbc, /ab+/ matches abb, not ab.
When words in a line are separated by more than one space, the line can be split as follows:
@array = split (/ +/, $line);
Note: Each time the split function encounters the split pattern, it starts a new word. Therefore, if $line starts with a space, the first element of @array is an empty element. split can still tell whether any real words exist: if $line contains only spaces, @array is an empty array. Also note that in the example above a TAB character is treated as part of a word rather than as a separator, so the pattern must be adjusted if tabs can occur.
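The behaviour described in the note can be checked with a short sketch. The sample strings and variable names are made up for illustration only.

```perl
use strict;
use warnings;

# Leading spaces produce an empty first element.
my @leading = split(/ +/, "  one two");          # ("", "one", "two")

# A line containing only spaces yields an empty list.
my @blank = split(/ +/, "   ");                  # ()

# A tab is not a space, so / +/ leaves it inside the word.
my @tabbed = split(/ +/, "one\ttwo three");      # ("one\ttwo", "three")

# Splitting on spaces or tabs separates all three words.
my @fixed = split(/[\t ]+/, "one\ttwo three");   # ("one", "two", "three")

print scalar(@leading), " ", scalar(@blank), " ",
      scalar(@tabbed), " ", scalar(@fixed), "\n";
```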
2. Characters [] and [^]
[] matches any one of a set of characters. For example, /a[0123456789]c/ matches a, followed by one digit, followed by c. An example in combination with +: /d[eE]+f/ matches def, dEf, deef, dEef, dEEeeeEef, etc. A leading ^ inside the brackets matches any character except those listed. For example, /d[^deE]f/ matches d, followed by one character other than d, e or E, followed by f.
3. Characters * and ?
They are similar to +. The difference is that * matches zero or more of the preceding character, and ? matches zero or one. For example, /de*f/ matches df, def, deeeeef, etc.; /de?f/ matches df or def.
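As a small sketch of how the three quantifiers differ, the following compares +, * and ? on a few sample strings (the variable names are illustrative):

```perl
use strict;
use warnings;

# + is greedy: it takes as many b's as it can.
my ($plus) = "abbc" =~ /(ab+)/;               # "abb"

# * also accepts zero occurrences of e.
my $star_zero = "df"   =~ /de*f/ ? 1 : 0;     # 1

# ? accepts zero or one e, but not two.
my $opt_one   = "def"  =~ /de?f/ ? 1 : 0;     # 1
my $opt_two   = "deef" =~ /de?f/ ? 1 : 0;     # 0

print "$plus $star_zero $opt_one $opt_two\n";
```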
4. Escape characters
To include a character that normally has a special meaning in a pattern, prefix it with a backslash "\". For example, in /\*+/ the \* means the literal character *, not the quantifier described above, so the pattern matches one or more asterisks. A literal backslash is written as /\\/. In Perl 5, everything between the character pair \Q and \E is escaped automatically.
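The following sketch compares escaping by hand with \Q and \E. The text in $literal is a made-up example of a string that contains special characters.

```perl
use strict;
use warnings;

my $text    = "x a*b+c y";
my $literal = "a*b+c";      # hypothetical text containing special characters

# Escaped by hand: each special character gets its own backslash.
my $manual = $text =~ /a\*b\+c/ ? 1 : 0;

# \Q ... \E escapes everything between them, including interpolated variables.
my $quoted = $text =~ /\Q$literal\E/ ? 1 : 0;

# Without escaping, * and + act as quantifiers and the literal text is not found.
my $unquoted = $text =~ /$literal/ ? 1 : 0;

print "$manual $quoted $unquoted\n";
```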
5. Match any letter or number
The pattern /a[0123456789]c/ mentioned above matches the letter a, followed by any digit, followed by c. It can also be written as /a[0-9]c/. Similarly, [a-z] represents any lowercase letter and [A-Z] any uppercase letter. Any letter or digit is written as /[0-9a-zA-Z]/.
6. Anchors
Anchor | Description |
^ or \A | Match only at the beginning of the string |
$ or \Z | Match only at the end of the string |
\b | Match at a word boundary |
\B | Match inside a word (not at a word boundary) |
Example 1: /^def/ matches only strings that begin with def, /def$/ matches only strings that end with def, and the combined /^def$/ matches only the string def (or def followed by a single trailing newline). \A and \Z differ from ^ and $ when matching multi-line strings (see the m option).
Example 2: Check the type of variable name:
if ($varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/) {
    print ("$varname is a legal scalar variable\n");
} elsif ($varname =~ /^@[A-Za-z][_0-9a-zA-Z]*$/) {
    print ("$varname is a legal array variable\n");
} elsif ($varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/) {
    print ("$varname is a legal file variable\n");
} else {
    print ("I don't understand what $varname is.\n");
}
Example 3: \b matches at word boundaries. /\bdef/ matches words that start with def, such as def and defghi, but does not match abcdef. /def\b/ matches words that end with def, such as def and abcdef, but does not match defghi. /\bdef\b/ matches only the whole word def. Note: /\bdef/ also matches $defghi, because $ is not considered part of a word.
Example 4: \B matches within the word: /\Bdef/ matches abcdef, etc., but does not match def; /def\B/ matches defghi, etc.; /\Bdef\B/ matches cdefg, abcdefghi, etc., but does not match def, defghi, abcdef.
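Examples 3 and 4 can be tried out with a sketch like the one below, which filters a made-up word list with each anchor combination:

```perl
use strict;
use warnings;

my @words = ("def", "abcdef", "defghi", "abcdefghi");

my @starts = grep { /\bdef/ }   @words;   # def, defghi
my @ends   = grep { /def\b/ }   @words;   # def, abcdef
my @whole  = grep { /\bdef\b/ } @words;   # def
my @inner  = grep { /\Bdef\B/ } @words;   # abcdefghi

print "starts: @starts\n";
print "ends:   @ends\n";
print "whole:  @whole\n";
print "inner:  @inner\n";
```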
7. Variable interpolation in patterns
A variable used inside a pattern is replaced by its value before matching. For example, to split a sentence into words:
$pattern = "[\\t ]+";
@words = split(/$pattern/, $line);
8. Character range escapes
Escape character | Description | Range |
\d | Any digit | [0-9] |
\D | Any character except a digit | [^0-9] |
\w | Any word character | [_0-9a-zA-Z] |
\W | Any non-word character | [^_0-9a-zA-Z] |
\s | Whitespace | [ \r\t\n\f] |
\S | Non-whitespace | [^ \r\t\n\f] |
Example: /[\da-z]/ matches any digit or lowercase letter.
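A brief sketch of these escapes in use follows. The sample line is invented, and the g option used here to collect every match is the one described under the pattern matching options.

```perl
use strict;
use warnings;

my $line = "id42 = 7\t;";                    # hypothetical input line

my @digit_runs = $line =~ /\d+/g;            # ("42", "7")
my @tokens     = $line =~ /\w+/g;            # ("id42", "7")

# Count the non-whitespace characters: i d 4 2 = 7 ;
my $nonblank = () = $line =~ /\S/g;          # 7

print "@digit_runs | @tokens | $nonblank\n";
```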
9. Match any character
The character "." matches any character except a newline and is usually used in combination with *.
10. Match the specified number of characters
The character pair {} specifies how many times the preceding character must occur. For example: /de{1,3}f/ matches def, deef and deeef; /de{3}f/ matches deeef; /de{3,}f/ matches d and f with at least 3 e's between them; /de{0,3}f/ matches d and f with at most 3 e's between them.
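The four forms of {} can be compared side by side. This sketch anchors each pattern with ^ and $ so that the whole candidate string has to match; the candidate list is made up.

```perl
use strict;
use warnings;

my @candidates = ("df", "def", "deef", "deeef", "deeeef");

my @one_to_three  = grep { /^de{1,3}f$/ } @candidates;  # def deef deeef
my @exactly_three = grep { /^de{3}f$/ }   @candidates;  # deeef
my @three_or_more = grep { /^de{3,}f$/ }  @candidates;  # deeef deeeef
my @up_to_three   = grep { /^de{0,3}f$/ } @candidates;  # df def deef deeef

print "{1,3}: @one_to_three\n";
print "{3}:   @exactly_three\n";
print "{3,}:  @three_or_more\n";
print "{0,3}: @up_to_three\n";
```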
11. Specify alternatives
The character "|" separates two or more alternatives, any one of which may match. For example: /def|ghi/ matches def or ghi.
Example: Check whether a number is written legally:
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    print ("$number is a legal integer.\n");
} else {
    print ("$number is not a legal integer.\n");
}
where ^-?\d+$ matches decimal numbers and ^-?0[xX][\da-fA-F]+$ matches hexadecimal numbers.
12. Reusing part of a pattern
When the same part of a pattern appears more than once, it can be enclosed in parentheses and referred to again with \n (where n is the number of the parenthesized group), which simplifies the expression:
/\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92 etc.
Note: /\d{2}([\W])\d{2}\1\d{2}/ is different from /(\d{2})([\W])\1\2\1/. The latter matches only strings of the form 17-17-17, not 17-05-91, etc.
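The difference between the two patterns can be seen in a short sketch. The date strings are samples, and the patterns are anchored with ^ and $ here so that each whole string is tested.

```perl
use strict;
use warnings;

my @dates = ("12-05-92", "26.11.87", "07 04 92", "12-05.92");

# \1 must repeat the same separator that the parentheses captured,
# so the mixed "12-05.92" is rejected.
my @consistent = grep { /^\d{2}([\W])\d{2}\1\d{2}$/ } @dates;

# Here \1 repeats the first two digits and \2 repeats the separator.
my @repeated = grep { /^(\d{2})([\W])\1\2\1$/ } ("17-17-17", "17-05-91");

print join(", ", @consistent), "\n";
print join(", ", @repeated), "\n";
```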
13. Precedence of escapes and special characters
Like operators, escapes and special characters have an order of precedence, listed here from highest to lowest:
Special characters | Description |
() | Pattern memory |
+ * ? {} | Number of occurrences |
^ $ \b \B | Anchors |
| | Alternatives |
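One practical consequence of this ordering is that | binds more loosely than the anchors. The sketch below, using made-up strings, shows how parentheses change what ^ and $ apply to.

```perl
use strict;
use warnings;

# Without parentheses, ^ applies only to "def" and $ only to "ghi",
# so any string that merely starts with def is accepted.
my $loose = "defxyz" =~ /^def|ghi$/ ? 1 : 0;       # 1

# Parentheses group the alternatives, so both anchors apply to each one.
my $grouped = "defxyz" =~ /^(def|ghi)$/ ? 1 : 0;   # 0
my $exact   = "ghi"    =~ /^(def|ghi)$/ ? 1 : 0;   # 1

print "$loose $grouped $exact\n";
```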
14. Specify the pattern delimiter
By default, the pattern delimiter is the slash /, but another delimiter can be chosen by prefixing the pattern with the letter m, for example:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: When the single quote ' is used as the delimiter, no variable interpolation is performed. When a special character is used as the delimiter, its escape or special meaning is not available inside the pattern.
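A minimal sketch of alternative delimiters follows. The variable $name is introduced only for illustration; it shows that a pattern delimited by single quotes is taken literally instead of being interpolated.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";
my $name = "perl";                       # hypothetical variable for the demo

# The same pattern written with the default and with a custom delimiter.
my $slashes = $path =~ /\/u\/jqpublic\/perl\/prog1/ ? 1 : 0;   # 1
my $bangs   = $path =~ m!/u/jqpublic/perl/prog1!     ? 1 : 0;   # 1

# With ! as the delimiter, $name is interpolated and /perl/ is found.
my $interpolated = $path =~ m!/$name/! ? 1 : 0;                 # 1

# With ' as the delimiter, $name is not interpolated, so the pattern
# no longer contains the text "perl" and does not match.
my $not_interpolated = $path =~ m'/$name/' ? 1 : 0;             # 0

print "$slashes $bangs $interpolated $not_interpolated\n";
```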
15. Pattern order variables
After a successful match, the variable $n holds the text matched by the n-th parenthesized part of the pattern, and the variable $& holds the entire matched text.
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/; # The matching result is 25.11
$integerpart = $1; # now $integerpart = 25
$decimalpart = $2; # now $decimalpart = 11
$totalpart = $&; # now $totalpart = 25.11
4. Pattern matching options
Option | Description |
g | Match all possible patterns |
i | Ignore case |
m | Treat the string as multiple lines |
o | Interpolate variables only once |
s | Treat the string as a single line |
x | Ignore whitespace in the pattern |
1. Match all possible patterns (g option)
@matches = "balata" =~ /.a/g; # now @matches = ("ba", "la", "ta")
Matching loop:
while ("balata" =~ /.a/g) {
    $match = $&;
    print ("$match\n");
}
The result is:
ba
la
ta
When the g option is used, the function pos can be used to read or set the offset at which the next match starts:
$offset = pos($string);
pos($string) = $newoffset;
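As a sketch of how pos interacts with the g option, the following records the offset after each match and then restarts the search from a chosen offset. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "balata";
my @offsets;

# After each successful match, pos reports where the next search will start.
while ($string =~ /.a/g) {
    push @offsets, pos($string);     # 2, then 4, then 6
}

# Setting pos makes the next g match start from that offset.
pos($string) = 2;
my $resumed = "";
if ($string =~ /.a/g) {
    $resumed = $&;                   # "la"
}

print "@offsets $resumed\n";
```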
2. Ignore case (i option) example
/de/i matches de, dE, De and DE.
3. Treat strings as multiple lines (m option)
In this case, the ^ symbol matches the beginning of the string or the beginning of a new line; the $ symbol matches the end of any line.
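The effect of the m option, and the difference between ^ and \A, can be sketched with a two-line sample string:

```perl
use strict;
use warnings;

my $text = "abc\ndef\n";

# Without m, ^ matches only at the very beginning of the string.
my $plain = $text =~ /^def$/ ? 1 : 0;      # 0

# With m, ^ and $ also match at the start and end of each line.
my $multi = $text =~ /^def$/m ? 1 : 0;     # 1

# \A still means the beginning of the whole string, even with m.
my $absolute = $text =~ /\Adef/m ? 1 : 0;  # 0

# Collect the first character of every line.
my @line_starts = $text =~ /^(\w)/mg;      # ("a", "d")

print "$plain $multi $absolute @line_starts\n";
```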
4. Perform variable interpolation only once (o option) example
$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}
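Because of the o option, $var is interpolated only the first time, so the pattern is /1/ on every iteration.
5. Treat strings as a single line (s option) example
/a.*bc/s matches the string axxxxx \nxxxxbc, but /a.*bc/ does not, because without the s option the character . does not match a newline.
6. Ignore whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.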
5. Substitution operator
The syntax is s/pattern/replacement/. It replaces the part of the string that matches pattern with replacement. For example:
$string = "abc123def";
$string =~ s/123/456/; # now $string = "abc456def";
The pattern order variables $n can be used in the replacement part, as in s/(\d+)/[$1]/. The special pattern characters such as {}, * and + have no special meaning there: s/abc/[def]/ replaces abc with the literal text [def].
The options for the substitution operator are as follows:
Option | Description |
g | Change all matches of the pattern |
i | Ignore case in the pattern |
e | Evaluate the replacement string as an expression |
m | Treat the string to be matched as multiple lines |
o | Interpolate variables only once |
s | Treat the string to be matched as a single line |
x | Ignore whitespace in the pattern |
Note: The e option treats the string of the replacement part as an expression, and calculates its value before the replacement, such as:
$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e; # now $string = "0abcabc1"
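The substitution options can be combined. The sketch below, on a made-up string, contrasts a single substitution with the g option and with g and e together.

```perl
use strict;
use warnings;

# Each line copies the sample text into a new variable, then edits the copy.
(my $first_only = "a1b22c") =~ s/(\d+)/[$1]/;      # "a[1]b22c"
(my $bracketed  = "a1b22c") =~ s/(\d+)/[$1]/g;     # "a[1]b[22]c"

# With e, the replacement is evaluated as Perl code for every match.
(my $doubled    = "a1b22c") =~ s/(\d+)/$1 * 2/ge;  # "a2b44c"

# Brackets in the replacement are ordinary text, not a character class.
(my $literal    = "abc")    =~ s/abc/[def]/;       # "[def]"

print "$first_only $bracketed $doubled $literal\n";
```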
6. Translation operator
This is another form of replacement, with the syntax tr/string1/string2/. Again string2 is the replacement part, but here the first character of string1 is replaced with the first character of string2, the second character of string1 with the second character of string2, and so on. For example:
$string = "abcdefghicba";
$string =~ tr/abc/def/; # now $string = "defdefghifed"
When string1 is longer than string2, its extra characters are replaced by the last character of string2; when the same character in string1 appears multiple times, the first replacement character will be used.
The options for the translation operator are as follows:
Option | Description |
c | Translate all characters that are not specified in string1 |
d | Delete all specified characters |
s | Squeeze runs of identical output characters into one |
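For example, $string =~ tr/0-9/ /c; replaces all non-digit characters with spaces (tr takes literal characters and ranges, so the class escape \d cannot be used here); $string =~ tr/\t //d; deletes tabs and spaces; $string =~ tr/0-9/ /cs; replaces each run of non-digit characters between digits with a single space.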
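The translation rules and options can be checked with a short sketch. The sample strings are invented, and the digit range is written as 0-9 because tr works on literal characters and ranges.

```perl
use strict;
use warnings;

# Character-by-character mapping: a->d, b->e, c->f.
(my $mapped = "abcdefghicba") =~ tr/abc/def/;       # "defdefghifed"

# string1 longer than string2: the last character of string2 is reused.
(my $padded = "abc") =~ tr/abc/xy/;                 # "xyy"

# c: translate every character that is NOT a digit into a space.
(my $spaced = "tel: 555-1234") =~ tr/0-9/ /c;       # "     555 1234"

# cs: as above, but squeeze each run of spaces into one.
(my $squeezed = "tel: 555-1234") =~ tr/0-9/ /cs;    # " 555 1234"

# d: delete tabs and spaces.
(my $deleted = "a b\tc") =~ tr/\t //d;              # "abc"

print "$mapped|$padded|$spaced|$squeezed|$deleted\n";
```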
7. Extended pattern matching
Perl 5 supports some pattern matching capabilities that are not available in Perl 4 or in standard UNIX pattern matching. The general syntax is (?<c>pattern), where <c> stands for a single character that selects the feature (the angle brackets are not typed) and pattern is the pattern or sub-pattern it applies to.
1. Do not store the content matched in parentheses
In a Perl pattern, the text matched by a sub-pattern in parentheses is stored in memory. The (?:pattern) form groups without storing. For example, in /(?:a|b|c)(d|e)f\1/ the \1 refers to the d or e that was matched, not to a, b or c.
2. Embedded pattern options
Pattern options are usually placed after the pattern. Four of them, i, m, s and x, can also be embedded in the pattern itself. The syntax is /(?option)pattern/, which is equivalent to /pattern/option.
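Embedded options are handy when the pattern arrives as a string and there is no place to write a trailing option. A minimal sketch, with a hypothetical $pattern variable:

```perl
use strict;
use warnings;

# Trailing and embedded forms of the i option behave the same way.
my $trailing = "xDEFx" =~ /def/i    ? 1 : 0;    # 1
my $embedded = "xDEFx" =~ /(?i)def/ ? 1 : 0;    # 1

# The option travels with a pattern stored in a string.
my $pattern  = "(?i)def";                       # hypothetical stored pattern
my $from_var = "xDEFx" =~ /$pattern/ ? 1 : 0;   # 1

# Without the option the match is case sensitive.
my $strict_case = "xDEFx" =~ /def/ ? 1 : 0;     # 0

print "$trailing $embedded $from_var $strict_case\n";
```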
3. Positive and negative lookahead
The positive lookahead syntax is /pattern(?=string)/, which matches pattern only when it is followed by string. Conversely, /pattern(?!string)/ matches pattern only when it is not followed by string. In both cases string itself is not part of the match. For example:
$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&; # $& is the matched pattern, here is abc, not abc8
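The negative form, and the fact that a lookahead consumes no characters, can be sketched as follows (sample strings only):

```perl
use strict;
use warnings;

# Negative lookahead: abc must NOT be followed by a digit.
my $neg_hit  = "abcd" =~ /abc(?![0-9])/ ? 1 : 0;    # 1
my $neg_miss = "abc8" =~ /abc(?![0-9])/ ? 1 : 0;    # 0

# The lookahead text is not part of the match, so a substitution
# replaces abc and leaves the following digit in place.
(my $replaced = "25abc8") =~ s/abc(?=[0-9])/X/;     # "25X8"

print "$neg_hit $neg_miss $replaced\n";
```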
4. Pattern comments
In Perl 5, (?#text) can be used to add a comment inside a pattern, for example:
if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}