SoFunction
Updated on 2025-03-10

A Complete Tutorial on PHP Regular Expressions: Advanced

In the previous article, I covered the basics of PHP regular expressions. This article continues with more advanced material. The details follow below.

Operator precedence in PHP regular expressions

Operators with the same precedence are evaluated from left to right; operators with higher precedence are applied before those with lower precedence. From highest to lowest, the precedence is:

Operator                                        Description

\                                               Escape character

(), (?:), (?=), []                              Parentheses and square brackets

*, +, ?, {n}, {n,}, {n,m}                       Quantifiers

^, $, \anymetacharacter, ordinary characters    Anchors (positions) and sequence

|                                               Alternation ("OR")
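A minimal sketch of what this precedence means in practice, using PHP's preg_match(); the subject strings are illustrative. Because alternation has the lowest precedence, '^ab|cd$' is read as "^ab" OR "cd$", and because quantifiers bind tighter than sequence, 'ab+' repeats only the "b".

```php
<?php
$subject = 'abxx';

// | binds loosest: this is (^ab) OR (cd$), so "^ab" alone is enough.
$loose = preg_match('/^ab|cd$/', $subject);

// Parentheses regroup it: the whole string must be "ab" or "cd".
$strict = preg_match('/^(ab|cd)$/', $subject);

// A quantifier applies to the single item before it...
preg_match('/ab+/', 'abab', $m1);    // matches "ab"
// ...unless parentheses make a group out of the sequence.
preg_match('/(ab)+/', 'abab', $m2);  // matches "abab"

var_dump($loose, $strict, $m1[0], $m2[0]);
```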

The meaning of every symbol in PHP regular expressions

Character Description

\ Marks the next character as a special character, a literal character, a backreference, or an octal escape.

For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
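The sketch below shows escaping with preg_match() and preg_quote(); the sample strings are made up. Note the extra layer of escaping in PHP source: the single-quoted string '/\\\\/' holds the regex \\, which matches one literal backslash.

```php
<?php
// A backslash turns a metacharacter into a literal character.
$hasParen = preg_match('/\(/', 'f(x)');   // '\(' matches a literal "("
$hasDot   = preg_match('/1\.5/', '105');  // '\.' matches only a literal dot
$anyChar  = preg_match('/1.5/', '105');   // an unescaped '.' matches the "0"

// PHP string '/\\\\/' is the regex /\\/, i.e. one literal backslash.
$hasBackslash = preg_match('/\\\\/', 'C:\\dir');

// preg_quote() escapes regex metacharacters plus the given delimiter.
$quoted  = preg_quote('1.5+2', '/');                   // 1\.5\+2
$literal = preg_match('/' . $quoted . '/', 'x = 1.5+2');
```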

^ Matches the start of the input string. In multiline mode (the m modifier in PHP), ^ also matches the position right after each newline ("\n").

$ Matches the end of the input string. In multiline mode (the m modifier in PHP), $ also matches the position right before each newline ("\n").
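A minimal sketch of how the m modifier changes ^ and $, using preg_match_all() on an illustrative two-line string:

```php
<?php
$text = "first line\nsecond line";

// Without /m, ^ matches only at the very start of the subject.
$single = preg_match_all('/^\w+/', $text, $m1);   // "first"

// With /m, ^ also matches right after each "\n".
$multi = preg_match_all('/^\w+/m', $text, $m2);   // "first", "second"

// Likewise, with /m, $ also matches right before each "\n".
preg_match_all('/\w+$/m', $text, $m3);            // "line", "line"
```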

* Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.

+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

? Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.

{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but matches the two o's in "food".

{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" but matches all the o's in "fooooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.

{n,m} m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers.
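The quantifier examples above can be checked with preg_match(); this sketch anchors some patterns with ^ and $ so that the whole subject must match.

```php
<?php
// *, + and ? are shorthands for {0,}, {1,} and {0,1}.
$star  = preg_match('/^zo*$/', 'z');         // "o" may occur zero times
$plus  = preg_match('/^zo+$/', 'z');         // "o" must occur at least once
$maybe = preg_match('/^do(es)?$/', 'does');  // the optional "es" is present

// {n} needs exactly n consecutive repetitions somewhere in the subject.
$bob  = preg_match('/o{2}/', 'Bob');
$food = preg_match('/o{2}/', 'food');

// {n,m} is greedy: it takes as many repetitions as allowed, up to m.
preg_match('/o{1,3}/', 'fooooooood', $m);
```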

? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy (lazy). A non-greedy pattern matches as little of the string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
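A short sketch contrasting greedy and lazy quantifiers; the HTML snippet is an illustrative assumption, chosen because "match one tag, not the whole line" is a common reason to reach for +?.

```php
<?php
preg_match('/o+/', 'oooo', $greedy);   // as many o's as possible
preg_match('/o+?/', 'oooo', $lazy);    // as few as possible: one "o"

$html = '<b>bold</b> and <i>italic</i>';
preg_match('/<.+>/', $html, $g);       // greedy: runs on to the last ">"
preg_match('/<.+?>/', $html, $l);      // lazy: stops at the first ">"
```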

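. Matches any single character except "\n". To match any character including "\n", use a pattern such as '[\s\S]', or add the s (dotall) modifier in PHP. (A pattern like '[.\n]' does not work, because inside square brackets '.' is a literal dot.)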
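A minimal sketch of how '.' treats a newline in PHP, using the s modifier and the class [\s\S]; it also shows that a dot inside square brackets is just a literal dot.

```php
<?php
$subject = "a\nb";

$dot     = preg_match('/a.b/', $subject);      // '.' does not match "\n"
$dotAll  = preg_match('/a.b/s', $subject);     // s modifier: '.' matches "\n" too
$class   = preg_match('/a[\s\S]b/', $subject); // whitespace or non-whitespace = anything
$literal = preg_match('/a[.]b/', $subject);    // [.] matches only a literal "."
```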

(pattern) Matches pattern and captures the match. In PHP, captured text is returned in the $matches array filled by preg_match() or preg_match_all(): $matches[0] is the whole match and $matches[1], $matches[2], ... are the groups in order. Inside the pattern, the groups can be referred to as \1, \2, and so on. To match literal parentheses, use '\(' or '\)'.
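As an illustration, the sketch below captures the parts of a date with preg_match() and reads them back from $matches (the list destructuring assumes PHP 7.1 or later). It also shows that escaped parentheses match literal "(" and ")" without capturing.

```php
<?php
// Each (...) group is stored in $matches; index 0 holds the whole match.
$found = preg_match('/(\d{4})-(\d{2})-(\d{2})/', 'Updated on 2025-03-10', $matches);

// Skip index 0 and unpack the three groups.
[, $year, $month, $day] = $matches;

// \( and \) are literal parentheses; the inner (\w+) is the only group.
preg_match('/\((\w+)\)/', 'call(arg)', $m);
```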

(?:pattern) Matches pattern but does not capture the result; it is a non-capturing group, and the match is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more concise expression than 'industry|industries'.
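Comparing a non-capturing group with a capturing one makes the difference visible in the $matches array:

```php
<?php
preg_match('/industr(?:y|ies)/', 'industries', $nonCapturing);
preg_match('/industr(y|ies)/', 'industries', $capturing);

// (?:...) groups without capturing, so only the whole match is stored:
//   $nonCapturing is ['industries']
//   $capturing    is ['industries', 'ies']
```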

(?=pattern) Positive lookahead ("pre-check"): succeeds at any position where pattern matches next, without including that text in the match. It is non-capturing, so nothing is stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead.

(?!pattern) Negative lookahead ("pre-check"): succeeds at any position where pattern does not match next. It is non-capturing, so nothing is stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Like positive lookahead, it does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead.
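The two Windows examples can be checked directly. The preg_replace() call at the end is an added illustration: because the lookahead consumes nothing, the version number survives the replacement.

```php
<?php
$pos = '/Windows (?=95|98|NT|2000)/';
$neg = '/Windows (?!95|98|NT|2000)/';

$a = preg_match($pos, 'Windows 2000', $m); // matched text is only "Windows "
$b = preg_match($pos, 'Windows 3.1');
$c = preg_match($neg, 'Windows 3.1');
$d = preg_match($neg, 'Windows 2000');

// The lookahead only checks "2000"; it is not part of the replaced text.
$renamed = preg_replace('/Windows (?=2000)/', 'Win ', 'Windows 2000');
```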

x|y matches x or y. For example, 'z|food' can match "z" or "food". '(z|f)ood' matches "zood" or "food".

[xyz] Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".

[^xyz] Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".

[a-z] Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.

[^a-z] Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
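A small sketch of sets, negated sets, and ranges with preg_match(), preg_match_all() and preg_replace(); the strings are illustrative.

```php
<?php
preg_match('/[abc]/', 'plain', $m1);    // first character in {a, b, c}
preg_match('/[^abc]/', 'plain', $m2);   // first character NOT in {a, b, c}

// A range: count the lowercase ASCII letters ("r", "o", "c", "k", "s").
$lower = preg_match_all('/[a-z]/', 'PHP 8 rocks');

// A negated range: delete everything that is not a lowercase letter.
$onlyLower = preg_replace('/[^a-z]/', '', 'PHP 8 rocks');
```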

\b Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".

\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
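The boundary examples translate directly to preg_match(). The last line is an added illustration of a common use: replacing a whole word without touching words that merely contain it.

```php
<?php
$b1 = preg_match('/er\b/', 'never'); // "er" ends the word
$b2 = preg_match('/er\b/', 'verb');  // "er" is followed by the word character "b"
$B1 = preg_match('/er\B/', 'verb');
$B2 = preg_match('/er\B/', 'never');

// Whole-word replacement: the "cat" inside "concat" is left alone.
$out = preg_replace('/\bcat\b/', 'dog', 'cat concat cat.');
```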

\cx Matches the control character specified by x. For example, \cM matches a Control-M, which is a carriage return. x should be a letter, A-Z or a-z.

\d Matches a digit character. Equivalent to [0-9].

\D Matches a non-digit character. Equivalent to [^0-9].
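A quick sketch of \d and \D; the phone-number string is a made-up example.

```php
<?php
// \d+ picks out each run of digits.
preg_match_all('/\d+/', 'a1b22c333', $m);

// \D matches every non-digit, so replacing it with '' keeps only digits.
$digits = preg_replace('/\D/', '', '(555) 010-7788');
```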

\f Matches a form feed character. Equivalent to \x0c and \cL.

\n Matches a newline character. Equivalent to \x0a and \cJ.

\r Matches a carriage return character. Equivalent to \x0d and \cM.

\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\x0b].

\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0b].
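A sketch of \s and \S in everyday use: splitting on any run of whitespace, and collecting runs of non-whitespace.

```php
<?php
// Split on spaces, tabs and newlines alike.
$words = preg_split('/\s+/', "one  two\tthree\nfour");

// \S+ collects each run of non-whitespace characters.
preg_match_all('/\S+/', ' a-b  c ', $m);
```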

\t Matches a tab character. Equivalent to \x09 and \cI.

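\v In PHP's PCRE engine, \v matches any vertical whitespace character (including "\n", "\x0b", "\f" and "\r"), not only the vertical tab. To match just a vertical tab, use \x0b or \cK.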
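The sketch below checks several of the control-character escapes with preg_match(). PHP's double-quoted strings interpret "\t", "\r", "\n" and "\x0b" before the regex engine sees them, while the single-quoted patterns pass their escapes through to PCRE unchanged.

```php
<?php
// Each pattern must match the whole one-character subject.
$tab  = preg_match('/^\t$/', "\t");
$tabC = preg_match('/^\cI$/', "\t");    // Control-I is the tab character
$cr   = preg_match('/^\cM$/', "\r");    // Control-M is the carriage return
$lf   = preg_match('/^\x0a$/', "\n");
$vt   = preg_match('/^\x0b$/', "\x0b"); // \x0b is the vertical tab itself

// In PCRE, \v is a class of vertical whitespace, so it also matches "\n".
$vNewline = preg_match('/\v/', "\n");
```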

\w Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.

\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
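A brief sketch of \w and \W on ASCII input, showing that the underscore counts as a word character:

```php
<?php
// \w+ treats "snake_case" as a single word.
preg_match_all('/\w+/', 'snake_case, camelCase!', $m);

// \W removes "-" and "!", but keeps "_".
$clean = preg_replace('/\W/', '', 'a-b_c!');
```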

\xn Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
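A minimal check of the two-digit rule with preg_match(). The braced form \x{41} at the end is PCRE syntax that PHP also accepts for longer values; it is shown here as an aside.

```php
<?php
$a = preg_match('/^\x41$/', 'A');           // hex 41 is "A"

// \x reads at most two hex digits: '\x041' means \x04 followed by "1".
$b   = preg_match('/^\x041$/', "\x04" . '1');
$len = strlen("\x04" . '1');                // two characters

// Braces delimit the hex value explicitly.
$c = preg_match('/^\x{41}$/', 'A');
```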

\num Matches num, where num is a positive integer: a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
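A sketch of backreferences in PHP. It also shows a PHP-specific pitfall: inside a double-quoted string, "\1" is an octal string escape (chr(1)), so patterns that use backreferences are safest in single quotes or written with "\\1".

```php
<?php
preg_match('/(.)\1/', 'hello', $m);   // the first doubled character: "ll"

// Collapse an accidentally doubled word.
$fixed = preg_replace('/\b(\w+) \1\b/', '$1', 'the the cat');

// Double-quoted "\1" is chr(1); "\\1" yields the two characters \ and 1.
$octalPitfall = ("\1" === chr(1));
$sameAsSingle = ("/(.)\\1/" === '/(.)\1/');
```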

\n Identifies a backreference or an octal escape value. In PHP's PCRE engine, a single digit (\1 to \9) is always treated as a backreference. To write a small octal value, start it with a zero; for example, \011 is a tab.

\nm Identifies a backreference or an octal escape value. If at least nm capturing subexpressions precede \nm, \nm is a backreference. Otherwise, if both n and m are octal digits (0-7), \nm matches the octal escape value nm.

\nml If n is an octal digit (0-3), m and l are both octal digits (0-7), and the pattern does not contain that many capturing subexpressions, \nml matches the octal escape value nml.
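A sketch of how PHP resolves these digit escapes, assuming the PCRE rules described above: a multi-digit number with too few groups is read as octal, a leading zero forces octal, and a single digit after a group is a backreference.

```php
<?php
// No capturing groups, so \101 is octal 101 = decimal 65 = "A".
$octal = preg_match('/^\101$/', 'A');

// A leading zero always means octal: \011 is the tab character.
$tab = preg_match('/^\011$/', "\t");

// One group precedes it, so \1 refers back to "a".
$backref = preg_match('/^(a)\1$/', 'aa');
```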

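\un In some other regex engines, \un matches n, where n is a Unicode character written as four hexadecimal digits (for example, \u00A9 for the copyright symbol ©). PHP's PCRE engine does not support \u; use \x{00A9} together with the u (UTF-8) modifier instead.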
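A minimal sketch of matching a Unicode character in PHP, assuming PHP 7 or later (for the "\u{...}" string escape) and UTF-8 subject strings:

```php
<?php
// "\u{00A9}" is PHP's string escape for U+00A9, stored as UTF-8 (two bytes).
$copyright = "\u{00A9}";

// In the pattern, use \x{...} plus the u modifier instead of \u00A9.
$found = preg_match('/\x{00A9}/u', 'Price: ' . $copyright . ' 2025');

// With /u, '.' counts code points rather than bytes.
$chars = preg_match_all('/./u', $copyright);
$bytes = strlen($copyright);
```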

That concludes this advanced part of the complete tutorial on PHP regular expressions. I hope you find it useful.