Regular expression metacharacters and normal characters:
Under the syntax rules of regular expressions, a matching pattern is made up of a sequence of characters.
1. Ordinary characters:
Most characters, including all letters and digits, simply stand for themselves. These are called ordinary characters.
In other words, an ordinary character matches only the identical character in the string being searched.
2. Metacharacters:
Ordinary characters alone cannot express the flexible, powerful matching that regular expressions are known for, so the syntax also reserves a set of special characters. These are not matched literally; each one carries a special meaning.
They include the following:
^ $ . * + ? = ! : | \ / ( ) [ ] { }
Some of these characters have a special meaning only in certain contexts.
To match one of these characters literally, escape it with a preceding backslash (\). For example, to match a literal $, write \$; an unescaped $ matches the end-of-input position instead. These special characters are what make regular expressions so powerful.
Because they are the building blocks of expressions that match complex text, they are called metacharacters.
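As a minimal sketch in JavaScript (the sample strings and variable names are made up for illustration), the difference between an ordinary character, an unescaped metacharacter, and an escaped one looks like this:

```javascript
// Ordinary characters match only themselves.
const hasCat = /cat/.test("concatenate"); // true: "cat" appears literally

// An unescaped $ is an anchor: it matches the end-of-input position.
const endsWithUsd = /USD$/.test("price in USD"); // true
const usdNotAtEnd = /USD$/.test("USD price");    // false

// An escaped \$ matches a literal dollar sign.
const dollarAmount = "total: $42".match(/\$\d+/); // dollarAmount[0] is "$42"

console.log(hasCat, endsWithUsd, usdNotAtEnd, dollarAmount[0]);
```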
The following chapters cover metacharacters in detail; this section only introduces the concept. A regular expression is built from two basic kinds of characters: literal (ordinary) text characters and metacharacters. Metacharacters are what give regular expressions their processing power. A character class can list individual characters inside [] (for example, [a] matches a single lowercase a) or give a range (for example, [a-d] matches any one of a, b, c, or d), while a shorthand such as \w matches any English letter, digit, or underscore. Here are some common metacharacters:
Metacharacter | Matches |
---|---|
. | Any single character except \n (the metacharacter is a period). |
[abcde] | Any one of the characters a, b, c, d, or e. |
[a-h] | Any one character in the range a to h. |
[^fgh] | Any one character that is not f, g, or h. |
\w | Any one upper- or lowercase English letter, digit 0 to 9, or underscore; equivalent to [a-zA-Z0-9_]. |
\W | Any one character that is not an English letter, a digit 0 to 9, or an underscore; equivalent to [^a-zA-Z0-9_]. |
\s | Any one whitespace character; equivalent to [ \f\n\r\t\v]. |
\S | Any one non-whitespace character; equivalent to [^\s]. |
\d | Any one digit from 0 to 9; equivalent to [0-9]. |
\D | Any one character that is not a digit from 0 to 9; equivalent to [^0-9]. |
[\u4e00-\u9fa5] | Any one Chinese character in this range (the characters are given here by their Unicode code points). |
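The classes listed above can be tried out directly. The following is a small JavaScript sketch; the sample string is invented for illustration, and the g flag is used only to collect every match.

```javascript
const sample = "Room_42 is on floor 7, 中文 ok";

// \d matches one digit at a time.
const digits = sample.match(/\d/g);               // ["4", "2", "7"]

// [^fgh] matches any single character except f, g, or h.
const notFgh = "fig".match(/[^fgh]/g);            // ["i"]

// \w covers letters, digits, and underscore, so the first run is "Room_42".
const firstWord = sample.match(/\w+/)[0];         // "Room_42"

// \W is the complement of \w; a space is the first such character here.
const firstNonWord = sample.match(/\W/)[0];       // " "

// The Unicode range picks out the Chinese characters.
const chinese = sample.match(/[\u4e00-\u9fa5]/g); // ["中", "文"]

console.log(digits, notFgh, firstWord, chinese);
```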
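Regular expression quantifiers
Each of the metacharacters above matches a single character. To match several characters at once, add a quantifier, which applies to the element immediately before it. The table below lists common quantifiers (n and m are integers with 0 < n < m), followed by three position anchors that are often used together with them:
Symbol | Meaning |
---|---|
* | Matches the preceding element zero or more times; equivalent to {0,}. |
? | Matches the preceding element zero or one time; equivalent to {0,1}. |
{n} | Matches the preceding element exactly n times. |
{n,} | Matches the preceding element at least n times. |
{n,m} | Matches the preceding element at least n and at most m times. |
+ | Matches the preceding element one or more times; equivalent to {1,}. |
\b | Matches a word boundary (a position, not a repetition count). |
^ | Matches the start of the string, so the string must begin with whatever follows. |
$ | Matches the end of the string, so the string must end with whatever precedes it. |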
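To see how the quantifiers differ, here is a short JavaScript sketch. The test strings are invented for illustration, and \b is used only to keep each number a separate match.

```javascript
const words = "gogle google gooogle ggle";

// * allows zero or more "o", + needs at least one, ? allows at most one.
const zeroOrMore = words.match(/go*gle/g); // all four words
const oneOrMore = words.match(/go+gle/g);  // ["gogle", "google", "gooogle"]
const zeroOrOne = words.match(/go?gle/g);  // ["gogle", "ggle"]

const numbers = "1 22 333 4444";

// {n} is exact, {n,} is a lower bound, {n,m} is a closed range.
const exactlyThree = numbers.match(/\b\d{3}\b/g);  // ["333"]
const atLeastTwo = numbers.match(/\b\d{2,}\b/g);   // ["22", "333", "4444"]
const twoToThree = numbers.match(/\b\d{2,3}\b/g);  // ["22", "333"]

// ^ and $ together force the whole string to consist of digits.
const allDigits = /^\d+$/.test("2024");      // true
const trailingLetter = /^\d+$/.test("2024a"); // false

console.log(zeroOrMore, oneOrMore, zeroOrOne, exactlyThree, atLeastTwo, twoToThree);
```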
Notes:
(1) Characters such as \, ?, *, ^, $, +, (, ), |, {, and [ already have special meanings in regular expressions, so they must be escaped when they are meant literally. For example, to require at least one \ in a string, write the expression as \\+ .
(2) Several metacharacters or literal characters can be enclosed in parentheses to form a group. For example, ^(13)[4-9]\d{8}$ matches an 11-digit mobile phone number that starts with 13 followed by a digit from 4 to 9.
(3) Chinese characters are matched by their Unicode code points. \u4e00 is the character 一 ("one") and \u9fa5 is the character 龥. These are the first and last code points of the original CJK Unified Ideographs set, which covers 20,902 Chinese characters.
(4) \b matches the beginning or end of a word. In the string "123a 345b 456 789d", the expression \b\d{3}\b matches only 456.
(5) The "|" character expresses an OR relationship. For example, z|j|q matches any one of the letters z, j, and q. Inside a character class "|" is an ordinary character, so the equivalent class is [zjq]; [z|j|q] would also match the "|" character itself.
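The notes above can be checked with a few lines of JavaScript. This is only a sketch: the phone numbers and the path string are made-up samples, and String.raw is used so that the backslashes in the sample are not interpreted by the string literal.

```javascript
// Note (1): \\+ means "one or more literal backslashes".
const path = String.raw`C:\\temp`;             // contains two backslashes
const backslashRun = path.match(/\\+/)[0];     // the run of two backslashes

// Note (2): 13, one digit from 4 to 9, then eight more digits.
const mobile = /^(13)[4-9]\d{8}$/;
const mobileOk = mobile.test("13512345678");   // true
const mobileBad = mobile.test("13012345678");  // false: third digit is 0

// Note (4): only "456" has a word boundary on both sides.
const threeDigitWords = "123a 345b 456 789d".match(/\b\d{3}\b/g); // ["456"]

// Note (5): inside brackets, | is a literal character.
const classMatchesPipe = /^[z|j|q]$/.test("|");       // true
const alternationMatchesPipe = /^(?:z|j|q)$/.test("|"); // false
const plainClassMatchesJ = /^[zjq]$/.test("j");       // true

console.log(backslashRun.length, mobileOk, mobileBad, threeDigitWords);
```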
Expression | Matches |
---|---|
/^\s*$/ | Matches blank lines. |
/\d{2}-\d{5}/ | Validates an ID number made up of two digits, a hyphen, and five digits. |
/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/ | Matches an HTML element: an opening tag and its matching closing tag. |
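The three expressions in this table can be exercised as follows. This is a rough JavaScript sketch with invented inputs; note that the ID pattern is not anchored, so it also succeeds when the ID is embedded in a longer string.

```javascript
const blankLine = /^\s*$/;
const isBlank = blankLine.test("   \t");   // true
const notBlank = blankLine.test(" x ");    // false

const idPattern = /\d{2}-\d{5}/;
const idOk = idPattern.test("12-34567");          // true
const idEmbedded = idPattern.test("xx12-345678"); // true: no ^ or $ anchors

// \1 requires the closing tag name to repeat the captured opening tag name.
const htmlElement = /<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/;
const elementMatch = '<p class="x">hi</p>'.match(htmlElement);
const tagName = elementMatch[1];                  // "p"
const mismatched = htmlElement.test("<p>hi</div>"); // false

console.log(isBlank, idOk, idEmbedded, tagName, mismatched);
```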
The following table contains a complete list of metacharacters and describes their behavior in regular expressions:
Character | Description |
---|---|
\ | Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\", and "\(" matches "(". |
^ | Matches the position at the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after "\n" or "\r". |
$ | Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before "\n" or "\r". |
* | Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}. |
+ | Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo" but not "z". + is equivalent to {1,}. |
? | Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the "o"s in "fooooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*". |
{n,m} | n and m are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three "o"s in "foooooood". "o{0,1}" is equivalent to "o?". Note: do not put a space between the comma and the numbers. |
? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes "non-greedy". A non-greedy pattern matches as short a string as possible, while the default "greedy" pattern matches as long a string as possible. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all the "o"s. |
. | Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]". |
(pattern) | Matches pattern and captures the matching subexpression. Captured matches can be retrieved from the resulting Matches collection using the $0…$9 properties. To match the parenthesis characters ( ), use "\(" or "\)". |
(?:pattern) | Matches pattern but does not capture the matching subexpression; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'. |
(?=pattern) | Positive lookahead. Matches the search string at any position where a string matching pattern begins. It is a non-capturing match, so the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches 'Windows' in "Windows 2000" but not 'Windows' in "Windows 3.1". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
(?!pattern) | Negative lookahead. Matches the search string at any position where a string matching pattern does not begin. It is a non-capturing match, so the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches 'Windows' in "Windows 3.1" but not 'Windows' in "Windows 2000". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
x|y | Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food". |
[xyz] | Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". |
[^xyz] | Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain". |
[a-z] | Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z". |
[^a-z] | Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z". |
\b | Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb". |
\B | Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never". |
\cx | Matches the control character indicated by x. For example, \cM matches Control-M, or a carriage return. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character. |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\f | Matches a form feed character. Equivalent to \x0c and \cL. |
\n | Matches a newline character. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return character. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab character. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab character. Equivalent to \x0b and \cK. |
\w | Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]". |
\W | Matches any non-word character. Equivalent to "[^A-Za-z0-9_]". |
\xn | Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions. |
\num | Matches num, where num is a positive integer. It is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters. |
\n | Identifies either an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape code. |
\nm | Identifies either an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither condition holds, \nm matches the octal value nm, where n and m are octal digits (0-7). |
\nml | Matches the octal escape code nml when n is an octal digit (0-3) and m and l are octal digits (0-7). |
\un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). |
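Several rows of the table can be verified directly in JavaScript. The sketch below reuses the table's own example strings where possible; the variable names are illustrative only, and the m flag stands in for the Multiline property mentioned above.

```javascript
// Greedy vs. non-greedy quantifiers.
const greedy = "oooo".match(/o+/)[0];   // "oooo"
const lazy = "oooo".match(/o+?/)[0];    // "o"

// A non-capturing group leaves no capture slots in the result.
const industry = "industries".match(/industr(?:y|ies)/);
// industry[0] is "industries", industry.length is 1

// Positive and negative lookahead do not consume what they inspect.
const positive = "Windows 2000".match(/Windows (?=95|98|NT|2000)/)[0]; // "Windows "
const positiveMiss = /Windows (?=95|98|NT|2000)/.test("Windows 3.1");  // false
const negative = /Windows (?!95|98|NT|2000)/.test("Windows 3.1");      // true

// A backreference repeats whatever group 1 captured.
const doubled = "hello".match(/(.)\1/)[0]; // "ll"

// \b and \B: word boundary and non-boundary.
const erAtEnd = /er\b/.test("never");   // true
const erInside = /er\B/.test("verb");   // true

// With the m flag, ^ also matches after a line break.
const lineStarts = "a\nb".match(/^\w/gm); // ["a", "b"]

console.log(greedy, lazy, industry[0], positive, doubled, lineStarts);
```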
Username
/^[a-z0-9_-]{3,16}$/
Password
/^[a-z0-9_-]{6,18}$/
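A minimal sketch of how these two patterns might be wrapped in JavaScript helpers. The function names and sample values are hypothetical; both patterns accept only lowercase letters, digits, underscore, and hyphen, which is stricter than most real password policies.

```javascript
const usernamePattern = /^[a-z0-9_-]{3,16}$/;
const passwordPattern = /^[a-z0-9_-]{6,18}$/;

// Hypothetical helpers: return true when the whole input fits the pattern.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

function isValidPassword(password) {
  return passwordPattern.test(password);
}

console.log(isValidUsername("my-us3r_n4m3")); // true
console.log(isValidUsername("ab"));           // false: shorter than 3
console.log(isValidUsername("Admin"));        // false: uppercase not allowed
console.log(isValidPassword("myp4ssw0rd"));   // true
console.log(isValidPassword("mypa$$w0rd"));   // false: $ not allowed
```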
Hexadecimal value
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
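As an illustration, the capture group in this pattern can be used to strip the optional leading #. The helper name below is made up, and the pattern as written accepts lowercase hex digits only (adding the i flag would be one way to accept uppercase).

```javascript
const hexColor = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Hypothetical helper: returns the hex digits without "#", or null.
function parseHexColor(value) {
  const match = hexColor.exec(value);
  return match ? match[1] : null;
}

console.log(parseHexColor("#a3c113")); // "a3c113"
console.log(parseHexColor("fff"));     // "fff"
console.log(parseHexColor("#4d82h4")); // null: h is not a hex digit
console.log(parseHexColor("#A3C113")); // null: uppercase is not in the class
```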
Email
/^([\w\d_.-]+)@([\w\d_-]+\.)+\w{2,4}$/
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
/^[a-z\d]+(\.[a-z\d]+)*@([\da-z](-[\da-z])?)+(\.{1,2}[a-z]+)+$/
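A sketch of how an email pattern of this kind could be used in JavaScript, assuming the shorthand classes are written with their backslashes (\d, \.). The helper name is hypothetical, and a pattern this simple rejects many addresses that are valid in practice, so it is best treated as a rough filter.

```javascript
// Local part, "@", domain, ".", then a 2 to 6 character suffix.
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: splits an address into its captured parts, or null.
function splitEmail(address) {
  const match = emailPattern.exec(address);
  if (!match) return null;
  return { local: match[1], domain: match[2], suffix: match[3] };
}

console.log(splitEmail("john@doe.com"));       // local "john", domain "doe", suffix "com"
console.log(splitEmail("john@doe.something")); // null: suffix longer than 6
console.log(splitEmail("John@doe.com"));       // null: uppercase not allowed
```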
URL
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
/^(https?:\/\/)?([\w\d_-]+\.)+\w{2,4}(\/[\w\d.?-_%=&]+)*$/
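The following JavaScript sketch exercises a URL pattern of this shape, assuming the slashes and shorthand classes are escaped with backslashes. The sample URLs are invented. Because the path part nests one quantifier inside another, long non-matching inputs can make the engine backtrack heavily, so this is an illustration rather than a recommendation for production validation.

```javascript
// Optional scheme, domain, ".", suffix, optional path, optional trailing slash.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

const withScheme = urlPattern.exec("https://example.com/docs/index.html");
// withScheme[1] is "https://", withScheme[2] is "example", withScheme[3] is "com"

const bareDomain = urlPattern.test("example.com");          // true
const badCharacter = urlPattern.test("http://example.com/a!b"); // false: "!" not allowed

console.log(withScheme[1], withScheme[2], withScheme[3], bareDomain, badCharacter);
```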
IP address
/((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)/
or
/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
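The two IP address patterns differ mainly in anchoring, which the JavaScript sketch below tries to show (assuming the dots and digit classes are escaped with backslashes; the helper name is hypothetical). The unanchored form finds an address anywhere in the text, so on its own it cannot reject an out-of-range octet.

```javascript
const ipAnywhere = /((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)/;
const ipWhole = /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;

// Hypothetical helper: true only when the entire string is a dotted IPv4 address.
function isIPv4(text) {
  return ipWhole.test(text);
}

console.log(isIPv4("192.168.1.1")); // true
console.log(isIPv4("256.1.1.1"));   // false: 256 is out of range
console.log(isIPv4("1.2.3"));       // false: only three octets

// Without anchors, the search skips the leading "2" and matches the rest.
console.log("256.1.1.1".match(ipAnywhere)[0]); // "56.1.1.1"
```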
HTML tags
/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
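A brief JavaScript sketch of the HTML tag pattern in use, assuming the closing slash, the backreference \1, and \s are written with their backslashes. The sample markup is invented. A single regular expression like this only handles simple, non-nested tags and is not a substitute for an HTML parser.

```javascript
// Group 1: tag name. Group 3: inner content of a paired tag.
const htmlTag = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

const paired = '<a href="x">link</a>'.match(htmlTag);
// paired[1] is "a", paired[3] is "link"

const selfClosing = htmlTag.test("<br />");       // true
const mismatched = htmlTag.test("<a>text</b>");   // false: \1 must repeat "a"

console.log(paired[1], paired[3], selfClosing, mismatched);
```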