SoFunction
Updated on 2025-03-03

Regular expression tutorial: operators and their descriptions explained in detail

This article describes the operators used in regular expressions and what each one means. It is shared here for your reference.

1. Ordinary characters

Ordinary characters include all printable and non-printable characters that are not explicitly designated as metacharacters, such as uppercase and lowercase letters, digits, and punctuation marks that have no special meaning.

2. Metacharacters

Metacharacters are characters that have a special meaning in regular expressions. Because of that special meaning, a metacharacter cannot be written as-is to represent itself; it must be escaped by prefixing it with a backslash. The resulting escape sequence matches the character itself rather than triggering its special meaning. For example, [ marks the beginning of a character set, so to match a literal [ in a regular expression you write \[.

^ Matches the start of the input string. When it is the first character inside a bracket expression, it instead negates the character set. To match the ^ character itself, use \^.
$ Matches the end of the input string. In multiline mode, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
() Marks the start and end of a subexpression. The text matched by a subexpression can be captured for later use. To match these characters, use \( and \).
? Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
* Matches the preceding subexpression zero or more times. To match the * character, use \*.
+ Matches the preceding subexpression one or more times. To match the + character, use \+.
. Matches any single character except the newline \n. To match ., use \.
[] Marks the beginning and end of a character set. To match [ or ], use \[ and \].
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches the newline character. The sequence '\\' matches "\", and '\(' matches "(".
| Specifies a choice between two alternatives. To match |, use \|.
{} Marks the beginning and end of a quantifier expression. To match { or }, use \{ or \}.
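
The escaping rules above can be tried out with a short sketch. It assumes Python's built-in re module, which the article itself does not name; other engines may differ in small details. The sample strings are made up for illustration.

```python
import re

# An unescaped "." matches any character except a newline;
# "\." matches only a literal dot.
dot_any = re.findall(r"a.c", "abc a.c a\nc")
dot_literal = re.findall(r"a\.c", "abc a.c a\nc")

# A bare "[" would open a character set, so it is escaped to match literally.
bracket = re.search(r"\[", "a[0]").group()

# re.escape() adds the backslashes when the text to match is arbitrary.
price_pattern = re.escape("$5.00 (approx.)")
escaped_match = re.search(price_pattern, "Total: $5.00 (approx.)").group()

# "^" anchors at the start of the string, but negates a set when it
# is the first character inside the brackets.
starts_with_digit = bool(re.search(r"^[0-9]", "42 apples"))
non_digits = re.findall(r"[^0-9]+", "42 apples")

# "()" captures a subexpression; "|" chooses between alternatives.
captured = re.search(r"(cat|dog)s", "hot dogs").group(1)

print(dot_any, dot_literal, bracket, escaped_match)
print(starts_with_digit, non_digits, captured)
```

Using re.escape() is usually safer than adding backslashes by hand when the literal text comes from user input.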

3. Non-printing characters

\cx Matches the control character specified by x. For example, \cM matches Control-M, the carriage return character. The value of x must be one of A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK.
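
As a rough check of the equivalences listed above, the following sketch compares each named escape with its hexadecimal form. It assumes Python's re module, which as far as I know does not accept the \cx notation, so only the \xhh forms are compared. The sample string is invented for the example.

```python
import re

# Named escape -> hexadecimal form, as listed in the table above.
pairs = {
    r"\f": r"\x0c",
    r"\n": r"\x0a",
    r"\r": r"\x0d",
    r"\t": r"\x09",
    r"\v": r"\x0b",
}

# Sample containing one of each control character.
sample = "a\tb\r\nc\x0bd\x0ce"

# Both spellings should find exactly the same characters.
same = {
    name: re.findall(name, sample) == re.findall(hexform, sample)
    for name, hexform in pairs.items()
}

# A common practical use: splitting on any style of line ending.
lines = re.split(r"\r\n|\n|\r", "one\r\ntwo\nthree\rfour")

print(same)
print(lines)
```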

4. Predefined characters

. Any character (it may or may not match the line terminators \r and \n, depending on the matching mode)
\d A digit; equivalent to [0-9]
\D A non-digit; equivalent to [^0-9]
\s A whitespace character; equivalent to [ \t\n\x0B\f\r]
\S A non-whitespace character; equivalent to [^\s]
\w A word character; equivalent to [a-zA-Z_0-9]
\W A non-word character; equivalent to [^\w]
\b A word boundary, that is, the position at the start or end of a word
\B A non-word boundary
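
The predefined classes can be illustrated with a small sketch, again assuming Python's re module. One caveat specific to that assumption: in Python 3, \d and \w match Unicode digits and letters by default, so the ASCII equivalences given above only hold exactly when the re.ASCII flag is passed. The sample strings are made up.

```python
import re

text = "Order 66: 3 cats, 12 cat_toys"

digits = re.findall(r"\d+", text)     # runs of digits
words = re.findall(r"\w+", text)      # runs of word characters (note the "_")
non_words = re.findall(r"\W+", text)  # everything between the words

# "\b" requires a word boundary on that side; "\B" requires there to be none.
animals = "cat cats concat cat_toys"
whole_word_cat = re.findall(r"\bcat\b", animals)
inside_word_cat = [m.start() for m in re.finditer(r"\Bcat", animals)]

# Unicode vs ASCII digits: U+0663 is ARABIC-INDIC DIGIT THREE.
mixed = "\u0663" + "4"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, flags=re.ASCII)

print(digits, words, non_words)
print(whole_word_cat, inside_word_cat, unicode_digits, ascii_digits)
```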

5. POSIX characters

[:alnum:] Any letter or digit; equivalent to [a-zA-Z0-9]
[:alpha:] Any letter; equivalent to [a-zA-Z]
[:blank:] A space or tab character; equivalent to [ \t]
[:cntrl:] ASCII control characters (ASCII 0 to 31, plus ASCII 127)
[:digit:] Any digit; equivalent to [0-9]
[:graph:] Any printable character except the space
[:lower:] Any lowercase letter; equivalent to [a-z]
[:print:] Any printable character, including the space
[:punct:] Any punctuation character, that is, a printable character that is neither a space nor a member of [:alnum:]
[:space:] Any whitespace character, including the space; equivalent to [ \f\n\r\t\v]
[:upper:] Any uppercase letter; equivalent to [A-Z]
[:xdigit:] Any hexadecimal digit; equivalent to [a-fA-F0-9]
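
POSIX classes are only valid inside a bracket expression, for example [[:digit:]]+, and not every engine supports them. Python's re module does not, so the sketch below is a workaround rather than a POSIX implementation: a hypothetical helper, expand_posix, rewrites a few class names into the ASCII ranges from the table above before compiling. The helper name and the lookup table are assumptions made for this example.

```python
import re

# Hypothetical lookup table: POSIX class name -> ASCII range text,
# taken from the equivalences listed above (not a complete list).
POSIX_ASCII = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "space": r" \f\n\r\t\v",
    "upper": "A-Z",
    "xdigit": "a-fA-F0-9",
}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] in the pattern with its ASCII range text."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

hex_pattern = expand_posix(r"[[:xdigit:]]+")      # becomes [a-fA-F0-9]+
ident_pattern = expand_posix(r"[[:alpha:]_][[:alnum:]_]*")
space_pattern = expand_posix(r"[[:space:]]+")

is_hex = re.fullmatch(hex_pattern, "1A2b3C") is not None
not_hex = re.fullmatch(hex_pattern, "1G") is None
is_ident = re.fullmatch(ident_pattern, "_tmp42") is not None
tokens = re.split(space_pattern, "a \t\n b")

print(hex_pattern, is_hex, not_hex, is_ident, tokens)
```

An unknown class name raises KeyError in this sketch, which is a deliberate choice so that typos are not silently ignored.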

6. Quantifiers

* Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "fooooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and either number.
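
The examples in the list above can be checked with a short sketch, assuming Python's re module. The last two lines also show the non-greedy form mentioned earlier, where a ? placed after a quantifier makes it match as little as possible.

```python
import re

candidates = ("z", "zo", "zoo")

# "*" allows zero repetitions, "+" requires at least one.
star = [bool(re.fullmatch(r"zo*", s)) for s in candidates]
plus = [bool(re.fullmatch(r"zo+", s)) for s in candidates]

# "?" makes the group optional ("(?:...)" groups without capturing).
optional = re.findall(r"do(?:es)?", "do does")

# "{n}" is exact, "{n,}" is open-ended, "{n,m}" is bounded.
exact = re.findall(r"o{2}", "Bob food")
at_least = re.findall(r"o{2,}", "Bob f" + "o" * 6 + "d")
bounded = re.search(r"o{1,3}", "f" + "o" * 8 + "d").group()

# Greedy vs non-greedy: "+" takes as much as it can, "+?" as little.
greedy = re.search(r"o+", "foood").group()
lazy = re.search(r"o+?", "foood").group()

print(star, plus, optional)
print(exact, at_least, bounded, greedy, lazy)
```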

The operators above are simply grouped by function. The grouping is not rigorous; it is only meant to help explain what each operator does.

PS: Here are two very convenient regular expression tools for your reference:

JavaScript regular expression online testing tool:
http://tools./regex/javascript

Regular expression online generation tool:
http://tools./regex/create_reg

I hope this article is helpful to everyone who is learning regular expressions.