SoFunction
Updated on 2025-03-10

A detailed roundup of shell regular expressions

Classification of regular expressions

1. Basic Regular Expressions (BREs)
2. Extended Regular Expressions (EREs)
3. Perl Regular Expressions (Perl RegEx, also referred to as PREs)

For details, please refer to this article: https:///tools/shell_regex.html
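Many of the patterns in this article use Perl-style shorthand such as \d and \w. POSIX BREs and EREs do not define \d, so a pattern may need rewriting before it is used with a tool such as grep -E. The following is a minimal sketch of such a rewrite, assuming only the three shorthand classes \d, \w and \s need translating; the helper name to_ere and the SHORTHAND table are made up for illustration, and bracket handling is deliberately simplified.

```python
# Hypothetical helper (not part of any library): rewrite Perl-style shorthand
# classes as POSIX bracket expressions so a pattern can be tried with an ERE tool.
SHORTHAND = {"d": "0-9", "w": "A-Za-z0-9_", "s": "[:space:]"}


def to_ere(pattern: str) -> str:
    out = []
    in_bracket = False  # simplified tracking of [...] expressions
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in SHORTHAND:
                inner = SHORTHAND[nxt]
                # Inside [...] only the contents are needed; outside, wrap them.
                out.append(inner if in_bracket else "[" + inner + "]")
            else:
                out.append(ch + nxt)  # keep other escapes such as \. unchanged
            i += 2
            continue
        if ch == "[" and not in_bracket:
            in_bracket = True
        elif ch == "]" and in_bracket:
            in_bracket = False
        out.append(ch)
        i += 1
    return "".join(out)


ere_integer = to_ere(r"^-?\d+$")
ere_email_prefix = to_ere(r"^[\w-]+(\.[\w-]+)*@")
ere_spaces = to_ere(r"\s+")
```

The translated strings can then be passed to an ERE-based tool; constructs such as lookahead or lazy quantifiers have no ERE equivalent and are not handled by this sketch.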

Commonly used regular expressions in shell

"^\d+$" //Non-negative integer (positive integer + 0)
"^[0-9]*[1-9][0-9]*$" //Positive integer
"^((-\d+)|(0+))$" //Non-positive integer (negative integer + 0)
"^-[0-9]*[1-9][0-9]*$" //Negative integer
"^-?\d+$" //Integer
"^\d+(\.\d+)?$" //Non-negative floating point number (positive floating point number + 0)
"^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" //Positive floating point number
"^((-\d+(\.\d+)?)|(0+(\.0+)?))$" //Non-positive floating point number (negative floating point number + 0)
"^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$" //Negative floating point number
"^(-?\d+)(\.\d+)?$" //Floating point number
"^[A-Za-z]+$" //A string of English letters
"^[A-Z]+$" //A string of uppercase English letters
"^[a-z]+$" //A string of lowercase English letters
"^[A-Za-z0-9]+$" //A string of digits and English letters
"^\w+$" //A string of digits, English letters, or underscores
"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" //Email address
"^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" //URL
/^(\d{2}|\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/ //Year-Month-Day
/^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/(\d{2}|\d{4})$/ //Month/Day/Year
"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$" //Email
/^((\+?[0-9]{2,4}\-[0-9]{3,4}\-)|([0-9]{3,4}\-))?([0-9]{7,8})(\-[0-9]+)?$/ //Phone number
"^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" //IP address

Regular expression matching Chinese characters: [\u4e00-\u9fa5]
Match double-byte characters (including Chinese characters): [^\x00-\xff]
Regular expression matching blank lines: \n[\s| ]*\r
Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)
Regular expression matching email addresses: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regular expression matching URLs: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
Validate an account name (starts with a letter, 5-16 characters, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Match domestic (Chinese) phone numbers: (\d{3}-|\d{4}-)?(\d{8}|\d{7})?
Match Tencent QQ numbers: ^[1-9]*[1-9][0-9]*$

Metacharacters and their behavior in the context of regular expressions:

\ Marks the next character as a special character, a literal character, a backreference, or an octal escape.

^ Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.

$ Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
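The effect of multiline mode on ^ and $ is easy to see with a three-line string. This sketch assumes Python's re.MULTILINE flag as the counterpart of the Multiline property described above; in Python the extra positions are those next to '\n'.

```python
import re

text = "one\ntwo\nthree"

# Without the flag, ^ and $ anchor only at the ends of the whole string.
first_only = re.findall(r"^\w+", text)
last_only = re.findall(r"\w+$", text)

# With re.MULTILINE, they also anchor after and before each newline.
every_line_start = re.findall(r"^\w+", text, re.MULTILINE)
every_line_end = re.findall(r"\w+$", text, re.MULTILINE)

print(first_only, last_only)
print(every_line_start, every_line_end)
```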
 
 
* Matches the preceding subexpression zero or more times. * is equivalent to {0,}.

+ Matches the preceding subexpression one or more times. + is equivalent to {1,}.

? Matches the preceding subexpression zero or one time. ? is equivalent to {0,1}.

{n} n is a non-negative integer. Matches exactly n times.

{n,} n is a non-negative integer. Matches at least n times.

{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. There must be no spaces between the comma and the two numbers.

? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy match consumes as little of the string as possible, while the default greedy match consumes as much as possible.
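The difference between greedy and non-greedy quantifiers, and the bounds of {n,m}, can be checked with a few short calls. This is an illustrative sketch using Python's re module; the sample strings are made up.

```python
import re

html = "<a><b>"

# Greedy: .+ grabs as much as possible, so the whole string is one match.
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the first possible ">", giving one match per tag.
lazy = re.findall(r"<.+?>", html)

# {n,m} bounds: two to four digits are accepted, fewer or more are not.
three_digits_ok = re.fullmatch(r"\d{2,4}", "123") is not None
five_digits_ok = re.fullmatch(r"\d{2,4}", "12345") is not None

# A non-greedy bounded quantifier takes the minimum it is allowed to.
shortest = re.match(r"\d{2,4}?", "12345").group()

print(greedy, lazy, shortest)
```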
 
 
. Matches any single character except "\n". To match any character including '\n', use a pattern like '[\s\S]'.

(pattern) Matches pattern and captures the match.

(?:pattern) Matches pattern but does not capture the match; this is a non-capturing group, and the result is not stored for later use.

(?=pattern) Positive lookahead. Matches at a position where the text that follows matches pattern. This is a non-capturing match, and the lookahead text is not consumed.

(?!pattern) Negative lookahead, the opposite of (?=pattern). Matches at a position where the text that follows does not match pattern.
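A short sketch of the dot, grouping, and lookahead constructs described above, using Python's re module. The sample strings (an email-like address, file names, "Windows2000") are invented for illustration.

```python
import re

# Capturing vs. non-capturing groups: only the plain (...) groups are stored.
m = re.match(r"(\w+)@(?:\w+\.)+(\w+)", "user@mail.example.com")
captured = m.groups()

# Positive lookahead: names followed by ".txt"; the suffix itself is not consumed.
txt_names = re.findall(r"\w+(?=\.txt)", "a.txt b.log c.txt")

# Negative lookahead: "Windows" only when it is not followed by "95".
not_95 = re.search(r"Windows(?!95)", "Windows2000")
is_95 = re.search(r"Windows(?!95)", "Windows95")

# "." does not match a newline by default, while [\s\S] does.
dot_result = re.search(r"a.b", "a\nb")
any_char_result = re.search(r"a[\s\S]b", "a\nb")

print(captured, txt_names)
```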
 
 
x|y Matches x or y.

[xyz] Character set. Matches any one of the enclosed characters.

[^xyz] Negated character set. Matches any character that is not enclosed.

[a-z] Character range. Matches any character within the specified range.

[^a-z] Negated character range. Matches any character that is not within the specified range.
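Alternation and character sets can be illustrated as follows; this is a small sketch in Python with made-up sample strings.

```python
import re

# Alternation: either word is accepted.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Character set: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")

# Negated range: anything that is not a lowercase ASCII letter.
non_lower = re.findall(r"[^a-z]", "a1b2")

# Range: lowercase letters only, one run at a time.
lower_runs = re.findall(r"[a-z]+", "abc123def")

print(pets, vowels, non_lower, lower_runs)
```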
 
 
\b Matches a word boundary, that is, the position between a word character and a non-word character (such as a space).

\B Matches a non-word boundary.

\cx Matches the control character specified by x.

\d Matches a digit character. Equivalent to [0-9].

\D Matches a non-digit character. Equivalent to [^0-9].

\f Matches a form feed character. Equivalent to \x0c and \cL.

\n Matches a newline character. Equivalent to \x0a and \cJ.

\r Matches a carriage return character. Equivalent to \x0d and \cM.

\s Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].

\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t Matches a tab character. Equivalent to \x09 and \cI.

\v Matches a vertical tab character. Equivalent to \x0b and \cK.

\w Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.

\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
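The shorthand classes above can be exercised as in the sketch below. One caveat when trying them in Python: on str patterns, \d, \w and \s are Unicode-aware by default, so the ASCII equivalences given above hold only with the re.ASCII flag. The sample strings are illustrative.

```python
import re

# \b: "cat" as a whole word only, not as part of "concat".
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \s: split on any run of whitespace (spaces, tab, newline).
fields = re.split(r"\s+", "a  b\tc\nd")

# \D: strip everything that is not a digit.
digits_only = re.sub(r"\D", "", "tel: 010-1234")

# \w: letters, digits, and underscore.
identifier_ok = re.fullmatch(r"\w+", "user_01") is not None

# U+0663 is an Arabic-Indic digit: matched by \d by default, not with re.ASCII.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_digit = re.fullmatch(r"\d", "\u0663", re.ASCII) is not None

print(whole_words, fields, digits_only)
```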
 
 
\xn Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long.

\num Matches num, where num is a positive integer. A backreference to the captured match.

\n Identifies an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.

\nm Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.

\nml If n is an octal digit (0-3) and both m and l are octal digits (0-7), matches the octal escape value nml.

\un Matches n, where n is a Unicode character expressed as four hexadecimal digits.
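Backreferences and numeric escapes can be tried as follows. This sketch uses Python's re module, whose rules for telling an octal escape from a backreference differ slightly from the ones described above (a three-digit octal number is always treated as an escape), so treat it as an illustration rather than an exact reproduction.

```python
import re

# \1 refers back to whatever the first group captured: a doubled character.
doubled = re.search(r"(\w)\1", "hello").group()

# A backreference makes the closing tag name agree with the opening one.
paired = re.fullmatch(r"<(\w+)>.*</\1>", "<b>x</b>") is not None
mismatched = re.fullmatch(r"<(\w+)>.*</\1>", "<b>x</i>") is not None

# Hexadecimal (\x41), Unicode (\u0042) and octal (\103) escapes for "A", "B", "C".
escapes_ok = re.fullmatch(r"\x41\u0042\103", "ABC") is not None

print(doubled, paired, mismatched, escapes_ok)
```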
 
 

 
 

Use regular expressions to restrict the input content of the text box in the web form:

Restrict input to Chinese characters only: onkeyup="value=value.replace(/[^\u4E00-\u9FA5]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4E00-\u9FA5]/g,''))"

Restrict input to full-width characters only: onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"

Restrict input to digits only: onkeyup="value=value.replace(/[^\d]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"

Restrict input to digits and English letters only: onkeyup="value=value.replace(/[^A-Za-z0-9]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^A-Za-z0-9]/g,''))"
  

Commonly used regular expressions

 
 
 
 
Regular expression matching IP address: /(\d+)\.(\d+)\.(\d+)\.(\d+)/g
 
 
 
 
Regular expression matching URL: http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
 
 

SQL statement: ^(select|drop|delete|create|update|insert).*$
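The loose IP pattern (\d+)\.(\d+)\.(\d+)\.(\d+) checks only the shape of an address, so it also accepts values such as 999.1.1.1. One way to use it, sketched below, is to extract candidates with the pattern and check the octet range in code. The function names are hypothetical, and the re.IGNORECASE flag on the SQL pattern is an assumption added here because the pattern as written lists lowercase keywords only.

```python
import re

LOOSE_IP = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", re.ASCII)
SQL_STATEMENT = re.compile(
    r"^(select|drop|delete|create|update|insert).*$", re.IGNORECASE
)


def extract_ips(text):
    # Keep only candidates whose four captured parts are all in 0..255.
    found = []
    for m in LOOSE_IP.finditer(text):
        if all(int(part) <= 255 for part in m.groups()):
            found.append(m.group(0))
    return found


def looks_like_sql(line):
    # True when the line starts with one of the listed SQL keywords.
    return SQL_STATEMENT.match(line) is not None


print(extract_ips("hosts 10.0.0.1 and 999.1.1.1"))
print(looks_like_sql("SELECT * FROM t"), looks_like_sql("show tables"))
```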

 
 
1. Non-negative integer: ^\d+$
2. Positive integer: ^[0-9]*[1-9][0-9]*$
3. Non-positive integer: ^((-\d+)|(0+))$
4. Negative integer: ^-[0-9]*[1-9][0-9]*$
5. Integer: ^-?\d+$
6. Non-negative floating point number: ^\d+(\.\d+)?$
7. Positive floating point number: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
8. Non-positive floating point number: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$
9. Negative floating point number: ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
 
 
10. English letters: ^[A-Za-z]+$
11. Uppercase English letters: ^[A-Z]+$
12. Lowercase English letters: ^[a-z]+$
13. English letters and digits: ^[A-Za-z0-9]+$
14. English letters, digits, and underscores: ^\w+$
15. E-mail address: ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$
 
 
16. URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
Or: ^http:\/\/[A-Za-z0-9]+\.[A-Za-z0-9]+[\/=\?%\-&_~`@[\]\':+!]*([^<>\"\"])*$
17. Postal code: ^[1-9]\d{5}$
18. Chinese: ^[\u4e00-\u9fa5]+$
19. Phone number: ^((\(\d{2,3}\))|(\d{3}\-))?(\(0\d{2,3}\)|0\d{2,3}-)?[1-9]\d{6,7}(\-\d{1,4})?$
20. Mobile phone number: ^((\(\d{2,3}\))|(\d{3}\-))?13\d{9}$
21. Double-byte characters (including Chinese characters): [^\x00-\xff]
 
22. Match leading and trailing whitespace: (^\s*)|(\s*$) (like the trim function in VBScript)
23. Match HTML tags: <(.*)>.*<\/\1>|<(.*) \/>
24. Match empty lines: \n[\s| ]*\r
25. Extract hyperlinks from text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
26. Extract email addresses from text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
27. Extract image links from text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
28. Extract IP addresses from text: (\d+)\.(\d+)\.(\d+)\.(\d+)
29. Extract Chinese mobile phone numbers from text: (86)*0*13\d{9}
30. Extract Chinese landline numbers from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
31. Extract Chinese phone numbers (mobile and landline) from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
32. Extract Chinese postal codes from text: [1-9]\d{5}
33. Extract floating point numbers (i.e. decimals) from text: (-?\d*)\.?\d+
34. Extract any number from text: (-?\d*)(\.\d+)?
35. IP: (\d+)\.(\d+)\.(\d+)\.(\d+)
36. Phone area code: ^0\d{2,3}$
37. Tencent QQ number: ^[1-9]*[1-9][0-9]*$
38. Account name (starts with a letter, 5-16 characters, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
39. Chinese characters, English letters, digits, and underscores: ^[\u4e00-\u9fa5_a-zA-Z0-9]+$
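Several entries in the list above are meant for extracting data from free text rather than validating a whole string. A brief sketch of that use in Python follows; the function names and the sample text are invented for this example. Because the email pattern contains groups, finditer with group(0) is used so that the full match is returned instead of the group contents.

```python
import re

EMAIL = re.compile(r"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")
POSTAL_CODE = re.compile(r"[1-9]\d{5}")
TRIM = re.compile(r"(^\s*)|(\s*$)")


def extract_emails(text):
    # group(0) is the whole match, regardless of how many groups the pattern has.
    return [m.group(0) for m in EMAIL.finditer(text)]


def extract_postal_codes(text):
    # The pattern has no groups, so findall returns the matched strings directly.
    return POSTAL_CODE.findall(text)


def trim(text):
    # Remove leading and trailing whitespace, like a trim function.
    return TRIM.sub("", text)


sample = "write to bob.smith@example.com or amy@test.org today"
print(extract_emails(sample))
print(extract_postal_codes("Beijing 100000, Shanghai 200000"))
print(repr(trim("  hi  ")))
```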
