SoFunction
Updated on 2025-02-28

Regular expressions in the JavaScript type system

Definition

A regular expression, also called a rule or a pattern, is a powerful string-matching tool. JavaScript supports regular expressions through the RegExp type.

Characteristics

[1] Greedy: a quantifier matches the longest possible text
[2] Lazy: if the g flag is not set, only the first match is found
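
The two characteristics above can be seen in a minimal sketch. The sample strings are made up for illustration only.

```javascript
// Greedy: the + quantifier takes as many characters as it can.
var greedy = 'aaab'.match(/a+/)[0];           // 'aaa', not 'a'

// Without the g flag, match() stops at the first match.
var firstOnly = 'cat,bat,sat'.match(/.at/);   // ['cat'] plus index and input

// With the g flag, match() returns every match.
var all = 'cat,bat,sat'.match(/.at/g);        // ['cat', 'bat', 'sat']

console.log(greedy, firstOnly[0], all);
```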

How to write

Perl-style syntax (literal form)
var expression = /pattern/flags;

The pattern part can be any simple or complex regular expression, and it can contain character classes, quantifiers, groups, lookaheads, and backreferences. Each regular expression can carry one or more flags that control its behavior. ES5 regular expressions support three flags:

[1] g: global mode (global)
[2] i: case-insensitive mode (ignoreCase)
[3] m: multiline mode (multiline)

// Match all instances of 'at' in a string
var pattern1 = /at/g;

RegExp constructor

The RegExp constructor takes two parameters: the pattern string to match and an optional flag string.

[Note] Both parameters of the RegExp constructor are strings. Any expression that can be written in literal form can also be created with the constructor.

// Match all instances of 'at' in a string
var pattern = new RegExp('at','g');

The difference between the two forms

The literal form cannot contain variables; a pattern built from a variable has to use the constructor.
[tips] Get elements by class name (classname is a variable, so only the constructor form works)

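function getByClass(obj,classname){
  // Collect every descendant element of obj
  var elements = obj.getElementsByTagName('*');
  var result = [];
  // classname must be a whole word inside the className attribute
  var pattern = new RegExp( '(^|\\s)'+ classname + '(\\s|$)');
  for(var i = 0; i < elements.length; i++){
    if(pattern.test(elements[i].className)){
      result.push(elements[i]);
    }
  }
  return result;
}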

[Note] In ES3, a regular expression literal always shares the same RegExp instance, while every call to the constructor creates a new instance. ES5 specifies that a regular expression literal must create a new RegExp instance each time it is evaluated, just like a direct call to the RegExp constructor.
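
The ES5 behavior in the note can be checked with a small sketch. makePattern is a hypothetical helper that exists only for this illustration.

```javascript
// Hypothetical helper: returns the regular expression literal in its body.
function makePattern(){
  return /at/g;
}

var first = makePattern();
var second = makePattern();

// In ES5 and later, each evaluation of the literal creates a new object,
// so the two results are distinct and keep separate lastIndex values.
first.test('cat');
console.log(first === second);   // false
console.log(first.lastIndex);    // 3
console.log(second.lastIndex);   // 0
```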

Syntax

[Note] Do not put extra spaces in a regular expression; a space is matched literally

Metacharacters (14)

    () [] {} \ ^ $ | ? * + . 
[Note] To match a metacharacter literally, escape it with \. In a pattern string passed to new RegExp, the backslash itself must be escaped as well (double escaping)
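
When text of unknown content has to be matched literally, escaping each metacharacter by hand is error-prone. The sketch below shows one way to do it, assuming a hypothetical helper named escapeRegExp that is not part of the language.

```javascript
// Hypothetical helper: backslash-escapes each of the 14 metacharacters in text
// so that it can be embedded in a RegExp constructor pattern and matched literally.
function escapeRegExp(text){
  return text.replace(/[()\[\]{}\\^$|?*+.]/g, '\\$&');
}

var price = '$9.99 (sale)';
var pattern = new RegExp(escapeRegExp(price));

console.log(escapeRegExp(price));                 // \$9\.99 \(sale\)
console.log(pattern.test('now $9.99 (sale)!'));   // true
console.log(pattern.test('now $9x99 (sale)!'));   // false
```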

Escape characters

.      Any character except the line break \n
\d     Digit
\D     Non-digit
\w     Letter, digit, or underscore
\W     Anything other than a letter, digit, or underscore (Chinese characters do not belong to \w)
\s     Whitespace
\S     Non-whitespace
\b     Word boundary (a position where one side is \w and the other side is not)
\B     Non-word boundary
\1     Backreference: the same text that the first capture group matched
\t     Tab
\v     Vertical tab
\uxxxx The Unicode character with hexadecimal code xxxx ([\u4e00-\u9fa5] matches Chinese characters)
(\w)(\d)\1\2 : \1 stands for the text that (\w) matched, and \2 stands for the text that (\d) matched

[Note] Subexpressions (capture groups) must be enclosed in parentheses, and they are numbered in the order in which their opening parentheses appear.
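
A short sketch of the numbering rule, using a made-up pattern with nested groups:

```javascript
// Groups are numbered by the position of their opening parenthesis:
// group 1 = ((a)(b)), group 2 = (a), group 3 = (b), group 4 = (c)
var groups = /((a)(b))(c)/.exec('abc');
console.log(groups[0]);   // 'abc'
console.log(groups[1]);   // 'ab'
console.log(groups[2]);   // 'a'
console.log(groups[3]);   // 'b'
console.log(groups[4]);   // 'c'

// A backreference reuses the text a group matched, not the group's pattern.
var sameText = /(\w)(\d)\1\2/.test('a1a1');        // true
var differentText = /(\w)(\d)\1\2/.test('a1b2');   // false
console.log(sameText, differentText);
```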

[tips] Find the character with the longest run of consecutive repeats, and the length of that run

var str = 'aaaaabbbbbdddddaaaaaaaffffffffffffffffffgggggcccccce';
var pattern = /(\w)\1+/g;
var maxLength = 0;
var maxValue = '';
// replace() is used only to visit every run; its return value is not needed
var result = str.replace(pattern,function(match,match1,pos,originalText){
  if(match.length > maxLength){
    maxLength = match.length;
    maxValue = match1;
  }
});
console.log(maxLength,maxValue);//18 "f"

System escape

Strings passed to alert() and console.log() are processed with the system (string) escape sequences

\0 Null byte
\n Line feed
\t Tab
\b Backspace
\r Carriage return
\f Form feed
\\ Backslash
\' Single quote
\" Double quote
\xnn The character with hexadecimal code nn (n is 0-f), such as \x41 for 'A'
\unnnn The Unicode character with hexadecimal code nnnn (n is 0-f), such as \u03a3 for the Greek letter Σ

[Note] A line break in alert() cannot be written as <br> or <br/>; use \n instead

 alert('\n\tHello') 

Double escape

Since the parameters of the RegExp constructor are strings, some characters need double escaping. Every metacharacter must be double escaped, and characters that are already escaped must be double escaped as well.

Literal pattern       ->     Equivalent string
/\[bc\]at/                   "\\[bc\\]at"
/\.at/                       "\\.at"
/name\/age/                  "name\\/age"
/\d.\d{1,2}/                 "\\d.\\d{1,2}"
/\w\\hello\\123/             "\\w\\\\hello\\\\123"
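
The table can be checked mechanically. The sketch below builds each pattern both ways and confirms that both accept the same text; the sample strings are assumptions chosen to fit each pattern.

```javascript
// Each pair is a literal and the constructor string listed as its equivalent.
var pairs = [
  [/\[bc\]at/,       '\\[bc\\]at'],
  [/\.at/,           '\\.at'],
  [/name\/age/,      'name\\/age'],
  [/\d.\d{1,2}/,     '\\d.\\d{1,2}'],
  [/\w\\hello\\123/, '\\w\\\\hello\\\\123']
];

// One sample text per pair that the pattern is expected to match.
var samples = ['[bc]at', '.at', 'name/age', '1.23', 'a\\hello\\123'];

var agree = pairs.every(function(pair, i){
  var fromString = new RegExp(pair[1]);
  return pair[0].test(samples[i]) && fromString.test(samples[i]);
});
console.log(agree);   // true
```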

Quantifiers

{n}      Match exactly n times
{n,m}    Match at least n times and at most m times
{n,}     Match at least n times
?        Match 0 or 1 time
*        Match 0 or more times
+        Match 1 or more times
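
A few quick checks of the quantifiers listed above. The test strings are illustrative, and the patterns are anchored with ^ and $ so that the whole string has to fit.

```javascript
var checks = [
  /^\d{3}$/.test('123'),      // true:  exactly 3 digits
  /^\d{3}$/.test('1234'),     // false: one digit too many
  /^\d{2,4}$/.test('1234'),   // true:  between 2 and 4 digits
  /^\d{2,}$/.test('1'),       // false: fewer than 2 digits
  /^colou?r$/.test('color'),  // true:  ? makes the u optional
  /^ab*c$/.test('ac'),        // true:  * allows zero b characters
  /^ab+c$/.test('ac')         // false: + needs at least one b
];
console.log(checks);
```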

Position symbols

^        Start anchor
$        End anchor
?=       Positive lookahead
?!       Negative lookahead
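
A minimal sketch of the position symbols; the sample strings are invented for the example.

```javascript
// Positive lookahead: digits that are followed by 'px' (the 'px' is not consumed).
var positive = /\d+(?=px)/.exec('width: 120px');
console.log(positive[0]);      // '120'

// Negative lookahead: a 'q' that is not followed by 'u'.
var negative = /q(?!u)/.exec('quit Iraq');
console.log(negative.index);   // 8

// ^ and $ anchor the pattern to the start and end of the string.
var anchoredYes = /^cat$/.test('cat');    // true
var anchoredNo = /^cat$/.test('cats');    // false
console.log(anchoredYes, anchoredNo);
```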

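Control symbols

[]     Candidates (any one of the enclosed characters)
|      Or
^      Not (negation, when it is the first character inside [])
-      To (a range inside [])

(red|blue|green)   Any one of the specified alternatives
[abc]              Any one character between the brackets
[^abc]             Any one character not between the brackets
[0-9]              Any digit from 0 to 9
[a-z]              Any lowercase letter from a to z
[A-Z]              Any uppercase letter from A to Z
[A-z]              Any character from uppercase A to lowercase z (this range also includes [ \ ] ^ _ and `)
[adgk]             Any one character in the given set
[^adgk]            Any one character outside the given set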
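
The sketch below exercises a few of these classes. It also shows why [A-z] is usually not what is wanted for letters: the range runs over character codes 65 to 122, which includes several punctuation characters.

```javascript
var results = {
  alternation: /^(red|blue|green)$/.test('blue'),   // true
  inSet:       /[abc]/.test('xbz'),                 // true:  contains b
  notInSet:    /[^abc]/.test('abc'),                // false: every character is excluded
  digits:      /^[0-9]+$/.test('2015'),             // true
  // '_' has character code 95, which lies between 'Z' (90) and 'a' (97)
  wideRange:   /^[A-z]$/.test('_'),                 // true
  letters:     /^[A-Za-z]$/.test('_')               // false
};
console.log(results);
```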

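$ symbols

$$         A literal $
$&         The substring that matched the whole pattern
$`         The text before the matched substring
$'         The text after the matched substring
$n         The substring matched by the nth capture group, where n is 1 to 9; $1 is the substring matched by the first capture group (counting starts at 1)
$nn        The substring matched by the nnth capture group, where nn is 01 to 99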

console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$0'))//$0,$0,$0,$0 (there is no group 0, so the text is used literally)
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$1'))//ca,ba,sa,fa
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$2'))//t,t,t,t
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$3'))//$3,$3,$3,$3 (the pattern has only two groups)
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$$'))//$,$,$,$
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$&'))//cat,bat,sat,fat
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,'$`'))//,cat,,cat,bat,,cat,bat,sat,
console.log('cat,bat,sat,fat'.replace(/(.a)(t)/g,"$'"))//,bat,sat,fat,,sat,fat,,fat,

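Constructor properties

These properties apply to all regular expressions in scope and change with the most recent regular expression operation. They can be accessed in two ways, by a long property name or by a short property name. Most short property names are not valid ECMAScript identifiers, so they must be accessed with square bracket syntax. They are legacy, non-standard properties, and RegExp.multiline ($*) may be undefined in current engines.

Long property name   Short property name   Description
input                $_                    The last string that was matched against
lastMatch            $&                    The last matched text
lastParen            $+                    The last matched capture group
leftContext          $`                    The text before lastMatch in the input string
multiline            $*                    Boolean value indicating whether all expressions use multiline mode
rightContext         $'                    The text after lastMatch in the input string

With these properties, more specific information can be extracted from the operations performed by the exec() method or the test() method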

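var text = 'this has been a short summer';
var pattern = /(.)hort/g;
if(pattern.test(text)){
  console.log(RegExp.input);//'this has been a short summer'
  console.log(RegExp.leftContext);//'this has been a '
  console.log(RegExp.rightContext);//' summer'
  console.log(RegExp.lastMatch);//'short'
  console.log(RegExp.lastParen);//'s'
  console.log(RegExp.multiline);//false
  console.log(RegExp['$_']);//'this has been a short summer'
  console.log(RegExp['$`']);//'this has been a '
  console.log(RegExp["$'"]);//' summer'
  console.log(RegExp['$&']);//'short'
  console.log(RegExp['$+']);//'s'
  console.log(RegExp['$*']);//false
}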

JavaScript has 9 constructor properties for storing capture groups. RegExp.$1, RegExp.$2, RegExp.$3 ... RegExp.$9 store the first, second, third ... ninth matched capture groups respectively. These properties are filled in automatically when exec() or test() is called

var text = 'this has been a short summer';
var pattern = /(..)or(.)/g;
if(pattern.test(text)){
  console.log(RegExp.$1);//sh
  console.log(RegExp.$2);//t
}
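
The same capture groups can also be read from the array that exec() returns, which avoids relying on shared constructor state. This is a sketch of that alternative, not a replacement prescribed by the article.

```javascript
var text = 'this has been a short summer';
var pattern = /(..)or(.)/;
var matches = pattern.exec(text);

if(matches !== null){
  console.log(matches[1]);      // 'sh'
  console.log(matches[2]);      // 't'
  console.log(matches.index);   // 16
}
```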

Instance properties

Instance properties expose every aspect of a regular expression, but they are of limited use because the same information is already in the pattern declaration
global: Boolean value indicating whether the g flag is set
ignoreCase: Boolean value indicating whether the i flag is set
lastIndex: integer, the character position at which the search for the next match starts, counting from 0
multiline: Boolean value indicating whether the m flag is set
source: string representation of the regular expression, returned in literal form rather than as the string pattern passed to the constructor

var pattern = new RegExp('\\[bc\\]at','i');
console.log(pattern.global);//false
console.log(pattern.ignoreCase);//true
console.log(pattern.multiline);//false
console.log(pattern.lastIndex);//0
console.log(pattern.source);//'\[bc\]at'

Inherited methods

There are three inherited methods: toString(), toLocaleString(), and valueOf(). All of them return the literal form of the regular expression, no matter how it was created. Note that toString() and toLocaleString() return the string representation of the regular expression, while valueOf() returns the regular expression object itself

var pattern = new RegExp('\\[bc\\]at','gi');
console.log(pattern.toString()); // '/\[bc\]at/gi'
console.log(pattern.toLocaleString()); // '/\[bc\]at/gi'
console.log(pattern.valueOf()); // /\[bc\]at/gi

Instance methods

exec()

exec() is designed specifically for capture groups. It accepts one parameter, the string to apply the pattern to, and returns an array with information about the first match, or null if there is no match. The returned array has two additional properties: index, the position of the match in the string, and input, the string the regular expression was applied to. The first item of the array is the string that matched the whole pattern, and the remaining items are the strings that matched the capture groups in the pattern. If the pattern has no capture groups, the array contains only one item

var text = 'mom and dad and baby and others';
var pattern = /mom( and dad( and baby)?)?/gi;
var matches = pattern.exec(text);
console.log(pattern,matches);
//pattern.lastIndex:20
//matches[0]:'mom and dad and baby'
//matches[1]:' and dad and baby'
//matches[2]:' and baby'
//matches.index:0
//matches.input:'mom and dad and baby and others'

[Note] Even when the global flag (g) is set, exec() returns only one match per call. Without the global flag, repeated calls to exec() on the same string always return the first match; with the global flag, each call to exec() continues searching for the next match in the string. IE8 and earlier have a bug with the lastIndex property: it changes on every call even in non-global mode

var text = 'cat,bat,sat,fat';
var pattern1 = /.at/;
var matches = pattern1.exec(text);
console.log(pattern1,matches);
//pattern1.lastIndex:0
//matches[0]:'cat'
//matches.index:0
//matches.input:'cat,bat,sat,fat'
matches = pattern1.exec(text);
console.log(pattern1,matches);
//pattern1.lastIndex:0
//matches[0]:'cat'
//matches.index:0
//matches.input:'cat,bat,sat,fat'
var pattern2 = /.at/g;
matches = pattern2.exec(text);
console.log(pattern2,matches);
//pattern2.lastIndex:3
//matches[0]:'cat'
//matches.index:0
//matches.input:'cat,bat,sat,fat'
matches = pattern2.exec(text);
console.log(pattern2,matches);
//pattern2.lastIndex:7
//matches[0]:'bat'
//matches.index:4
//matches.input:'cat,bat,sat,fat'

[tips] Use the exec() method to find every match and its position

var string = 'j1h342jg24g234j 3g24j1';
var pattern = /\d/g;
var valueArray = [];//values
var indexArray = [];//positions
var temp = pattern.exec(string);
while(temp != null){
  valueArray.push(temp[0]);
  indexArray.push(temp.index);
  temp = pattern.exec(string);
}
//["1", "3", "4", "2", "2", "4", "2", "3", "4", "3", "2", "4", "1"] [1, 3, 4, 5, 8, 9, 11, 12, 13, 16, 18, 19, 21]
console.log(valueArray,indexArray);

test()

Accepts a string parameter and returns true if the pattern matches it, otherwise false
[Note] test() is used when you only need to know whether the target string matches a pattern and do not need the matched text, most often in if statements

var text = '000-00-0000';
var pattern = /\d{3}-\d{2}-\d{4}/;
if(pattern.test(text)){
  console.log('The pattern was matched');
}
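
One caveat follows from the lastIndex behavior described for exec(): test() also advances lastIndex when the g flag is set, so repeated calls on the same pattern object can alternate between true and false. A minimal sketch, using the same kind of input as above:

```javascript
var pattern = /\d{3}-\d{2}-\d{4}/g;
var text = '000-00-0000';

var firstCall = pattern.test(text);    // true:  lastIndex moves to 11
var secondCall = pattern.test(text);   // false: the search started at index 11
var thirdCall = pattern.test(text);    // true:  lastIndex was reset to 0 after the failure

console.log(firstCall, secondCall, thirdCall);
```

Leaving out the g flag, or setting pattern.lastIndex = 0 before each call, avoids the alternation.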

Pattern matching method

The String type defines several methods for matching patterns in strings

match()

Accepts a single parameter, a regular expression or a string, and returns the matches in an array
[Note] With the g flag, the return value of match() has no index or input properties.

[1] Without the g flag

var string = 'cat,bat,sat,fat';
var pattern = /.at/;
var matches = string.match(pattern);
console.log(matches,matches.index,matches.input);//['cat'] 0 'cat,bat,sat,fat' 

[2] With the g flag

var string = 'cat,bat,sat,fat';
var pattern = /.at/g;
var matches = string.match(pattern);
console.log(matches,matches.index,matches.input);//['cat','bat','sat','fat'] undefined undefined 

[3] String

var string = 'cat,bat,sat,fat';
var pattern = 'at';
var matches = string.match(pattern);
console.log(matches,matches.index,matches.input);//['at'] 1 'cat,bat,sat,fat' 

search()

Accepts a single parameter, a regular expression or a string, and returns the position of the first match in the string. It works like indexOf() without the start-position argument, and returns -1 if nothing is found

[1] Regular expression (the result is the same with or without the g flag)

var string = 'cat,bat,sat,fat';
var pattern = /.at/;
var pos = string.search(pattern);
console.log(pos);//0 

[2] String

var string = 'cat,bat,sat,fat';
var pattern = 'at';
var pos = string.search(pattern);
console.log(pos);//1

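[tips] Find the positions of all matches

function fnAllSearch(str,pattern){
  var pos = str.search(pattern);
  if(pos === -1){
    return [];//no match at all
  }
  var length = str.match(pattern)[0].length;
  var index = pos+length;//where the rest of the string begins
  var result = [];
  var last = index;//offset of the remaining string within the original
  result.push(pos);
  while(true){
    str = str.substring(index);
    pos = str.search(pattern);
    if(pos === -1){
      break;
    }
    length = str.match(pattern)[0].length;
    index = pos+length;
    result.push(last+pos);
    last += index;
  }
  return result;
}
console.log(fnAllSearch('cat23fbat246565sa3dftf44at',/\d+/));//[3,9,17,22]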

replace()

This method takes two parameters: the first is a regular expression or a string (what to search for), and the second is a string or a function (what to replace it with)

[1] String search value

var string = 'cat,bat,sat,fat';
var result = string.replace('at','ond');
console.log(result);//'cond,bat,sat,fat' 

[2] Regular expression without the g flag

var string = 'cat,bat,sat,fat';
var result = string.replace(/at/,'ond');
console.log(result);//'cond,bat,sat,fat' 

[3] Regular expression with the g flag

var string = 'cat,bat,sat,fat';
var result = string.replace(/at/g,'ond');
console.log(result);//'cond,bond,sond,fond'

[4] Function replacement

When there is only one match item (that is, the string that matched the whole pattern, with no capture groups), three parameters are passed to the function: the matched text, the position of the match in the string, and the original string. When the regular expression defines capture groups, the parameters are the matched text, the first capture group match, the second capture group match, ... the Nth capture group match, and the last two parameters are still the position of the match in the string and the original string. The function returns a string, which is used as the replacement

var string = 'cat,bat,sat,fat';
var index = 0;
var result = string.replace(/at/g,function(match,pos,originalText){
  index++;
  if( index== 2){
    return 'wow';
  }else{
    return '0';
  }
});
console.log(result);//'c0,bwow,s0,f0'
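
The example above has no capture groups. As a sketch of the capture-group case described earlier, the function below receives the two groups between the whole match and the position; the parameter names and the sample string are illustrative.

```javascript
var names = 'Doe, John';
// match = whole match, last = group 1, first = group 2,
// pos = position of the match, originalText = the full input string
var swapped = names.replace(/(\w+), (\w+)/, function(match, last, first, pos, originalText){
  return first + ' ' + last;
});
console.log(swapped);   // 'John Doe'
```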

[tips] Escape HTML to guard against cross-site scripting (XSS)

function htmlEscape(text){
  return text.replace(/[<>"&]/g,function(match,pos,originalText){
    switch(match){
      case '<':
      return '&lt;';
      case '>':
      return '&gt;';
      case '&':
      return '&amp;';
      case '\"':
      return '&quot;';
    }
  });
}
console.log(htmlEscape('<p class=\"greeting\">Hello world!</p>'));
//&lt;p class=&quot;greeting&quot;&gt;Hello world!&lt;/p&gt;
console.log(htmlEscape('<p class="greeting">Hello world!</p>'));
//Same as above

split()

This method splits a string into multiple substrings on the specified delimiter and puts the results in an array. The delimiter can be a string or a RegExp. An optional second parameter limits the size of the array: if it is a value within range, only that many items are returned; otherwise all results are returned.

[Note] In IE8 and earlier, capture groups in a regular expression passed to split() are ignored

[tips] split('') splits the string into an array of single characters

var colorText = 'red,blue,green,yellow';
console.log(colorText.split(''));//["r", "e", "d", ",", "b", "l", "u", "e", ",", "g", "r", "e", "e", "n", ",", "y", "e", "l", "l", "o", "w"]
console.log(colorText.split(','));//["red", "blue", "green", "yellow"]
console.log(colorText.split(',',2));//["red", "blue"]
console.log(colorText.split(/\,/));//["red", "blue", "green", "yellow"]
console.log(colorText.split(/e/));//["r", "d,blu", ",gr", "", "n,y", "llow"]
console.log(colorText.split(/[^\,]+/));//Everything other than commas becomes the delimiter: ["", ",", ",", ",", ""]; IE8 and earlier return [",", ",", ","]
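
In engines that follow the specification (unlike IE8 and earlier), text matched by a capture group in the delimiter is included in the result. A small sketch with a made-up input:

```javascript
var parts = 'a1b2c'.split(/\d/);          // ['a', 'b', 'c']
var withGroups = 'a1b2c'.split(/(\d)/);   // ['a', '1', 'b', '2', 'c']
var limited = 'a1b2c'.split(/\d/, 2);     // ['a', 'b']
console.log(parts, withGroups, limited);
```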

Limitations

The following features are not supported by ECMAScript 5 regular expressions (later editions added some of them: ES2015 introduced the u flag for Unicode, and ES2018 introduced lookbehind, named capture groups, and the s flag)

[1] The \A and \Z anchors for the start and end of the string (the start and end of the string can still be matched with ^ and $)
[2] Lookbehind (lookahead is supported)
[3] Union and intersection classes
[4] Atomic groups
[5] Unicode support (except for single characters)
[6] Named capture groups (numbered capture groups are supported)
[7] The s (single-line) and x (free-spacing) matching modes
[8] Conditional matching
[9] Regular expression comments

Common examples

[1] Two ways to find all numbers in a string
[a] With traditional string operations

var str1 = 'j1h342jg24g234j 3g24j1';
var array = [];
var temp = '';
for(var i = 0; i < str1.length; i++){
  var value = parseInt(str1.charAt(i),10);//Number() converts a space to 0, so it cannot rule out spaces
  if(!isNaN(value)){
    temp += str1.charAt(i);
  }else{
    if(temp != ''){
      array.push(temp);
      temp = '';
    }
  }
}
if(temp != ''){
  array.push(temp);
  temp = '';
}
console.log(array);//["1", "342", "24", "234", "3", "24", "1"] 

[b] With a regular expression

var str1 = 'j1h342jg24g234j 3g24j1';
array = str1.match(/\d+/g);
console.log(array);//["1", "342", "24", "234", "3", "24", "1"]

[2] Sensitive word filtering (function argument of the replace method)

var string = 'FLG is a cult';
var pattern = /FLG|cult/g;
var result = string.replace(pattern,function($0){
  var s = '';
  //one asterisk per character of the matched word
  for(var i = 0; i < $0.length; i++){
    s+= '*';
  }
  return s;
});
console.log(result);//*** is a ****

[3] Date formatting

var array = ['2015.7.28','2015-7-28','2015/7/28','2015.7-28','2015-7.28','2015/7---28'];
function formatDate(date){
  return date.replace(/(\d+)\D+(\d+)\D+(\d+)/,'$1'+' year '+'$2'+' month '+'$3'+' day');
}
var result = [];
for(var i = 0 ; i < array.length; i++){
  result.push(formatDate(array[i]));
}
console.log(result);//["2015 year 7 month 28 day", "2015 year 7 month 28 day", "2015 year 7 month 28 day", "2015 year 7 month 28 day", "2015 year 7 month 28 day", "2015 year 7 month 28 day"]

[4] Get the text content of a piece of HTML

var str = '<p>refds</p><p>fasdf</p>';
var pattern = /<[^<>]+>/g;
console.log(str.replace(pattern,''));//refdsfasdf

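[5] A compatible way to write trim(), removing leading and trailing whitespace

var string = '  my name is littlematch  ';
//the g flag is needed so that both the leading and the trailing run are removed
console.log(string.replace(/^\s+|\s+$/g,''));//my name is littlematch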

That covers regular expressions in the JavaScript type system. I hope this article helps.