SoFunction
Updated on 2025-03-04

Regular expression basics: the dot (.)

Some details

For most languages and tools that use a traditional NFA engine, such as .NET, "." matches any character except the newline "\n". (Java is stricter by default: its "." also excludes other line terminators such as "\r".)
JavaScript is a special case. Because each browser has its own regex engine, the range of "." differs between browsers. In Trident-based browsers such as IE, "." matches any character except the newline "\n"; in browsers built on other engines, such as Firefox, Opera, and Chrome, "." matches any character except the carriage return "\r" and the newline "\n".

Some guesses about this detail

<script type="text/javascript">
    document.write(/./.test("\r") + "<br />");
    document.write(/./.test("\n") + "<br />");
</script>
/*-------- Output in IE (Trident) --------
 true
 false
 */
/*-------- Output in Firefox, Opera, and Chrome --------
 false
 false
 */

By a rough estimate, Trident, Presto, and Gecko all appear to use traditional NFA engines. WebKit at least supports traditional NFA behavior, but it performs differently from a traditional NFA engine, so it is probably either a heavily optimized traditional NFA engine or a DFA/NFA hybrid.

Windows uses both "\r" and "\n" in its line endings, while UNIX uses only "\n". My guess is that because the other browser engines did not originate on Windows, they handle "\r" differently, which is why "." in their regular expressions does not match "\r" either. I have not researched this in depth; it is only speculation.

Common misuse

Notice

When matching across multiple lines, do not use "[.\n]" to match any character. Inside a character class "." is a literal dot, so this pattern matches only a dot or a newline. "(.|\n)" works, but it is rarely written this way because it is hard to read and inefficient. The usual choices are "[\s\S]", or "." combined with the (?s) (Singleline) matching mode.
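
The difference between these patterns can be checked with a short script. This is a minimal sketch in JavaScript, assuming a runtime that supports the ES2018 "s" (dotAll) flag, which stands in here for the (?s) mode; the sample string and variable names are made up for illustration.

```javascript
// Sample text: four characters, one of which is a newline.
const text = "a.\nb";

// Inside a character class "." is a literal dot,
// so this finds only the "." and the "\n".
const literalDotClass = text.match(/[.\n]/g);

// These three all match every character, including the newline.
const anyCharClass = text.match(/[\s\S]/g);
const alternation = text.match(/(.|\n)/g);
const dotAll = text.match(/./gs); // "s" flag plays the role of (?s)

console.log(literalDotClass.length); // 2
console.log(anyCharClass.length);    // 4
console.log(alternation.length);     // 4
console.log(dotAll.length);          // 4
```

Only "[.\n]" misses the letters, which is the mistake the note above warns about.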

An example

Requirement description: Match the content in the <td> tag
Source string: <td>This is a test line.
Another line. </td>
Match result: <td>This is a test line.
Another line. </td>
Regular expression 1: <td>[\s\S]*</td>
Regular expression 2: (?s)<td>.*</td>
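
The same requirement can be tried in JavaScript. This is a sketch under two assumptions: the runtime supports the ES2018 "s" flag, and that flag is used in place of the inline (?s), which a JavaScript regex literal does not accept in that form. The variable names are illustrative only.

```javascript
// Source string from the example: the <td> content spans two lines.
const source = "<td>This is a test line.\nAnother line. </td>";

const withClass = /<td>[\s\S]*<\/td>/;  // regular expression 1
const withDotAll = /<td>.*<\/td>/s;     // regular expression 2, with the "s" flag
const plainDot = /<td>.*<\/td>/;        // no dotAll: "." stops at the newline

const classMatch = source.match(withClass);
const dotAllMatch = source.match(withDotAll);
const plainMatch = source.match(plainDot);

console.log(classMatch[0] === source);  // true
console.log(dotAllMatch[0] === source); // true
console.log(plainMatch);                // null
```

Without the "s" flag the plain "." cannot cross the line break, so the third pattern finds nothing.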

Matching efficiency test

The test string below is the input used by the test code (taken from the CSDN homepage):

<link href="images/" rel="SHORTCUT ICON" />
<title> - China's leading IT technology community, providing the most comprehensive information dissemination and service platform for IT professionals and technical personnel</title>
<script language='JavaScript' type='text/javascript' src='/ggmm/csdn_ggmm.js'></script> <script type="text/javascript" src="/a/js/%22%3E%3C/script>
<script type="text/javascript">

Test code:

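// Requires: using System.Collections.Generic; using System.Diagnostics;
//           using System.Text; using System.Text.RegularExpressions;
string yourStr = input;          // "input" is a placeholder for the test string shown above
StringBuilder src = new StringBuilder(4096);
for (int i = 0; i < 10000; i++)
{
    src.Append(yourStr);         // repeat the test string 10000 times
}
string strData = src.ToString();
List<Regex> reg = new List<Regex>();
reg.Add(new Regex(@"[\s\S]"));
reg.Add(new Regex(@"[\w\W]"));
reg.Add(new Regex(@"[\d\D]"));
reg.Add(new Regex(@"(.|\n)"));
reg.Add(new Regex(@"(?s)."));
string test = string.Empty;
string output = string.Empty;    // placeholder for wherever the results are displayed
Stopwatch stopW = new Stopwatch();
foreach (Regex re in reg)
{
    stopW.Reset();
    stopW.Start();
    test = strData;
    test = re.Replace(test, ""); // time how long it takes to replace every match
    stopW.Stop();
    output += "Regular expression: " + re.ToString().PadRight(10) + "Execution time: " + stopW.ElapsedMilliseconds + " ms";
    output += "\n---------------------------------------\n";
}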

Test results:
The test was run in two groups. Memory usage was 921 MB before the program ran.
The first group uses no quantifier, so only one character is replaced at a time. The execution times are as follows, with 938 MB of memory in use.

Regular expression: [\s\S] Execution time: 2651 ms
---------------------------------------
Regular expression: [\w\W] Execution time: 2515 ms
---------------------------------------
Regular expression: [\d\D] Execution time: 2187 ms
---------------------------------------
Regular expression: (.|\n) Execution time: 2470 ms
---------------------------------------
Regular expression: (?s). Execution time: 1969 ms

The second group uses quantifiers, so all characters are replaced at once. The execution times are as follows, with 1128 MB of memory in use.

Test results (with quantifiers)
Regular expression: [\s\S]+ Execution time: 249 ms
---------------------------------------
Regular expression: [\w\W]+ Execution time: 348 ms
---------------------------------------
Regular expression: [\d\D]+ Execution time: 198 ms
---------------------------------------
Regular expression: (.|\n)+ Execution time: 879 ms
---------------------------------------
Regular expression: (?s).+ Execution time: 113 ms
---------------------------------------

Test results analysis:

The most efficient is "." with the Singleline matching mode, that is, "(?s).", which is the fastest in both groups.
The second is "[\d\D]", while "(.|\n)" is the least efficient once a quantifier is applied.
"[\s\S]" falls in the middle, but it is the form most commonly used.

Note: Regex engines differ between languages, and even implementations of the same engine type optimize regular expressions differently. The performance conclusions above may therefore apply only to .NET.
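
Because these timings may hold only for .NET, one way to check another engine is to repeat the comparison there. The following is a rough sketch in JavaScript, not a port of the original test: the helper name timeReplace, the sample string, and the repeat count are all assumptions, and the "s" flag (ES2018) replaces the inline (?s). It covers the group without quantifiers; timings will vary by runtime and machine.

```javascript
// Hypothetical helper: for each pattern, time how long it takes to
// replace every match in `data` with the empty string.
function timeReplace(patterns, data) {
  return patterns.map((re) => {
    const start = Date.now();
    const result = data.replace(re, "");
    const elapsedMs = Date.now() - start;
    return { pattern: re.source, flags: re.flags, elapsedMs, remaining: result.length };
  });
}

// Stand-in for the real test string, repeated to build a larger input.
const sample = "<td>This is a test line.\nAnother line. </td>\n";
const data = sample.repeat(200);

// The "g" flag replaces every match, as Regex.Replace does in .NET.
const patterns = [/[\s\S]/g, /[\w\W]/g, /[\d\D]/g, /(.|\n)/g, /./gs];

const results = timeReplace(patterns, data);
for (const r of results) {
  console.log("Regular expression: " + r.pattern.padEnd(10) +
              "Execution time: " + r.elapsedMs + " ms");
}
```

Each entry also records how many characters remain after the replacement, which should be zero for every pattern if they all really match any character. Appending "+" to each pattern gives the equivalent of the second group.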