SoFunction
Updated on 2025-03-01

Regular expressions

When parsing an HTML document, we can use regular expressions to extract the content of a tag.
As an example, take the id value and the text content of every A tag in a string such as:
<a id="1" target="_blank">aaaaaaaaaa</a>
Regular expression:
<a[^<]*id[^<]*=[^<]*"(?<ID>[^<]*)"[^<]*target[^<]*=[^<]*"[^<]*_blank[^<]*"[^<]*>(?<CONTENT>[^<]*)</a>
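Breaking down the regular expression:
[^<]* matches any run of zero or more characters other than <. It is a very useful combination: it skips over whatever lies between two keywords without running past the end of the current tag.
(?<ID>[^<]*) captures zero or more such characters, stopping where the next keyword in the pattern (here the closing quote) has to match.
<ID> names the group, much like a variable: the text captured by the parentheses is labeled so that the program can retrieve it by name. Group names are case-sensitive.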
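To see [^<]* and a named group in isolation, here is a minimal sketch on a simpler pattern. The class name NamedGroupDemo, the method FirstBoldText and the group name TEXT are made up for this illustration; only Regex.Match and Match.Groups come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: returns the text inside the first <b>...</b> pair,
    // or null when the input contains no such pair.
    public static string FirstBoldText(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // [^<]* consumes characters up to, but not including, the next '<',
        // so the capture cannot run past the closing </b>.
        Match m = Regex.Match(html, @"<b>(?<TEXT>[^<]*)</b>");

        // The captured text is looked up by the group's name.
        return m.Success ? m.Groups["TEXT"].Value : null;
    }
}
```

Calling NamedGroupDemo.FirstBoldText on the string <p>x <b>hello</b> y</p> returns hello.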
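Example of calling it from C#:
// Requires: using System.Text.RegularExpressions;
// Inside a verbatim string (@"..."), a double quote is written as "".
string strRegex = @"<a[^<]*id[^<]*=[^<]*""(?<ID>[^<]*)""[^<]*target[^<]*=[^<]*""[^<]*_blank[^<]*""[^<]*>(?<CONTENT>[^<]*)</a>";
string strSource = "<a id=\"1\" target=\"_blank\">aaaaaaaaaa</a>";
Regex mc = new Regex(strRegex);
MatchCollection ro = mc.Matches(strSource);
for (int i = 0; i < ro.Count; i++)
{
    // Retrieve the ID and content of each matched tag
    string id = ro[i].Groups["ID"].Value;         // "1"
    string topic = ro[i].Groups["CONTENT"].Value; // "aaaaaaaaaa"
}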
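
The call above could be wrapped in a reusable helper along the following lines. This is only a sketch: the class name AnchorExtractor and the method Extract are hypothetical. The pattern is written with doubled quotes, as a verbatim string requires, and without a space before the final [^<]*> so that it also matches tags that close immediately after the last attribute. It still assumes that id appears before target inside the tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorExtractor
{
    // Compiled once and reused; the named groups ID and CONTENT are case-sensitive.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a[^<]*id[^<]*=[^<]*""(?<ID>[^<]*)""[^<]*target[^<]*=[^<]*""[^<]*_blank[^<]*""[^<]*>(?<CONTENT>[^<]*)</a>");

    // Returns one (id, content) pair per matching <a> tag, in document order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["ID"].Value,
                m.Groups["CONTENT"].Value));
        }
        return result;
    }
}
```

Because [^<]* never crosses a < character, each match stays inside a single tag, so several links in one string are returned as separate pairs. For anything beyond simple, predictable markup, a real HTML parser is usually the more robust choice.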