When parsing an HTML document, we can use regular expressions to extract attribute values and tag content.
Example:
Extract the id and the content of every <a> tag in the following string:
<a id="1" target="_blank">aaaaaaaaaa</a>
Regular expression:
<a[^<]*id[^<]*=[^<]*"(?<ID>[^<]*)"[^<]*target[^<]*=[^<]*"[^<]*_blank[^<]*"[^<]*>(?<CONTENT>[^<]*)</a>
Breaking down the expression:
[^<]* matches any run of characters other than <, so it skips ahead to the next literal in the pattern (id, =, ", target, ...) without running past the start of the next tag
(?<ID>[^<]*) captures zero or more characters other than <, up to the next literal in the pattern (here the closing ")
?<ID> names the group, much like a variable: the text captured by the parentheses can then be read by name from the program
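Example of calling it from C#:
using System.Text.RegularExpressions;

// In a verbatim string (@"..."), a literal " is written as "".
string strRegex = @"<a[^<]*id[^<]*=[^<]*""(?<ID>[^<]*)""[^<]*target[^<]*=[^<]*""[^<]*_blank[^<]*""[^<]*>(?<CONTENT>[^<]*)</a>";
string strSource = "<a id=\"1\" target=\"_blank\">aaaaaaaaaa</a>";
// The original options argument was lost; RegexOptions.IgnoreCase is an assumption.
Regex r = new Regex(strRegex, RegexOptions.IgnoreCase);
MatchCollection mc = r.Matches(strSource);
if (mc.Count > 0)
{
    for (int i = 0; i < mc.Count; i++)
    {
        // Retrieve the ID and content by group name
        string id = mc[i].Groups["ID"].Value;
        string topic = mc[i].Groups["CONTENT"].Value;
    }
}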
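
The call above can also be wrapped into a small, complete console program that handles several tags at once. This is only a sketch: the class name AnchorExtractor, the helper Extract, and the sample input are hypothetical names chosen for illustration, and RegexOptions.IgnoreCase is an assumption rather than something the original snippet confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorExtractor
{
    // Same pattern as in the text; in a verbatim string each literal " is written as "".
    // RegexOptions.IgnoreCase is an assumption, not taken from the original snippet.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a[^<]*id[^<]*=[^<]*""(?<ID>[^<]*)""[^<]*target[^<]*=[^<]*""[^<]*_blank[^<]*""[^<]*>(?<CONTENT>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns one (id, content) pair per matching <a> tag.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            // Named groups are read by the names given in (?<ID>...) and (?<CONTENT>...).
            result.Add(new KeyValuePair<string, string>(
                m.Groups["ID"].Value, m.Groups["CONTENT"].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample input: the second tag has no target="_blank" and is not matched.
        string html =
            "<a id=\"1\" target=\"_blank\">first</a>" +
            "<a id=\"2\" href=\"x.html\">skipped</a>" +
            "<a id=\"3\" target=\"_blank\">third</a>";

        foreach (var pair in Extract(html))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

For this input the program prints "1: first" and "3: third". Note that the pattern only matches when id appears before target inside the tag, and when the link text contains no nested tags, because the CONTENT group stops at the first < character.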