$str = preg_replace("/(<a.*?>)(.*?)(<\/a>)/", '\1<span>\2</span>\3', $str);
There are three subpatterns (each pair of parentheses is a subpattern): the first is the link's opening tag, the second is the link text, and the third is </a>.
In the second argument, \1, \2, and \3 refer to these three parts. With that, the replacement is easy, isn't it?
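As a minimal sketch of the call above, assuming a short sample string (the HTML snippet and the helper name wrapLinkText are made up for illustration):

```php
<?php
// Hypothetical helper: wraps the text of every <a> element in a <span>.
function wrapLinkText($html)
{
    // \1 = opening tag, \2 = link text, \3 = closing </a>
    return preg_replace('/(<a.*?>)(.*?)(<\/a>)/', '\1<span>\2</span>\3', $html);
}

echo wrapLinkText('<a href="/index.html">Home</a>'), "\n";
// prints: <a href="/index.html"><span>Home</span></a>
```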
PHP function to get all link addresses in the page
The following PHP function extracts all link addresses from any string $string (which may be read directly from an HTML page file) and returns them in an array. The function automatically excludes email addresses, and the returned array contains no duplicates.
function GetAllLink($string)
{
    // Strip line breaks so that every tag sits on a single line
    $string = str_replace(array("\r", "\n"), "", $string);
    $regex = array();
    $regex['url'] = "((http|https|ftp|telnet|news):\/\/)?([a-z0-9_\-\/\.]+\.[][a-z0-9:;&#@=_~%\?\/\.\,\+\-]+)";
    $regex['email'] = "([a-z0-9_\-]+)@([a-z0-9_\-]+\.[a-z0-9\-\._\-]+)";
    // eregi_replace() was removed in PHP 7; preg_replace() with the "i" modifier replaces it here
    // Remove the text between tags
    $string = preg_replace("/>[^<>]+</i", "><", $string);
    // Remove JavaScript code wrapped in HTML comments
    $string = preg_replace("/<!--.*\/\/-->/i", "", $string);
    // Remove HTML tags other than <a>
    $string = preg_replace("/<[^a][^<>]*>/i", "", $string);
    // Remove email (mailto:) links
    $string = preg_replace("/<a([ ]+)href=([\"']*)mailto:(" . $regex['email'] . ")([\"']*)[^>]*>/i", "", $string);
    // Reduce each remaining link to its URL followed by a tab
    $string = preg_replace("/<a([ ]+)href=([\"']*)(" . $regex['url'] . ")([\"']*)[^>]*>/i", "\\3\t", $string);
    // Split on tabs and keep each address only once
    $output = array();
    $temp = strtok($string, "\t");
    while ($temp !== false) {
        if (!in_array($temp, $output)) {
            $output[] = $temp;
        }
        $temp = strtok("\t");
    }
    return $output;
}
The following are examples written in PHP.
Verify that a string contains only digits and English letters and is between 4 and 16 characters long:
<?php
$str = 'a1234';
// preg_match() requires delimiters around the pattern, here "/"
if (preg_match('/^[a-zA-Z0-9]{4,16}$/', $str)) {
    echo "Verification succeeded";
} else {
    echo "Verification failed";
}
?>
Simple ID card number verification:
<?php
$str = 'a1234';
// Accepts exactly 15 or exactly 18 digits
if (preg_match('/^(?:\d{15}|\d{18})$/', $str)) {
    echo "Verification succeeded";
} else {
    echo "Verification failed";
}
?>
The following code renders [code] blocks in a message, much like the code boxes you see on this page. Note that it relies on the /e modifier, which was deprecated in PHP 5.5 and removed in PHP 7; on current PHP versions the same effect requires preg_replace_callback().
function codedisp($code) {
    global $discuzcodes;
    $discuzcodes['pcodecount']++;
    // Trim surrounding line breaks, undo the quote escaping added by /e, and escape HTML
    $code = htmlspecialchars(str_replace('\\"', '"', preg_replace("/^[\n\r]*(.+?)[\n\r]*$/is", "\\1", $code)));
    $discuzcodes['codehtml'][$discuzcodes['pcodecount']] = "<br><div class=\"msgheader\"><div class=\"right\"><a href=\"###\" class=\"smalltxt\" onclick=\"copycode($('phpcode$discuzcodes[codecount]'));\">[Copy this code]</a></div>The code is as follows:</div><div class=\"msgborder\" id=\"phpcode$discuzcodes[codecount]\">".fhtml2($code)."</div><br>";
    $discuzcodes['codecount']++;
    // Return a placeholder that is swapped for the rendered HTML later
    return "[\tDISCUZ_CODE_$discuzcodes[pcodecount]\t]";
}
$message = preg_replace("/\s*\[code\](.+?)\[\/code\]\s*/ies", "codedisp('\\1')", $message);
$message = preg_replace("/\s*\[html\](.+?)\[\/html\]\s*/ies", "htmldisp('\\1')", $message);
Regular expression matching Chinese characters: [\u4e00-\u9fa5] (in PHP's PCRE syntax: /[\x{4e00}-\x{9fa5}]/u)
Comment: Matching Chinese used to be a real headache; this expression makes it easy
Match double-byte characters (including Chinese characters): [^\x00-\xff] (in PHP, add the u modifier so that the pattern works on characters rather than bytes)
Comment: It can be used to calculate the length of a string (a double-byte character counts as 2, an ASCII character as 1)
Regular expression matching blank lines: \n\s*\r
Comment: Can be used to delete blank lines
Regular expression matching HTML tags: <(\S*?)[^>]*>.*?</\1>|<.*? />
Comment: The versions circulating online are poor. The one above matches only some cases and still cannot handle complex nested tags.
Regular expression matching leading and trailing whitespace: ^\s*|\s*$
Comment: It can be used to delete whitespace at the beginning and end of a line (including spaces, tabs, form feeds, etc.), a very useful expression
Regular expression matching email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Comment: It is very practical when verifying forms
Regular expression matching URL: [a-zA-Z]+://[^\s]*
Comment: The versions circulating online are very limited; the one above basically meets the need
Match whether the account is legal (beginning with letters, 5-16 bytes allowed, alphanumeric underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Comment: It is very practical when verifying the form
Match domestic phone number: \d{3}-\d{8}|\d{4}-\d{7}
Comment: Matching forms are as follows: 0511-4405222 or 021-87888822
Match Tencent QQ number: [1-9][0-9]{4,}
Comment: Tencent QQ number starts at 10,000
Match the Chinese postal code: [1-9]\d{5}(?!\d)
Comment: China's postal code is 6 digits
Match ID card: \d{15}|\d{18}
Comment: China's ID card is 15 or 18 digits
Match IP address: \d+\.\d+\.\d+\.\d+
Comment: It is useful when extracting IP addresses
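The pattern above is fine for extraction, but it also accepts values such as 999.1.1.1. A minimal sketch of a stricter check, assuming IPv4 only (the helper name isIPv4 is made up for illustration):

```php
<?php
// Hypothetical helper: true only for four dot-separated numbers in the range 0..255.
function isIPv4($text)
{
    if (!preg_match('/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/', $text, $m)) {
        return false;
    }
    // $m[1]..$m[4] hold the four captured numbers
    foreach (array_slice($m, 1) as $part) {
        if ((int) $part > 255) {
            return false;
        }
    }
    return true;
}

var_dump(isIPv4('192.168.0.1')); // bool(true)
var_dump(isIPv4('999.1.1.1'));   // bool(false)
```

The regular expression only checks the shape; the range check is done in plain PHP because it is simpler to read than a pattern that encodes 0..255.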
Match specific numbers:
^[1-9]\d*$ //Match positive integers
^-[1-9]\d*$ //Match negative integers
^-?[1-9]\d*$ //Match nonzero integers
^([1-9]\d*|0)$ //Match non-negative integers (positive integers + 0)
^(-[1-9]\d*|0)$ //Match non-positive integers (negative integers + 0)
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ //Match positive floating point numbers
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ //Match negative floating point numbers
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ //Match floating point numbers
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ //Match non-negative floating point numbers (positive floating point numbers + 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)|0?\.0+|0)$ //Match non-positive floating point numbers (negative floating point numbers + 0)
Comment: These are useful when processing large amounts of data. Alternations are wrapped in parentheses so that ^ and $ apply to every branch; adjust the patterns to your data before using them.
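One thing worth checking with patterns like these is that | has lower precedence than ^ and $, so an alternation normally needs parentheses for the anchors to cover every branch. A small sketch of the difference (the variable names are made up for illustration):

```php
<?php
$loose  = '/^[1-9]\d*|0$/';   // parsed as (^[1-9]\d*) OR (0$)
$strict = '/^([1-9]\d*|0)$/'; // both anchors apply to both alternatives

var_dump(preg_match($loose, '12abc'));  // int(1): the first branch matches "12"
var_dump(preg_match($strict, '12abc')); // int(0)
var_dump(preg_match($strict, '0'));     // int(1)
var_dump(preg_match($strict, '120'));   // int(1)
```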
Match a specific string:
^[A-Za-z]+$ //Match a string composed of the 26 English letters
^[A-Z]+$ //Match a string composed of the 26 uppercase English letters
^[a-z]+$ //Match a string composed of the 26 lowercase English letters
^[A-Za-z0-9]+$ //Match a string composed of digits and the 26 English letters
^\w+$ //Match a string composed of digits, the 26 English letters, or underscores
Here are some special characters:
Special characters in regular expressions: (reference book: "Mastering Regular Expressions")
Character \
Meaning: For characters that are normally taken literally, it indicates that the next character is special and is not to be interpreted literally.
For example: /b/ matches the character 'b'; by adding a backslash before the b, that is, /\b/, the character becomes special and matches a word boundary.
or:
For characters that are normally special, it indicates that the next character is not special and should be interpreted literally.
For example: * is a special character that matches the preceding character zero or more times; /a*/ means match 0 or more a's.
To match a literal *, put a backslash before it; for example, /a\*/ matches 'a*'.
Character ^
Meaning: Matches at the beginning of the input.
For example: /^A/ does not match the 'A' in "an A," but matches the first 'A' in "An A."
Character $
Meaning: Similar to ^, but matches at the end of the input.
For example: /t$/ does not match the 't' in "eater", but matches the 't' in "eat".
Character *
Meaning: Matches the character before the * 0 or more times.
For example: /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but does not match anything in "A goat grunted".
Character +
Meaning: Matches the character before the + 1 or more times. Equivalent to {1,}.
For example: /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy."
Character ?
Meaning: Matches the character before the ? 0 or 1 time.
For example: /e?le?/ matches 'el' in "angel" and 'le' in "angle."
Character .
Meaning: (the decimal point) matches any single character except a line break.
For example: /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but does not match 'nay'.
Character (x)
Meaning: Matches 'x' and remembers the match.
For example: /(foo)/ matches and remembers 'foo' in "foo bar." The matched substrings can be read from elements [1], ..., [n] of the result array, or from the properties $1, ..., $9 of the RegExp object.
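In PHP, the remembered substrings come back through the third argument of preg_match(). A minimal sketch, assuming a made-up subject string:

```php
<?php
$matches = array();
// Two capturing groups: (foo) is group 1, (bar) is group 2
$found = preg_match('/(foo) (bar)/', 'foo bar baz', $matches);

// $matches[0] is the whole match; $matches[1] and $matches[2] are the groups
print_r($matches);
```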
Character x|y
Meaning: Match 'x' or 'y'.
For example: /green|red/ matches 'green' in "green apple" and 'red' in "red apple."
Character {n}
Meaning: n here is a positive integer. Matches exactly n occurrences of the preceding character.
For example: /a{2}/ does not match the 'a' in "candy," but matches both a's in "caandy," and the first two a's in "caaandy."
Character {n,}
Meaning: n here is a positive integer. Matches at least n occurrences of the preceding character.
For example: /a{2,}/ does not match the 'a' in "candy", but matches all the a's in "caandy" and all the a's in "caaaaaaandy."
Character {n,m}
Meaning: n and m here are both positive integers. Matches at least n and at most m occurrences of the preceding character.
For example: /a{1,3}/ matches nothing in "cndy", the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy". Note: even though "caaaaaaandy" contains more a's, the match is only the first three, "aaa".
Character [xyz]
Meaning: A character set; matches any one of the listed characters. You can specify a range of characters with a hyphen.
For example: [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "ache".
Character [^xyz]
Meaning: A negated character set, that is, it matches anything except the listed characters. You can specify a range of characters with a hyphen.
For example: [^abc] and [^a-c] are equivalent; they first match the 'r' in "brisket" and the 'h' in "chop."
Character [\b]
Meaning: Matches a backspace (not to be confused with \b)
Character \b
Meaning: Matches a word boundary, such as a space (not to be confused with [\b])
For example: /\bn\w/ matches 'no' in "noonday", /\wy\b/ matches 'ly' in "possibly yesterday."
Character \B
Meaning: Matches a non-word boundary
For example: /\w\Bn/ matches 'on' in "noonday", /y\B\w/ matches 'ye' in "possibly yesterday."
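The four boundary examples can be tried directly in PHP. This is only a sketch; the helper firstMatch is made up for illustration:

```php
<?php
// Hypothetical helper: returns the first match of $pattern in $subject, or null.
function firstMatch($pattern, $subject)
{
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

echo firstMatch('/\bn\w/', 'noonday'), "\n";            // no
echo firstMatch('/\wy\b/', 'possibly yesterday'), "\n"; // ly
echo firstMatch('/\w\Bn/', 'noonday'), "\n";            // on
echo firstMatch('/y\B\w/', 'possibly yesterday'), "\n"; // ye
```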
Character \cX
Meaning: X here is a control character. Matches a control character in a string.
For example: /\cM/ matches control-M in a string.
Character \d
Meaning: Matches a digit; equivalent to [0-9].
For example: /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
Character \D
Meaning: Matches any non-digit; equivalent to [^0-9].
For example: /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
Character \f
Meaning: Matches a form feed
Character \n
Meaning: Matches a newline
Character \r
Meaning: Matches a carriage return
Character \s
Meaning: Matches a single whitespace character, including space, tab, form feed, and line feed; equivalent to [ \f\n\r\t\v].
For example: /\s\w*/ matches ' bar' in "foo bar."
Character \S
Meaning: Matches a single character other than whitespace; equivalent to [^ \f\n\r\t\v].
For example: /\S\w*/ matches 'foo' in "foo bar."
Character \t
Meaning: Matches a tab character
Character \v
Meaning: Matches a vertical tab
Character \w
Meaning: Matches any letter, digit, or underscore; equivalent to [A-Za-z0-9_].
For example: /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
Character \W
Meaning: Matches any character other than a letter, digit, or underscore; equivalent to [^A-Za-z0-9_].
For example: /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
Character \n
Meaning: n here is a positive integer. A back reference to the nth parenthesized substring in the regular expression (counting left parentheses).
For example: /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows below.
Note: If the number of left parentheses is less than the number given in \n, the \n is taken as an octal escape, as described in the next entry.
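A short PHP sketch of back references: the first pattern is the one from the entry above, and the second (a repeated-word check) is an extra illustration that is not part of the original text:

```php
<?php
// \1 must repeat whatever group 1 captured, here the comma
preg_match('/apple(,)\sorange\1/', 'apple, orange, cherry, peach.', $m);
echo $m[0], "\n"; // apple, orange,

// Illustrative extra: find a word that is immediately repeated
preg_match('/\b(\w+) \1\b/', 'this is is a test', $dup);
echo $dup[0], "\n"; // is is
```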
Characters \ooctal and \xhex
Meaning: Here \ooctal is an octal escape value and \xhex is a hexadecimal escape value; they allow ASCII codes to be embedded in a regular expression.
General patterns
Delimiter: "/" is usually used as the opening and closing delimiter, and "#" can also be used.
When should you use "#"? Generally when the pattern contains many "/" characters, as with a URI, because every "/" would otherwise have to be escaped.
The code using the "/" delimiter is as follows.
<?php
$regex = '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html$/i';
// www.example.com is a placeholder host; the host in the original sample was lost
$str = 'http://www.example.com/show_page/id_ABCDEFG.html';
$matches = array();
if (preg_match($regex, $str, $matches)) {
    var_dump($matches);
}
echo "\n";
$matches[0] in preg_match will contain the string matching the entire pattern.
The code using the "#" delimiter is as follows. In this case "/" does not need to be escaped!
$regex = '#^http://([\w.]+)/([\w]+)/([\w]+)\.html$#i';
// www.example.com is a placeholder host; the host in the original sample was lost
$str = 'http://www.example.com/show_page/id_ABCDEFG.html';
$matches = array();
if (preg_match($regex, $str, $matches)) {
    var_dump($matches);
}
echo "\n";
Modifier: Used to change the behavior of a regular expression.
The final "i" in '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html/i' is a modifier that makes the match case-insensitive. Another one we often use is "x", which makes the engine ignore whitespace inside the pattern.
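As a small sketch of the "x" modifier, assuming made-up patterns and sample strings: with "x", the pattern can be spaced out for readability and still match the same text.

```php
<?php
// With "x", whitespace inside the pattern is ignored
$spaced = '/ ^ (\d{4}) - (\d{2}) $ /x';
$plain  = '/^(\d{4})-(\d{2})$/';

var_dump(preg_match($spaced, '2024-05')); // int(1)
var_dump(preg_match($plain, '2024-05'));  // int(1)

// Modifiers can be combined: "i" ignores case, "x" ignores pattern whitespace
var_dump(preg_match('/ hello \s world /ix', 'Hello World')); // int(1)
```

Because "x" discards literal spaces in the pattern, a space that must be matched has to be written as \s or escaped.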
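Example code:
$regex = '/HELLO/';
$str = 'hello word';
$matches = array();
// Without "i" the match is case-sensitive, so this branch is not taken
if (preg_match($regex, $str, $matches)) {
    echo 'No i:Valid Successful!', "\n";
}
// Appending "i" after the closing delimiter makes the match case-insensitive
if (preg_match($regex.'i', $str, $matches)) {
    echo 'YES i:Valid Successful!', "\n";
}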
Character class: the part enclosed in square brackets, such as [\w], is a character class.
Quantifiers: in [\w]{3,5}, [\w]* or [\w]+, the symbols after [\w] are quantifiers. Their meanings are as follows.
{3,5} means 3 to 5 characters, {3,} means 3 or more, {3} means exactly three. For "at most 5", write {0,5}; older PCRE versions treat {,5} as literal text.
* means 0 or more.
+ means 1 or more.
The caret
^:
> Inside a character class (for example [^\w]) it means negation (exclusion), a "reverse selection".
> In front of an expression it means "starts with" (/^n/i matches text that starts with n).
Note: "\" is what we usually call the escape character. It is used to escape special symbols such as "." and "/".
A regular expression generally has the following form:
/love/
The part between the "/" delimiters is the pattern that will be matched in the target object.
Metacharacter: a character with special meaning in regular expressions that specifies how its leading character (the character just before the metacharacter) may appear in the target object.
The more commonly used metacharacters include "+", "*", and "?".
The "+" metacharacter specifies that its leading character must appear one or more times in a row in the target object.
The "*" metacharacter specifies that its leading character must appear zero or more times in a row in the target object.
The "?" metacharacter specifies that its leading character must appear zero times or once in the target object.
Next, let's look at how these metacharacters are used.
/fo+/
Because the above regular expression contains the "+" metacharacter (the "o" before it is the leading character), it matches any string in which the letter f is followed by one or more consecutive letters o.
Besides metacharacters, you can also specify exactly how many times a pattern may appear in the matched object. For example,
/jim{2,6}/
The above regular expression stipulates that the character m appears 2 to 6 times in a row, so it matches strings such as jimmy or jimmmmy.
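A quick way to confirm these two patterns is to run them against a few sample strings (the strings below are made up for illustration):

```php
<?php
// /fo+/ needs an f followed by at least one o
var_dump(preg_match('/fo+/', 'f'));      // int(0)
preg_match('/fo+/', 'fooo bar', $m);
echo $m[0], "\n";                        // fooo

// /jim{2,6}/ needs "ji" followed by 2 to 6 consecutive m's
var_dump(preg_match('/jim{2,6}/', 'jim'));     // int(0)
var_dump(preg_match('/jim{2,6}/', 'jimmy'));   // int(1)
var_dump(preg_match('/jim{2,6}/', 'jimmmmy')); // int(1)
```

Note that the quantifier applies only to the m, not to the whole word "jim".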
Several other important metacharacters are used as follows.
\s: matches a single whitespace character, including tab and line break;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore character;
\W: matches any character that \w does not match;
.: matches any character except a line break.
(Note: \s and \S, and \w and \W, can be regarded as inverses of each other)
Next, an example of how to use these metacharacters in a regular expression.
/\s+/
The above regular expression matches one or more whitespace characters in the target object.
In addition to the metacharacters introduced above, regular expressions have another kind of special character: locators.
Locator: Used to specify where the matching pattern must appear in the target object.
The more commonly used locators include "^", "$", "\b" and "\B".
The "^" locator specifies that the matching pattern must appear at the beginning of the target string.
The "$" locator specifies that the matching pattern must appear at the end of the target string.
The "\b" locator specifies that the matching pattern must appear at one of the two boundaries of a word, its beginning or its end.
The "\B" locator specifies that the match must lie inside a word, that is, it can be neither the beginning nor the end of the word.
Similarly, "^" and "$", and "\b" and "\B", can be regarded as two pairs of locators that are inverses of each other. For example:
/^hell/
Because the above regular expression contains the "^" locator, it matches strings that begin with "hell", such as "hell", "hello" or "hellhound".
/ar$/
Because the above regular expression contains the "$" locator, it matches strings that end with "ar", such as "car", "bar" or "ar".
/\bbom/
Because the above pattern begins with the "\b" locator, it matches words that start with "bom", such as "bomb" or "bom".
/man\b/
Because the above pattern ends with the "\b" locator, it matches words that end with "man", such as "human", "woman" or "man".
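The four locator examples can be checked with a few positive and negative cases. This is a sketch only; the helper name hits and the negative samples are made up for illustration:

```php
<?php
// Hypothetical helper: true when $pattern matches somewhere in $subject
function hits($pattern, $subject)
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(hits('/^hell/', 'hello'));   // bool(true)
var_dump(hits('/^hell/', 'shell'));   // bool(false): "hell" is not at the start
var_dump(hits('/ar$/', 'car'));       // bool(true)
var_dump(hits('/ar$/', 'art'));       // bool(false): "ar" is not at the end
var_dump(hits('/\bbom/', 'a bomb'));  // bool(true)
var_dump(hits('/\bbom/', 'abomb'));   // bool(false): "bom" starts inside a word
var_dump(hits('/man\b/', 'woman'));   // bool(true)
var_dump(hits('/man\b/', 'manner'));  // bool(false): the word continues after "man"
```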
To let users define matching patterns more flexibly, regular expressions allow a range to be specified instead of a specific character. For example:
/[A-Z]/
The above regular expression will match any capital letter in the range A to Z.
/[a-z]/
The above regular expression will match any lowercase letter in the range a to z.
/[0-9]/
The above regular expression will match any digit in the range 0 to 9.
/([a-z][A-Z][0-9])+/
The above regular expression will match one or more repetitions of a lowercase letter, an uppercase letter, and a digit in that order, such as "aB0". Note that "()" can be used in regular expressions to group several items together.
"()" symbol: the enclosed content must appear together in the target object. Therefore, the above regular expression cannot match a string such as "abc", because the last character in "abc" is a letter rather than a digit.
If we want to implement "or" operations similar to programming logic in regular expressions and choose one of multiple different patterns to match, we can use the pipe character: "|". For example:
/to|too|2/
The above regular expression will match "to", "too", or "2" in the target object.
Negation: "[^]". Unlike the locator "^" introduced earlier, the negation "[^]" stipulates that the matched character must not be one of those listed in the brackets. For example:
/[^A-C]/
The above pattern will match any character in the target object except A, B, and C. Generally speaking, when "^" appears at the start of "[]" it is a negation operator; when "^" is outside "[]", or there is no "[]", it is a locator.
Finally, when you need to include a metacharacter literally in a pattern and search for it, use the escape character "\". For example:
/Th\*/
The above regular expression will match "Th*" in the target object, not "The" and the like.
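When the literal text comes from a variable, PHP's preg_quote() can add the backslashes for you. A minimal sketch, with made-up sample strings:

```php
<?php
$literal = 'Th*';
// preg_quote() escapes regex metacharacters, plus the delimiter passed as the second argument
$pattern = '/' . preg_quote($literal, '/') . '/';

echo $pattern, "\n";                               // /Th\*/
var_dump(preg_match($pattern, 'Th* is literal'));  // int(1)
var_dump(preg_match($pattern, 'The'));             // int(0)
```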
Introduction to practical experience
To recap, ^ and $ match the beginning and the end of a string respectively. Some examples:
"^The": the string must begin with "The";
"of despair$": the string must end with "of despair";
So,
"^abc$": a string that both starts and ends with abc; in practice only "abc" matches;
"Notice": matches any string containing "Notice";
As the last example shows, if you use neither of the two characters, the pattern (regular expression) may appear anywhere in the string being tested; it is not anchored to either end.
Next, '*', '+' and '?'.
They specify how many times a character may appear in a row. In order, they mean:
"zero or more", equivalent to {0,}
"one or more", equivalent to {1,}
"zero or one", equivalent to {0,1}
Here are some examples:
"ab*": synonymous with ab{0,}; matches a string with an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": synonymous with ab{1,}; the same as the previous one, but at least one b must be present ("ab", "abbb", and so on);
"ab?": synonymous with ab{0,1}; the b may be absent or appear once;
"a?b+$": matches a string ending with zero or one a followed by one or more b's.
Key point: '*', '+' and '?' apply only to the character immediately before them.
You can also give the number of repetitions in braces, such as:
"ab{2}": a must be followed by exactly two b's ("abb");
"ab{2,}": a must be followed by two or more b's (such as "abb", "abbbb", etc.);
"ab{3,5}": a must be followed by 3 to 5 b's ("abbb", "abbbb", or "abbbbb").
Now we put several characters in parentheses, such as:
"a(bc)*": matches a followed by zero or more "bc";
"a(bc){1,5}": a followed by one to five "bc";
There is also the character '|', which acts as an OR operation:
"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including 0) of a's or b's followed by a c;
A dot ('.') stands for any single character except "\n".
What if you want to match any single character including "\n"?
Use a class such as '[\s\S]', or in PCRE add the s modifier so that '.' also matches "\n". (Inside brackets a dot is literal, so '[\n.]' would match only a newline or a dot.)
"a.[0-9]": an a, then any one character, then a digit from 0 to 9;
"^.{3}$": a string consisting of exactly three arbitrary characters.
The content enclosed in square brackets matches only a single character:
"[ab]": matches a or b (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same effect as "a|b|c|d" and "[abcd]");
Generally, we use [a-zA-Z] to specify an English letter in either case:
"^[a-zA-Z]": matches a string starting with an upper- or lowercase letter;
"[0-9]%": matches a string containing a digit followed by a percent sign, such as x%;
",[a-zA-Z0-9]$": matches a string ending with a comma followed by a digit or letter;
You can also list the characters you do not want inside the brackets by starting the list with '^': "%[^a-zA-Z]%" matches a string containing two percent signs with a single non-letter between them.
Key point: when ^ is used at the beginning of the brackets, it excludes the characters in the brackets.
For PHP to treat special characters literally, you must escape them by putting a "\" in front of them.
Don't forget that characters in brackets are an exception to this rule: in POSIX bracket expressions all special characters, including "\", lose their special meaning, so "[*\+?{}.]" matches a string containing any of these characters. (In PCRE, "\" still works as the escape character inside brackets.)
Also, as the regex manual tells us: "If the list contains ']', it is best to place it first in the list (possibly after '^'). If the list contains '-', it is best to place it first or last, or as the second endpoint of a range" (for example, in [a-d-0-9] the '-' in the middle is valid).
After the examples above you should understand {n,m}. Note that neither n nor m can be negative, and n must not be greater than m. The pattern then matches at least n and at most m repetitions. For example, "p{1,5}" matches at most five consecutive p's, so within the run of p's after the v in "pvpppppppp" a single match covers only the first five.
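The brace quantifiers discussed in this section can be exercised as follows. This is a sketch under the assumption that the whole string should be checked, so ^ and $ are added; the helper matchesAB is made up for illustration:

```php
<?php
// Hypothetical helper: is $text an "a" followed by a number of b's allowed by $quantifier?
function matchesAB($quantifier, $text)
{
    return preg_match('/^ab' . $quantifier . '$/', $text) === 1;
}

var_dump(matchesAB('{2}', 'abb'));       // bool(true)
var_dump(matchesAB('{2,}', 'abbbb'));    // bool(true)
var_dump(matchesAB('{3,5}', 'abb'));     // bool(false): only two b's
var_dump(matchesAB('{3,5}', 'abbbbb'));  // bool(true): five b's

// Unanchored, {1,5} stops after five repetitions
preg_match('/p{1,5}/', 'vppppppp', $m);
echo $m[0], "\n"; // ppppp
```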
Now for the characters that start with "\".
\b: the books say it matches a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".
\B is exactly the opposite of \b above.
Other usages of regular expressions
Extract strings
ereg() and eregi() have a feature that lets you extract part of a string through a regular expression (see the manual for details). Both belong to the POSIX ereg extension, which was removed in PHP 7. For example, to extract the file name from a path/URL, the following code is what you need:
ereg("([^\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
Advanced replacement
ereg_replace() and eregi_replace() are also very useful. For example, to replace every run of spaces or tabs with a comma:
ereg_replace("[ \t]+", ",", trim($str));
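Since the ereg family is gone from PHP 7 onward, here is a sketch of the same two operations with the PCRE functions preg_match() and preg_replace(); the sample path and string are made up for illustration:

```php
<?php
// Extract the file name: everything after the last "/"
$pathOrUrl = '/var/www/html/index.php'; // made-up sample path
preg_match('/([^\/]*)$/', $pathOrUrl, $regs);
echo $regs[1], "\n"; // index.php

// Replace each run of spaces or tabs with a single comma
$str = "  one two\t\tthree ";
echo preg_replace('/[ \t]+/', ',', trim($str)), "\n"; // one,two,three
```

The only changes from the ereg versions are the "/" delimiters around each pattern and the function names.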
The PCRE (preg_*) functions include:
preg_match() and preg_match_all()
preg_quote()
preg_split()
preg_grep()
preg_replace()
See the PHP manual for details on how to use these functions. Below are some regular expressions collected over time:
Match action attributes
$str = '';
$match = '';
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match);
Use callback functions in regular expressions
/**
 * replace some string by callback function
 */
function callback_replace() {
    $url = '';
    $str = '';
    // The /e modifier was removed in PHP 7; preg_replace_callback() does the same job
    $str = preg_replace_callback(
        '/(?<=\saction=")(?!http:)(.*?)(?="\s)/',
        function ($m) use ($url) { return search($url, $m[1]); },
        $str
    );
    echo $str;
}
function search($url, $match) {
    return $url . '/' . $match;
}
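Regular expression matching with assertions
$match = '';
// The tags in this sample were lost in the original; <b> and <p> are restored from the surrounding text
$str = '<b>bold font</b> <p>paragraph text</p>';
preg_match_all('/(?<=<(\w{1})>).*(?=<\/\1>)/', $str, $match);
echo "Match content in HTML tags without attributes:";
print_r($match);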
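Assertions (lookahead and lookbehind) check the text around a match without including it in the result. A small sketch with a made-up sample sentence:

```php
<?php
$text = 'Items cost $15, $200 and 30 euros';

// (?<=\$) requires a "$" just before the digits but leaves it out of the match
preg_match_all('/(?<=\$)\d+/', $text, $prices);
print_r($prices[0]); // 15 and 200

// (?!\d) after two digits: no further digit may follow
preg_match_all('/\b\d{2}(?!\d)/', $text, $twoDigit);
print_r($twoDigit[0]); // 15 and 30
```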
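Replace the addresses in the HTML source code
// add_url() and $url are defined elsewhere; the /e modifier was removed in PHP 7,
// so preg_replace_callback() is used here
$form_html = preg_replace_callback(
    '/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/',
    function ($m) use ($url) { return add_url($url, $m[1]); },
    $form_html
);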
Metacharacter
In the examples above, symbols such as ^, \d and $ carry specific matching meanings. We call them metacharacters. Common metacharacters are as follows:
Metacharacter    Description
.                Matches any character except a line break
\w               Matches a letter, digit, or underscore
\s               Matches any whitespace character
\d               Matches a digit
\b               Matches the start or end of a word
^                Matches the beginning of the string
$                Matches the end of the string
[x]              Matches the character x; for example, [abc] matches a, b, or c in a string
\W               The opposite of \w: matches any character that is not a letter, digit, underscore, or Chinese character
\S               The opposite of \s: matches any non-whitespace character
\D               The opposite of \d: matches any non-digit character
\B               The opposite of \b: matches a position that is not the start or end of a word
[^x]             Matches any character except x; for example, [^abc] matches any character except a, b, and c