SoFunction
Updated on 2024-10-29

An Introduction to Python Regular Expressions

I. Definitions

A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses a filtering logic for text. If a string conforms to the rule, we say it matches; otherwise the match fails.
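As a minimal sketch of "match" versus "match fails", the snippet below tests two strings against one rule. The rule itself (three digits, a hyphen, four digits) and the helper name follows_rule are illustrative assumptions, not something taken from this article.

```python
import re

# Hypothetical rule: three digits, a hyphen, then four digits.
RULE = r"\d{3}-\d{4}"

def follows_rule(text):
    """Return True if the whole of text conforms to RULE."""
    return re.fullmatch(RULE, text) is not None

print(follows_rule("555-1234"))  # conforms to the rule
print(follows_rule("555-12a4"))  # does not conform: "a" is not a digit
```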

II. Matching rules

1. Grammatical rules

 

2. Related notes

a. Backslash issues

To match a literal backslash "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\". The programming language first unescapes each pair into one backslash, leaving two; the regular expression engine then unescapes those two into a single literal backslash. The matching process is as follows:

String      Stage in the matching process
\\\\abc     String literal as written in the source code; the programming language unescapes it
\\abc       Pattern received by the regular expression engine; the engine unescapes it
\abc        The target text to be matched

To avoid typing four backslashes, we can use a raw string in Python, that is, prefix the string literal with r:

import re
# The raw pattern \\abc matches one literal backslash followed by "abc"
print(re.findall(r"\\abc", "123\\abc"))

As shown above, a raw string removes the first escaping step, from the string literal to the string's real value. The re compiler still has to unescape the pattern when compiling it, so two backslashes are still needed to match one.
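The two escaping steps described above can be checked directly. This is a small sketch; the variable names are chosen only for illustration.

```python
import re

plain = "\\\\abc"  # ordinary literal: Python unescapes it to \\abc
raw = r"\\abc"     # raw literal: already \\abc, no Python-level unescaping

# Both literals yield the same five-character pattern: \ \ a b c
same_pattern = (plain == raw)
pattern_length = len(raw)

text = "123\\abc"  # the real text is 123\abc, with a single backslash
found = re.findall(raw, text)  # the regex engine reads \\ as one literal backslash

print(same_pattern, pattern_length)
print(found)
```

The list is printed in repr form, so the single backslash in the matched text appears doubled on screen even though the matched string is only four characters long.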

b. Greedy vs. non-greedy matching

Greedy matching: by default, a quantifier in a regular expression matches as many characters as possible. For example:

import re
# Greedy: .* runs on to the last "c" in the string
print(re.findall("ab.*c", "abcdfghc"))

The result of the match is the entire string. A non-greedy match does the opposite and matches as few characters as possible. Python is greedy by default; placing a question mark "?" directly after a quantifier switches it to non-greedy mode.

import re
# Non-greedy: .*? stops at the first "c" it can
print(re.findall("ab.*?c", "abcdfghc"))

The result of this match is "abc".
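The difference matters most when the text contains several candidate end points. The sketch below uses a made-up sample string with two tagged sections to compare the two modes.

```python
import re

# Illustrative sample text, not taken from the article.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* extends to the last "</b>", so both sections merge into one match.
greedy = re.findall(r"<b>.*</b>", text)

# Non-greedy: .*? stops at the nearest "</b>", giving one match per section.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```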

III. Modules and functions

re module

compile() compiles a pattern into a reusable regular expression object.

match() matches only at the beginning of the string.

search() scans the string and returns the first position that matches the rule.

findall() returns all matched strings as a list.

finditer() returns an iterator of match objects, one for each match.

split() splits the string at every match of the pattern.

group() returns the matched text, or the text of a given group, from a match object.
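The functions listed above can be tried side by side. This is a minimal sketch; the sample strings and variable names are assumptions made for illustration.

```python
import re

text = "a1b22c333d"
digits = re.compile(r"\d+")  # compile the rule once, then reuse it

at_start = digits.match(text)    # None: the text starts with "a", not a digit
first = digits.search(text)      # match object for the first run of digits
every = digits.findall(text)     # list of all matched strings
lazy = [m.group() for m in digits.finditer(text)]  # same matches, produced one at a time
pieces = digits.split(text)      # text cut at every run of digits

# group() with capture groups: 0 is the whole match, 1 and 2 are the groups.
m = re.search(r"(\w+)@(\w+)", "user@example")

print(at_start, first.group(), every, lazy, pieces)
print(m.group(0), m.group(1), m.group(2))
```

Note that match() returns None rather than raising an error when the rule does not apply, so the result should be checked before calling group() on it.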

IV. Rules of special construction

 

Summary

The above is a short introduction to regular expressions in Python. I hope it helps you. If you have any questions, please leave me a message and I will reply as soon as I can. Thank you very much for your support of my website!