1. Determine whether a string contains a substring
1. in, not in
There are many ways to determine whether a string contains certain keywords.
For example, after word segmentation, the word vector can be compared with the keywords using == matching, but the result depends on the accuracy of the word segmentation, so this method is not recommended.
Secondly, the membership operators in and not in can judge more reliably whether a string contains a certain keyword, that is, a specific substring:

    a = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
    b = ['Romance of the Three Kingdoms', 'Water Margin', 'Journey to the West', 'Dream of Red Mansions']
    n = 0
    # Count how many of the titles in b appear in a
    for i in b:
        if i in a:
            n += 1
    print(f'Read {n} of the four great classics during the summer vacation')

Although this traversal produces the desired result, the program can become slow when the data volume is large. Regular expressions are a dedicated search tool, and they can make it more efficient to check whether strings contain specific substrings.
2. Regular expression matching

    import re

    def is_in(fullstr, substr):
        # re.findall returns a list; an empty list means no match
        if re.findall(substr, fullstr):
            return 1
        else:
            return 0

    a = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
    b = ['Romance of the Three Kingdoms', 'Water Margin', 'Journey to the West', 'Dream of Red Mansions']
    n = 0
    for i in b:
        # Add 1 for every title found in a
        n += is_in(a, i)
    print(f'Read {n} of the four great classics during the summer vacation')

re.findall: returns all substrings of string that match pattern, as a list
re.findall(pattern, string, flags=0)
Examples are as follows:

    line = []
    n = 0
    for i in b:
        num = is_in(a, i)
        n += num
        res = re.findall(i, a)  # the return value is a list
        line = line + res
    print(f'Read {n} of the four great classics during the summer vacation')
    print(f'They are {line}')
    '''
    out:
    Read 2 of the four great classics during the summer vacation
    They are ['Romance of the Three Kingdoms', 'Dream of Red Mansions']
    '''

Regular expressions are very powerful, and the example above uses only a small part of them. Let's continue with the other features of regular expressions.
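As a possible extension of the example above (not something the article itself does), the keywords can be joined into a single alternation pattern so that the text is scanned once instead of once per keyword. The variable names `pattern` and `found` are illustrative assumptions; `re.escape` is used so that any special characters in a keyword are treated literally.

```python
import re

a = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
b = ['Romance of the Three Kingdoms', 'Water Margin', 'Journey to the West', 'Dream of Red Mansions']

# Build one pattern of the form "kw1|kw2|..." with every keyword escaped
pattern = re.compile('|'.join(re.escape(k) for k in b))

# findall returns the matched keywords in the order they appear in a
found = pattern.findall(a)

print(len(set(found)))  # number of distinct titles found
print(found)
```

Whether this is actually faster than the `in` loop depends on the data, so it is worth measuring before relying on it.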
2. Regular expressions
(I) Basic content
1. Regular expression modifiers: optional flags
Regular expressions can take optional flag modifiers that control the matching mode. A modifier is passed as an optional flag argument. Multiple flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags.
Modifier | Function | Full name |
---|---|---|
re.I | Ignore case when matching | re.IGNORECASE |
re.L | Do locale-aware matching, which means the special character sets \w, \W, \b, \B, \s, \S depend on the current environment (this flag is not recommended) | re.LOCALE |
re.M | Multi-line matching, affects ^ and $. In a regular expression ^ matches the beginning of a line. In default mode it can only match the beginning of the string; in multi-line mode it can also match right after a newline \n. NOTE: ^ matches the beginning of a line and \A matches the beginning of the string; the two behave the same in single-line mode, but in multi-line mode \A still does not match after \n | re.MULTILINE |
re.S | Make . match every character, including newlines (DOT means ., ALL means all, so together: . matches all, including the newline \n). In default mode . cannot match the newline \n | re.DOTALL |
re.U | Make the special character sets \w, \W, \b, \B, \d, \D, \s, \S follow the Unicode character properties database (the counterpart of ASCII mode; it matches the characters Unicode supports, but Python 3 strings are already Unicode by default, so the flag is redundant) | re.UNICODE |
re.X | Increase readability by ignoring whitespace and comments after # (comments inside a regular expression are not recognized in default mode, but are recognized in verbose mode) | re.VERBOSE |
re.A | Let \w, \W, \b, \B, \d, \D, \s and \S match only ASCII, not Unicode | re.ASCII |
re.DEBUG | Show compile-time debug information | re.DEBUG |
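To make the effect of combined flags concrete, here is a minimal sketch. The two-line sample string and the variable names are made up for illustration and do not come from the article.

```python
import re

# Hypothetical sample text, used only for illustration
text = "Hello world\nhello Python"

# No flags: ^ matches only at the start of the string, case-sensitively
default_hits = re.findall(r"^hello", text)

# re.I ignores case; re.M lets ^ match at the start of every line
flagged_hits = re.findall(r"^hello", text, flags=re.I | re.M)

# By default . does not match the newline; with re.S it does
dot_default = re.search(r"world.hello", text)
dot_all = re.search(r"world.hello", text, flags=re.S)

print(default_hits)   # []
print(flagged_hits)   # ['Hello', 'hello']
print(dot_default)    # None
print(dot_all is not None)  # True
```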
2. Regular expression pattern
Pattern strings use special syntax to represent a regular expression:
(1) Letters and digits represent themselves: a pattern made of them matches the identical string.
(2) Most letters and digits take on a different meaning when preceded by a backslash; for example, \n represents a newline.
(3) Punctuation marks match themselves only when they are escaped; otherwise they have a special meaning.
(4) The backslash itself must be escaped with a backslash.
(5) Since regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character.
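The rules above can be checked with a short sketch. The sample strings are arbitrary and chosen only for illustration.

```python
import re

# A raw string keeps the backslash as written: r'\t' is two characters
raw_tab = r'\t'
escaped_tab = '\\t'
same = raw_tab == escaped_tab

# The regex engine interprets backslash + t as "match a tab character"
tab_match = re.search(r'\t', 'a\tb')

# A literal backslash must itself be escaped inside the pattern
backslash_match = re.search(r'\\', 'C:\\temp')

# An unescaped . matches any character; \. matches only a literal dot
any_char = re.findall(r'a.c', 'abc a.c')
literal_dot = re.findall(r'a\.c', 'abc a.c')

print(same)         # True
print(any_char)     # ['abc', 'a.c']
print(literal_dot)  # ['a.c']
```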
The following table lists special elements in the regular expression pattern syntax. If an optional flag parameter is provided while using the pattern, the meaning of some pattern elements will change.
Pattern | Function | Example | Matching strings |
---|---|---|---|
^ | Match the beginning of the string | | |
$ | Match the end of the string | | |
. | Match any character except the line break \n | | |
[…] | Match any one character from the listed set | [like] | 'l', 'i', 'k', 'e' |
[^…] | Match any character that is not in [] | [^like] | Any character other than 'l', 'i', 'k', 'e' |
\w | Match letters, digits, and the underscore | | a-z, A-Z, 0-9, _ |
\W | Match any character that is not a letter, digit, or underscore | | |
\s | Match any whitespace character, such as a space, \t, \n, \r, \f | | |
\S | Match any non-whitespace character | | |
\d | Match any digit, equivalent to [0-9] | | |
\D | Match any non-digit | | |
\A | Match the start of the string | | |
\z | Match the end of the string (in Python's re module, use \Z for this) | | |
\Z | Match the end of the string. In some regex flavors, if the string ends with a line break, it matches just before that line break; in Python's re module it matches only at the very end of the string | | |
\G | Match the position where the last match finished (not supported by Python's re module) | | |
\b | Match a word boundary, that is, the position between a word and a space | er\b | never (√), verb (×) |
\B | Match a non-word boundary | er\B | never (×), verb (√) |
\n, \t, etc. | Match a newline character, a tab character, and so on | | |
\1 … \9 | Match the content of the nth group | | |
\10 | Match the content of the 10th group if that group exists; otherwise it is read as an octal character code | | |
re* | Match 0 or more of the preceding expression, greedy (the previous character may appear 0 times or any number of times, so it is optional) | abc* | ab, abccc |
re+ | Match 1 or more of the preceding expression, greedy (the previous character appears at least once) | abc+ | abc, abcccc |
re? | Match 0 or 1 of the preceding expression (the previous character is either present once or absent). Placed after another quantifier, as in *?, +? or ??, it makes that quantifier non-greedy | abc? | abc, ab |
re{n} | Match the preceding expression exactly n times | o{2} | food |
re{m,n} | Match the preceding expression from m to n times. If m is omitted, match 0 to n times. If n is omitted, match m or more times | ab{1,2}c | abc, abbc |
a\|b | Match a or b | | |
(re) | Group the expression and remember the matched text | | |
\num | Refer to the string matched by group number num | | |
(?P<name>…) | Named group; the matched substring can be retrieved by the defined name | | |
(?P=name) | Refer to the string matched by the group named name | | |
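A few rows of the table are easier to see in action. The sketch below uses small made-up strings; all variable names are illustrative.

```python
import re

# \b vs \B: "er" at the end of a word versus "er" inside a word
end_of_word = [m.start() for m in re.finditer(r"er\b", "never verb")]
inside_word = [m.start() for m in re.finditer(r"er\B", "never verb")]

# Greedy vs non-greedy: + takes as much as possible, +? as little as possible
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")

# {m,n}: b must appear once or twice between a and c
ok = re.fullmatch(r"ab{1,2}c", "abbc")
too_many = re.fullmatch(r"ab{1,2}c", "abbbc")

# Named group plus a reference to it: find a word repeated twice in a row
repeat = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world")

print(end_of_word, inside_word)  # [3] [7]
print(greedy)                    # ['<a><b>']
print(lazy)                      # ['<a>', '<b>']
print(repeat.group("word"))      # hello
```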
(II) Common expression functions
A regular expression is a special character sequence that makes it easy to check whether a string matches a certain pattern. Python's built-in re module gives the language full regular expression support.
Commonly used regular expression functions in Python are as follows:
Category | Function | Description |
---|---|---|
Find a match | re.search | Find a match anywhere in the string |
Find a match | re.match | Must match from the beginning of the string |
Find a match | re.fullmatch | The entire string must match the regular expression exactly |
Find multiple matches | re.findall | Search from anywhere in the string and return a list |
Find multiple matches | re.finditer | Search from anywhere in the string and return an iterator |
Split | re.split | Use a regular expression to split a string into multiple segments |
Replace | re.sub | Replace the parts of a string matched by the regular expression and return the new string. The replacement can be a string or a function |
Replace | re.subn | Replace the parts of a string matched by the regular expression and return the new string together with the number of replacements |
Compile | re.compile | Compile a regular expression pattern into a regular expression object (Pattern object) |
Compile | re.template | Compile a regular expression pattern into a regular expression object and add the re.TEMPLATE mode (deprecated in Python 3.11 and removed in 3.13) |
Other | re.escape | Escape characters that have a special meaning in regular expressions, such as . or * |
Other | re.purge | Clear the regular expression cache |
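Several functions in the table are not demonstrated elsewhere in this article, so here is a minimal sketch of re.fullmatch, re.finditer, re.subn and re.escape. The sample strings and variable names are made up for illustration.

```python
import re

s = "a1b22c333"  # hypothetical sample string

# fullmatch succeeds only if the whole string matches the pattern
whole = re.fullmatch(r"\d+", "2024")
partial = re.fullmatch(r"\d+", "2024a")

# finditer yields Match objects, giving access to both text and position
positions = [(m.group(), m.span()) for m in re.finditer(r"\d+", s)]

# subn returns the new string and the number of replacements made
replaced, count = re.subn(r"\d+", "#", s)

# Without escaping, "+" is a quantifier; re.escape makes it a literal "+"
unescaped = re.search("1+1", "1+1=2")
escaped = re.search(re.escape("1+1"), "1+1=2")

print(positions)        # [('1', (1, 2)), ('22', (3, 5)), ('333', (6, 9))]
print(replaced, count)  # a#b#c# 3
```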
1. re.match
Tries to match a pattern at the start position of the string. If the match does not succeed at the start position, match() returns None
re.match(pattern, string, flags=0)
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to match against |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the modifier table above |
If the match succeeds, re.match returns a Match object; otherwise it returns None.
Use the Match object methods group(num) or groups() to get the matched text.
Match object method | Description |
---|---|
group(num=0) | The string matched by the entire expression. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern |
0: the whole string that matches the regular expression.
1: the substring matched by the first () group in the regular expression.
2: the substring matched by the second () group in the regular expression.
And so on…

    import re

    fullstr = 'name:alice,result:89'
    result = re.match(r'name:(\w+),result:(\d+)', fullstr)
    print(result)           # the Match object itself
    print(result.group(0))  # the whole match
    print(result.group(1))  # first group
    print(result.group(2))  # second group
    print(result.group())   # same as group(0)

result:
out1: <re.Match object; span=(0, 20), match='name:alice,result:89'>
out2: name:alice,result:89
out3: alice
out4: 89
out5: name:alice,result:89
From the results, we can see that re.match() returns a Match object, not the matched content. The position of the match can be obtained by calling span(). If the match does not succeed at the start position, re.match() returns None even if another part of the string contains the content to be matched.
Use group() to extract the string matched by each group.
groups() returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.
Note: if nothing matches and group() is then called on the result, an error is raised: AttributeError: 'NoneType' object has no attribute 'group'
If this error occurs because the pattern does not sit at the start of the string, you can change match() to search(), which scans through the whole string for the first position where the pattern matches.
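Because search() can also return None when the pattern is absent, one common way to avoid the AttributeError is to check the result before calling group(). The helper name `extract_name` below is hypothetical and only serves the illustration.

```python
import re

def extract_name(text):
    """Return the captured name, or None when the pattern is absent."""
    m = re.search(r"name:(\w+)", text)
    if m is None:
        # No match: do not call m.group(), it would raise AttributeError
        return None
    return m.group(1)

print(extract_name("class:01,name:alice,result:89"))  # alice
print(extract_name("nothing to see here"))            # None
```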
2. re.search
Scans the entire string and returns the first successful match. If no match is found, it returns None
re.search(pattern, string, flags=0)
The parameters are the same as for re.match.
Example:

    import re

    fullstr = 'class:01,name:alice,result:89'
    result = re.match(r'name:(\w+),result:(\d+)', fullstr)
    print(result)
    print(result.group(0))  # raises AttributeError because result is None

out1: None
out2: AttributeError: 'NoneType' object has no attribute 'group'
Reason: re.match only matches at the start position. If the match does not succeed at the start position, match() returns None
Try re.search instead:

    import re

    fullstr = 'class:01,name:alice,result:89'
    result = re.search(r'name:(\w+),result:(\d+)', fullstr)
    print(result)
    print(result.group(0))
    print(result.group(1))
    print(result.group(2))
    print(result.group())

out1: <re.Match object; span=(9, 29), match='name:alice,result:89'>
out2: name:alice,result:89
out3: alice
out4: 89
out5: name:alice,result:89
3. re.sub
This function is mainly used to replace matches in strings
re.sub(pattern, repl, string, count=0, flags=0)
The parameters are as follows:
Parameter | Description |
---|---|
pattern | Required: the regular expression pattern string |
repl | Required: the replacement string; it can also be a function |
string | Required: the original string to search and replace in |
count | Optional: the maximum number of replacements to make; the default 0 means replace all matches |
flags | Optional: the matching mode (such as ignoring case or multi-line mode); the default is 0 |

Examples are as follows:

    import re

    # Modify the score
    fullstr = 'name:alice,result:89'
    res1 = re.sub(r'\d+', '90', fullstr)
    print(res1)

out: name:alice,result:90
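repl can also be a function, as follows:

    import re

    # Modify the score
    def change(matched):
        # matched is the Match object for each occurrence of the pattern
        value = int(matched.group('value'))
        return str(value + 1)

    fullstr = 'name:alice,result:89'
    res1 = re.sub(r'(?P<value>\d+)', change, fullstr)
    print(res1)
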
out: name:alice,result:90
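The count parameter and group references inside a string repl are described above but not demonstrated, so here is a short sketch. The sample strings are invented for illustration.

```python
import re

# count limits how many matches are replaced (0, the default, means all)
all_replaced = re.sub(r"\d+", "#", "a1b2c3")
first_two = re.sub(r"\d+", "#", "a1b2c3", count=2)

# Inside a string repl, \g<name> refers to a named group of the pattern
swapped = re.sub(r"(?P<k>\w+):(?P<v>\w+)", r"\g<v>=\g<k>", "name:alice")

print(all_replaced)  # a#b#c#
print(first_two)     # a#b#c3
print(swapped)       # alice=name
```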
4. re.compile
The compile function compiles a regular expression and generates a regular expression (Pattern) object whose match() and search() methods can then be used.
The general steps for using the re module are:
1. Use the compile function to compile the string form of a regular expression into a Pattern object
2. Use a series of methods provided by the Pattern object to match and find the matching result (a Match object)
3. Finally, use the properties and methods provided by the Match object to obtain information, and perform other operations as needed.
re.compile(pattern, flags=0)
re.compile returns a Pattern object, not a Match object. It does nothing on its own; it needs to be used together with its findall(), search(), and match() methods.
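The three steps above can be sketched as follows. The record strings and the names `pattern`, `records` and `parsed` are assumptions made for this illustration.

```python
import re

# Step 1: compile the pattern once into a Pattern object
pattern = re.compile(r"name:(\w+),result:(\d+)")

# Hypothetical input records; the last one does not fit the pattern
records = ["name:alice,result:89", "name:bob,result:75", "bad record"]

parsed = []
for rec in records:
    # Step 2: use a method of the Pattern object to get a Match object
    m = pattern.match(rec)
    if m:
        # Step 3: read the captured groups from the Match object
        parsed.append((m.group(1), int(m.group(2))))

print(parsed)  # [('alice', 89), ('bob', 75)]
```

Compiling once is mainly a readability and reuse convenience when the same pattern is applied to many strings.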
5. re.findall
Finds all substrings of the string that the regular expression matches and returns them as a list. If the pattern contains multiple groups, it returns a list of tuples; if nothing matches, it returns an empty list
Note: match and search match once, while findall finds all matches
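How the shape of the returned list depends on the number of groups can be seen in a small sketch. The sample strings are made up for illustration.

```python
import re

# No groups: a list of the full matches
no_groups = re.findall(r"\d+", "a1b22")

# One group: a list of what that group captured
one_group = re.findall(r"name:(\w+)", "name:alice,name:bob")

# Several groups: a list of tuples, one tuple per match
two_groups = re.findall(r"(\w)(\d+)", "a1b22")

# Nothing matches: an empty list
nothing = re.findall(r"\d+", "abc")

print(no_groups)   # ['1', '22']
print(one_group)   # ['alice', 'bob']
print(two_groups)  # [('a', '1'), ('b', '22')]
print(nothing)     # []
```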
6. re.split
re.split(pattern, string, maxsplit=0, flags=0)
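A minimal sketch of re.split, assuming a made-up string with mixed separators:

```python
import re

text = "a, b;c  d"  # hypothetical input with commas, semicolons and spaces

# Split on any run of commas, semicolons or whitespace
parts = re.split(r"[,;\s]+", text)

# maxsplit limits the number of splits; the rest of the string stays intact
limited = re.split(r"[,;\s]+", text, maxsplit=1)

# If the pattern contains a capturing group, the separators are kept
with_seps = re.split(r"(,)", "a,b")

print(parts)      # ['a', 'b', 'c', 'd']
print(limited)    # ['a', 'b;c  d']
print(with_seps)  # ['a', ',', 'b']
```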