Using regular expression matching in Python to check whether a string contains specific substrings, with a detailed explanation of regular expressions

1. Determine whether a string contains a substring

1. in, not in

There are many ways to determine whether a string contains certain keywords.
One approach is to segment the text into words and compare each word with the keywords using ==, but the result depends on the accuracy of the word segmentation, so this method is not recommended.
A better approach is to use the membership operators in and not in, which directly test whether a string contains a given keyword, that is, a specific substring.

a = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
b = ['Romance of the Three Kingdoms', 'Water Margin', 'Journey to the West', 'Dream of Red Mansions']
n = 0
for i in b:
    # `in` is True when the title i occurs anywhere in a
    if i in a:
        n += 1
print(f'Read {n} of the four great classics during the summer vacation')
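
The same check can be written more compactly with a list comprehension. This is a minimal sketch that reuses the strings from the example above; the names text, classics and found are illustrative and do not come from the original.

```python
text = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
classics = ['Romance of the Three Kingdoms', 'Water Margin',
            'Journey to the West', 'Dream of Red Mansions']

# Keep every title that occurs in the text, in the order of `classics`.
found = [title for title in classics if title in text]

print(len(found))  # number of titles mentioned in the text
print(found)
```

Collecting the titles rather than only counting them gives both the count and the list of matched keywords in one pass over the keyword list.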

Although this loop produces the desired result, it can become inefficient when the data volume is large. Regular expressions are a dedicated search tool, and they can make it more efficient to check whether a string contains specific substrings.

2. Regular expression matching

import re

def is_in(fullstr, substr):
    # re.findall returns a list of matches; an empty list means no match
    if re.findall(substr, fullstr):
        return 1
    else:
        return 0

a = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
b = ['Romance of the Three Kingdoms', 'Water Margin', 'Journey to the West', 'Dream of Red Mansions']
n = 0
for i in b:
    # add 1 for every title found in a
    n += is_in(a, i)
print(f'Read {n} of the four great classics during the summer vacation')

re.findall: returns all substrings of string that match pattern, as a list

re.findall(pattern, string, flags=0)

Examples are as follows:

line = []
n = 0
for i in b:
    num = is_in(a, i)
    n += num
    res = re.findall(i, a)  # the return value is a list
    line = line + res
print(f'Read {n} of the four great classics during the summer vacation')
print(f'They are {line}')
'''
out:
Read 2 of the four great classics during the summer vacation
They are ['Romance of the Three Kingdoms', 'Dream of Red Mansions']
'''
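
When there are many keywords, one option is to join them into a single alternation pattern so that the text is scanned once instead of once per keyword. The following is a sketch rather than a benchmark: re.escape is used on the assumption that the keywords should be treated as literal text, and the variable names are illustrative.

```python
import re

text = 'This summer vacation I read Dream of Red Mansions and Romance of the Three Kingdoms'
classics = ['Romance of the Three Kingdoms', 'Water Margin',
            'Journey to the West', 'Dream of Red Mansions']

# Escape each keyword so characters such as '.' or '*' are matched literally,
# then join the keywords with '|' to build one alternation pattern.
pattern = re.compile('|'.join(re.escape(title) for title in classics))

# findall returns the matches in the order they occur in the text.
matches = pattern.findall(text)
print(len(matches))
print(matches)
```

Note that the matches come back in text order, not in the order of the keyword list.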

Regular expressions are very powerful, and the examples above use only a small part of what they can do. Let's continue with the rest of their features.

2. Regular expressions

(I) Basic content

1. Regular expression modifiers (optional flags)

Regular expressions can take optional flag modifiers that control the matching mode. Multiple flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I and M flags.
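
As a small illustration of combining flags, the sketch below compares re.I alone with re.I | re.M. The sample text and variable names are made up for this example.

```python
import re

text = 'first line\nSecond line\nsecond thoughts'

# re.I ignores case; re.M lets ^ match at the start of every line.
both = re.findall(r'^second', text, flags=re.I | re.M)

# With re.I alone, ^ only matches at the very start of the string.
only_ignorecase = re.findall(r'^second', text, flags=re.I)

print(both)             # ['Second', 'second']
print(only_ignorecase)  # []
```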

Modifier	Function	Full name
re.I	Ignore case when matching	re.IGNORECASE
re.L	Locale-aware matching: the special sets `\w`, `\W`, `\b`, `\B`, `\s`, `\S` depend on the current locale (this flag is not recommended)	re.LOCALE
re.M	Multi-line matching, which affects `^` and `$`. By default `^` matches only at the beginning of the string; in multi-line mode it also matches immediately after each newline `\n`. NOTE: `^` matches the beginning of a line and `\A` matches the beginning of the string. The two behave the same in single-line mode, but in multi-line mode `\A` still does not match after `\n`.	re.MULTILINE
re.S	Make `.` match every character, including the newline `\n` (DOT refers to `.`, ALL means all). By default `.` does not match `\n`.	re.DOTALL
re.U	Make `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s`, `\S` depend on the Unicode character database. Python 3 strings are already Unicode by default, so this flag is redundant.	re.UNICODE
re.X	Improve readability: whitespace and comments after # inside the pattern are ignored (comments in a pattern are only recognized in verbose mode)	re.VERBOSE
re.A	Make `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s` and `\S` match only ASCII, not Unicode	re.ASCII
re.DEBUG	Show compile-time debug information	re.DEBUG
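
Two of the flags in the table are easiest to understand from a short example. The sketch below shows re.S (re.DOTALL) and re.X (re.VERBOSE); the sample strings and variable names are illustrative only.

```python
import re

text = 'a\nb'

# By default '.' does not match a newline, so 'a.b' finds nothing here.
default_match = re.search(r'a.b', text)
# With re.S (re.DOTALL) the '.' also matches '\n'.
dotall_match = re.search(r'a.b', text, flags=re.S)

# re.X (re.VERBOSE) ignores whitespace and '#' comments inside the pattern.
score_pattern = re.compile(r'''
    name:(\w+)      # group 1: the name
    ,result:(\d+)   # group 2: the score
''', flags=re.X)
verbose_match = score_pattern.search('name:alice,result:89')

print(default_match)           # None
print(dotall_match is None)    # False
print(verbose_match.groups())  # ('alice', '89')
```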

2. Regular expression pattern

Pattern strings use special syntax to represent a regular expression:
(1) Letters and numbers represent themselves: a pattern made of them matches the identical string.
(2) Most letters and numbers take on a different meaning when preceded by a backslash, for example \n indicates a newline.
(3) Punctuation marks that have a special meaning match themselves only when they are escaped.
(4) The backslash itself must be escaped with a backslash.
(5) Since regular expressions usually contain backslashes, it is best to write them as raw strings. For example, the pattern r'\t' (equivalent to '\\t') matches a tab character.
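
The effect of raw strings can be checked directly. This is a minimal sketch; the sample strings and variable names are illustrative.

```python
import re

# A raw string keeps backslashes as written, so these two literals are equal.
print('\\d+' == r'\d+')   # True

# The pattern r'\d+' matches one or more digits.
digits = re.findall(r'\d+', 'room 101')
print(digits)             # ['101']

# To match a literal backslash the regex itself must be \\,
# which is r'\\' as a raw string ('\\\\' as a normal string).
path = 'C:\\temp'         # the seven characters C:\temp
backslashes = re.findall(r'\\', path)
print(len(backslashes))   # 1
```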

The following table lists special elements in the regular expression pattern syntax. If an optional flag parameter is provided while using the pattern, the meaning of some pattern elements will change.

Pattern	Function	Example	Matching strings
`^`	Match the beginning of the string
`$`	Match the end of the string
`.`	Match any character except the line break `\n`
`[…]`	Match any single character from the listed set	`[like]`	`'l'`, `'i'`, `'k'`, `'e'`
`[^…]`	Match any single character that is not in []	`[^like]`	Any character other than `'l'`, `'i'`, `'k'`, `'e'`
`\w`	Match letters, digits and the underscore	`a-z`, `A-Z`, `0-9`, `_`
`\W`	Match any character that is not a letter, digit or underscore
`\s`	Match any whitespace character, including `\t` `\n` `\r` `\f`
`\S`	Match any non-whitespace character
`\d`	Match any digit, equivalent to [0-9] for ASCII text
`\D`	Match any non-digit
`\A`	Match the start of the string
`\z`	Match the end of the string (in Python's re module use `\Z`; older Python versions do not accept `\z`)
`\Z`	Match only at the end of the string (in Python, unlike `$`, it does not match before a trailing line break)
`\G`	Match the position where the previous match ended (not supported by Python's re module)
`\b`	Match a word boundary, that is, the position between a word character and a non-word character	`er\b`	`never` (√), `verb` (×)
`\B`	Match a non-word boundary	`er\B`	`never` (×), `verb` (√)
`\n`, `\t`, etc.	Match a newline character, a tab character, and so on
`\1` … `\9`	Match the content of the nth group
`\10`	Match the content of group 10. In Python, an escape is read as an octal character code only if it starts with 0 or has three octal digits (for example `\012`)
`re*`	Match 0 or more repetitions of the preceding expression (greedy)	`abc*`	`ab` `abccc`
`re+`	Match 1 or more repetitions of the preceding expression (greedy), that is, at least one	`abc+`	`abc` `abcccc`
`re?`	Match 0 or 1 occurrence of the preceding expression. Placed after another quantifier (`*?`, `+?`), it makes that quantifier non-greedy	`abc?`	`abc` `ab`
`re{n}`	Match exactly n occurrences of the preceding expression	`o{2}`	`food`
`re{m,n}`	Match the preceding expression from m to n times. If m is omitted, match 0 to n times. If n is omitted, match m or more times.	`ab{1,2}c`	`abc` `abbc`
`a|b`	Match a or b
`(re)`	Group the expression and remember the matched text
`\num`	Refer to the string matched by group num
`(?P<name>re)`	Named group: the matched substring can be retrieved afterwards through the given name
`(?P=name)`	Refer to the string matched by the group named name
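
A few rows of the table are easier to follow with concrete input. The sketch below covers word boundaries, greedy versus non-greedy quantifiers, and named groups with a backreference. The sample strings and variable names are illustrative.

```python
import re

# \b is a word boundary, \B is a position that is not a word boundary.
print(re.search(r'er\b', 'never') is not None)  # True: 'er' ends the word
print(re.search(r'er\b', 'verb') is not None)   # False: 'er' is followed by 'b'
print(re.search(r'er\B', 'verb') is not None)   # True

# Quantifiers are greedy by default; a trailing ? makes them non-greedy.
html = '<b>bold</b>'
greedy = re.findall(r'<.*>', html)
lazy = re.findall(r'<.*?>', html)
print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']

# (?P<word>...) names a group and (?P=word) refers back to what it matched.
repeated = re.search(r'(?P<word>\w+) (?P=word)', 'it is is fine')
print(repeated.group('word'))  # is
```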

(II) Common expression functions

A regular expression is a special character sequence that makes it easy to check whether a string matches a certain pattern. Python's built-in re module gives the language full regular expression support.
The commonly used regular expression functions in Python are as follows:

Category	Function	Description
Find a match	re.search	Find a match anywhere in the string
Find a match	re.match	Must match from the beginning of the string
Find a match	re.fullmatch	The entire string must match the pattern
Find multiple matches	re.findall	Search the whole string and return a list
Find multiple matches	re.finditer	Search the whole string and return an iterator
Split	re.split	Use a regular expression to split a string into multiple segments
Replace	re.sub	Replace the parts of a string matched by the regular expression and return the new string. The replacement can be a string or a function
Replace	re.subn	Replace the parts of a string matched by the regular expression and return the new string together with the number of replacements
Compile	re.compile	Compile a regular expression pattern into a regular expression object (a Pattern object)
Compile	re.template	Compile a regular expression pattern into a regular expression object with the re.TEMPLATE flag added (deprecated in Python 3.11 and removed in 3.13)
Other	re.escape	Escape characters that have a special meaning in regular expressions, such as . or *
Other	re.purge	Clear the regular expression cache
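
The functions described in the table can be compared side by side on one input. This is a quick sketch using the standard re module; the sample text is made up for illustration.

```python
import re

text = 'apple=3, pear=12'

print(re.search(r'\d+', text).group())                  # first match anywhere: 3
print(re.match(r'\d+', text))                           # None: the string starts with 'apple'
print(re.fullmatch(r'\w+=\d+', 'pear=12') is not None)  # True: the whole string matches

print(re.findall(r'\d+', text))                         # ['3', '12']
print([m.span() for m in re.finditer(r'\d+', text)])    # [(6, 7), (14, 16)]

print(re.split(r',\s*', text))                          # ['apple=3', 'pear=12']
print(re.sub(r'\d+', '0', text))                        # apple=0, pear=0
print(re.subn(r'\d+', '0', text))                       # ('apple=0, pear=0', 2)

# re.escape makes the '+' literal, so the escaped pattern matches the raw text.
print(re.fullmatch(re.escape('1+1=2'), '1+1=2') is not None)  # True
```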

re.match tries to match a pattern at the start position of the string. If the match does not succeed at the start position, match() returns None.

re.match(pattern, string, flags=0)

Parameter	Description
pattern	The regular expression to match
string	The string to search
flags	Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the table of modifiers above

On success, the method returns a match object; otherwise it returns None.
Use the match object's group(num) or groups() method to get the matched text.

Match object method	Description
group(num=0)	Returns the string matched by the entire expression. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups.
groups()	Returns a tuple containing the strings of all groups, from group 1 to the last group in the pattern.

0: the whole string matched by the regular expression.
1: the part of the match captured by the first pair of parentheses ().
2: the part of the match captured by the second pair of parentheses ().
And so on.

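import re
fullstr = 'name:alice,result:89'
# re.match anchors the pattern at the start of fullstr
result = re.match(r'name:(\w+),result:(\d+)', fullstr)
print(result)
print(result.group(0))  # the whole match
print(result.group(1))  # first group
print(result.group(2))  # second group
print(result.group())   # same as group(0)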

Result:

out1: <re.Match object; span=(0, 20), match='name:alice,result:89'>
out2: name:alice,result:89
out3: alice
out4: 89
out5: name:alice,result:89

From the results, we can see that match() returns a match object, not the matched text. The position of the match can be obtained by calling span(). If the match does not succeed at the start position, match() returns None even when a later part of the string contains the content to be matched.
Use group() to extract the string matched by each group.
groups() returns a tuple containing all group strings, from group 1 to the last group in the pattern.

Note: if nothing is matched and group() is then called, an error is raised: AttributeError: 'NoneType' object has no attribute 'group'
If this error occurs because the content is not at the start of the string, change match() to search(), which scans the whole string and returns the first match. search() also returns None when nothing matches, so check the result before calling group().
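
One way to avoid the AttributeError is to test the result for None before calling group(). The sketch below wraps that check in a hypothetical helper, extract_score, which is not part of the re module and was written only for this illustration.

```python
import re

def extract_score(text):
    """Return the score as an int, or None if text has no 'result:<digits>' part.

    extract_score is a hypothetical helper written for this illustration.
    """
    match = re.search(r'result:(\d+)', text)
    if match is None:  # nothing matched, so there is no group() to call
        return None
    return int(match.group(1))

print(extract_score('name:alice,result:89'))  # 89
print(extract_score('name:bob'))              # None
```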

re.search scans the entire string and returns the first successful match. If nothing matches, it returns None.

re.search(pattern, string, flags=0)

The parameters are the same as for re.match.

Example:

import re
fullstr = 'class: Class 1,name:alice,result:89'
result = re.match(r'name:(\w+),result:(\d+)', fullstr)
print(result)
print(result.group(0))  # raises AttributeError because result is None

out1: None
out2: AttributeError: 'NoneType' object has no attribute 'group'
Reason: match() only matches at the start position. If the match does not succeed there, match() returns None.

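Try search() instead:

import re
fullstr = 'class: Class 1,name:alice,result:89'
# re.search scans the whole string for the first match
result = re.search(r'name:(\w+),result:(\d+)', fullstr)
print(result)
print(result.group(0))  # the whole match
print(result.group(1))  # first group
print(result.group(2))  # second group
print(result.group())   # same as group(0)

out1: <re.Match object; span=(15, 35), match='name:alice,result:89'>
out2: name:alice,result:89
out3: alice
out4: 89
out5: name:alice,result:89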

re.sub is mainly used to replace matches in a string.

re.sub(pattern, repl, string, count=0, flags=0)

The parameters are as follows:

Parameter	Description
pattern	Required: the regular expression pattern string
repl	Required: the replacement string, which can also be a function
string	Required: the original string to search and replace in
count	Optional: the maximum number of replacements; the default 0 replaces all matches
flags	Optional: the matching mode used during compilation (such as ignoring case or multi-line mode); the default is 0

# Modify the score
fullstr = 'name:alice,result:89'
res1 = re.sub(r'\d+', '90', fullstr)
print(res1)

out: name:alice,result:90

repl can also be a function, as follows:

# Modify the score
def change(matched):
    # matched is the Match object of each match; add 1 to the named group 'value'
    value = int(matched.group('value'))
    return str(value + 1)
fullstr = 'name:alice,result:89'
res1 = re.sub(r'(?P<value>\d+)', change, fullstr)
print(res1)

out: name:alice,result:90
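
Two further features of re.sub are worth a short sketch: the count parameter and backreferences such as \1 inside repl. The sample string below is made up for illustration.

```python
import re

fullstr = 'name:alice,result:89,rank:3'

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', '0', fullstr, count=1)
print(first_only)  # name:alice,result:0,rank:3

# \1 and \2 in repl refer to the text captured by groups 1 and 2.
swapped = re.sub(r'(\w+):(\w+)', r'\2=\1', fullstr)
print(swapped)     # alice=name,89=result,3=rank
```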

re.compile compiles a regular expression and produces a regular expression (Pattern) object, whose match() and search() methods can then be used.
The general steps for using the re module are:

1. Use the compile function to compile the string form of a regular expression into a Pattern object
2. Use a series of methods provided by the Pattern object to match and find the matching result (a Match object)
3. Finally, use the properties and methods provided by the Match object to obtain information, and perform other operations as needed.

re.compile(pattern, flags=0)

compile returns a Pattern object, not a match object. It is not useful on its own; use it through its findall(), search() and match() methods.
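
The three steps listed above can be sketched as follows. The record strings and the names score_pattern, records and scores are illustrative.

```python
import re

# Step 1: compile the pattern string into a Pattern object once.
score_pattern = re.compile(r'name:(\w+),result:(\d+)')

records = ['name:alice,result:89', 'name:bob,result:75', 'no score here']

# Step 2: use the Pattern object's search() method to look for a match.
# Step 3: read the captured groups from each Match object.
scores = {}
for record in records:
    match = score_pattern.search(record)
    if match:  # skip records where nothing matched
        scores[match.group(1)] = int(match.group(2))

print(scores)  # {'alice': 89, 'bob': 75}
```

Compiling once is mainly a convenience when the same pattern is applied to many strings, since the Pattern object can be passed around and reused.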

re.findall finds all substrings that the regular expression matches in the string and returns them as a list. If the pattern contains more than one group, it returns a list of tuples; if no match is found, it returns an empty list.
Note: match and search find a single match, while findall finds all matches.
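
How the return value of findall depends on the number of groups can be seen in a short sketch. The sample text and variable names are illustrative.

```python
import re

text = 'name:alice,result:89;name:bob,result:75'

no_group = re.findall(r'result:\d+', text)
one_group = re.findall(r'result:(\d+)', text)
two_groups = re.findall(r'name:(\w+),result:(\d+)', text)
no_match = re.findall(r'result:x', text)

print(no_group)    # no group: whole matches
print(one_group)   # one group: ['89', '75']
print(two_groups)  # two groups: a list of tuples
print(no_match)    # []
```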

re.split(pattern, string[, maxsplit=0, flags=0]) splits string at each match of pattern and returns the pieces as a list.
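
The maxsplit parameter in the signature above belongs to re.split. A minimal sketch of how it behaves, with a made-up sample string:

```python
import re

text = 'one, two;three   four'

# Split on any run of commas, semicolons or whitespace.
all_parts = re.split(r'[,;\s]+', text)
# maxsplit=2 stops after two splits; the rest of the string stays intact.
limited = re.split(r'[,;\s]+', text, maxsplit=2)

print(all_parts)  # ['one', 'two', 'three', 'four']
print(limited)    # ['one', 'two', 'three   four']
```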
