SoFunction
Updated on 2024-12-19

Commonly used Python regular expression functions explained in detail

A regular expression is a special sequence of characters that concisely describes a set of strings. It is used to check whether a string matches a given pattern, and to find or replace the parts that do.

In Python, regular expressions are provided by the standard-library re module, which is imported with:

import re

Regular expression syntax, patterns, and operators are described in detail at: /python/#flags

The following is an introduction to Python's commonly used regular expression processing functions.

re.match function

The re.match function tries to match the regular expression at the start of the string and returns a match object; if the pattern does not match at the start, match() returns None.

re.match(pattern, string, flags=0)

pattern: the regular expression to match.

string: the string to be matched.

flags: optional flags that control the matching mode, such as case sensitivity or multi-line matching. Several flags can be combined with |. The available flags are:

re.I: Ignore case.

re.L: Makes the special character sets \w, \W, \b, \B, \s, \S depend on the current locale. In Python 3 this flag can only be used with bytes patterns.

re.M: Multi-line mode; ^ and $ also match at the start and end of each line.

re.S: Makes . match any character, including line breaks (by default . does not match a line break).

re.U: Makes the special character sets \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode Character Attribute Database. In Python 3 this is already the default for str patterns.

re.X: Ignore whitespace and # comments inside the pattern, for increased readability.
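The flags listed above can be tried out with a small sketch. The sample text and variable names below are illustrative assumptions, not part of the original article; only flags from the list are used.

```python
import re

text = "Hello\nworld"  # hypothetical sample text with one line break

# re.I: case-insensitive matching
ignore_case = re.match(r"hello", text, re.I)

# re.M: ^ also matches at the start of each line
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: . also matches the line break; without it the match fails
dot_all = re.match(r"Hello.world", text, re.S)
no_dot_all = re.match(r"Hello.world", text)

# re.X: whitespace and # comments inside the pattern are ignored
verbose = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
year_month = verbose.match("2024-12").groups()

# Flags are combined with |
combined = re.findall(r"^world$", text.upper(), re.I | re.M)

print(ignore_case.group())   # Hello
print(line_starts)           # ['Hello', 'world']
print(dot_all is not None)   # True
print(no_dot_all)            # None
print(year_month)            # ('2024', '12')
print(combined)              # ['WORLD']
```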

import re
# Match from the start
r1=re.match('abc','abcdefghi')
print(r1)
# Not matching from the start
r2=re.match('def','abcdefghi')
print(r2)

Run results:

<re.Match object; span=(0, 3), match='abc'>
None

Here span gives the (start, end) index range of the substring that was matched; the end index is exclusive.

Use the match object methods group(num) or groups() to get the matched text.

group(num=0): returns the string matched by the whole expression. Passing a group number returns the text of that group; passing several group numbers at once returns a tuple containing the values of those groups.

groups(): returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.

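import re

s='This is a demo'
r1=re.match(r'(.*) is (.*)',s)
r2=re.match(r'(.*) is (.*?)',s)

print(r1.group())
print(r1.group(1))
print(r1.group(2))
print(r1.groups())
print()
print(r2.group())
print(r2.group(1))
print(r2.group(2))
print(r2.groups())

Run results:

This is a demo
This
a demo
('This', 'a demo')

This is 
This

('This', '')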

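Here (.*) and (.*?) are the greedy and non-greedy forms of the same pattern. The greedy form matches as much as possible, while the non-greedy form matches as little as possible, which is why the second group of r2 is an empty string. See here for details: https:///article/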
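The difference between greedy and non-greedy matching shows up most clearly when a delimiter occurs more than once. The following is a minimal sketch; the sample string is an assumption chosen for illustration.

```python
import re

html = "<b>one</b> and <b>two</b>"  # hypothetical sample string

# Greedy: .* runs to the last </b> it can reach
greedy = re.findall(r"<b>(.*)</b>", html)

# Non-greedy: .*? stops at the first </b> it can reach
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']
```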

re.search function

The re.search function scans the entire string and returns a match object for the first successful match; if nothing matches, it returns None.

re.search(pattern, string, flags=0)

pattern: the regular expression to match.

string: the string to be matched.

flags: optional flags that control the matching mode, such as case sensitivity or multi-line matching.

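import re
# The match is at the start of the string
r1=re.search('abc','abcdefghi')
print(r1)
# The match is not at the start of the string
r2=re.search('def','abcdefghi')
print(r2)

Run results:

<re.Match object; span=(0, 3), match='abc'>
<re.Match object; span=(3, 6), match='def'>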

Use the match object methods group(num) or groups() to get the matched text.

group(num=0): returns the string matched by the whole expression. Passing a group number returns the text of that group; passing several group numbers at once returns a tuple containing the values of those groups.

groups(): returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.

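import re

s='This is a demo'
r1=re.search(r'(.*) is (.*)',s)
r2=re.search(r'(.*) is (.*?)',s)

print(r1.group())
print(r1.group(1))
print(r1.group(2))
print(r1.groups())
print()
print(r2.group())
print(r2.group(1))
print(r2.group(2))
print(r2.groups())

Run results:

This is a demo
This
a demo
('This', 'a demo')

This is 
This

('This', '')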


From the above, the difference between re.match and re.search is easy to see: re.match only matches at the start of the string and fails as soon as the start does not fit the regular expression, whereas re.search scans the whole string until it finds a match.
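A short side-by-side sketch of this difference, using a made-up input string (the string and variable names are assumptions for illustration):

```python
import re

log = "error: code 404"  # hypothetical input

# re.match only tries position 0, where there is no digit
from_start = re.match(r"\d+", log)

# re.search keeps scanning and finds the digits later in the string
anywhere = re.search(r"\d+", log)

# Anchoring with ^ makes re.search behave like re.match for this input
anchored = re.search(r"^\d+", log)

print(from_start)        # None
print(anywhere.group())  # 404
print(anywhere.span())   # (12, 15)
print(anchored)          # None
```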

re.compile function

The re.compile function compiles a regular expression into a pattern object, whose match() and search() methods can then be used. These methods also accept optional pos and endpos arguments that limit the part of the string that is examined.

re.compile(pattern[, flags])

pattern: a regular expression in string form.

flags: optional; sets the matching mode, such as ignore case or multi-line mode.

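import re
# Match one or more digits
r=re.compile(r'\d+')
r1=r.match('This is a demo')
# Start matching at index 0, stop at index 27
r2=r.match('This is 111 and That is 222',0,27)
# Start matching at index 8, stop at index 27
r3=r.match('This is 111 and That is 222',8,27)

print(r1)
print(r2)
print(r3)

Run results:

None
None
<re.Match object; span=(8, 11), match='111'>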
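One common reason to compile a pattern is to reuse it across many strings. The sketch below assumes a small list of sample lines and a hypothetical helper named first_number; neither comes from the original article.

```python
import re

number = re.compile(r"\d+")  # compiled once, reused below

lines = ["order 15", "no digits here", "42 items"]  # hypothetical data

def first_number(line):
    """Return the first run of digits in line, or None (illustrative helper)."""
    m = number.search(line)
    return m.group() if m else None

# match() only succeeds when the line starts with digits
starts_with_number = [number.match(line) is not None for line in lines]

# search() finds digits anywhere in the line
found = [first_number(line) for line in lines]

print(starts_with_number)  # [False, False, True]
print(found)               # ['15', None, '42']
```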

findall function

Searches the string and returns all non-overlapping substrings matched by the regular expression as a list, or an empty list if no match is found.

Note that match and search return at most one match, while findall returns all matches.

findall(string[, pos[, endpos]])

This is the form called on a compiled pattern object.

string: the string to be matched.

pos: optional parameter; the index in the string at which the search starts, default is 0.

endpos: optional parameter; the index in the string at which the search stops, default is the length of the string.

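import re
# Match one or more digits
r=re.compile(r'\d+')
r1=r.findall('This is a demo')
# Search only indexes 0 to 11
r2=r.findall('This is 111 and That is 222',0,11)
# Search the whole string
r3=r.findall('This is 111 and That is 222',0,27)

print(r1)
print(r2)
print(r3)

Run results:

[]
['111']
['111', '222']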
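What findall returns also depends on the groups in the pattern. The following sketch uses the module-level re.findall with a made-up string to show the three cases: no group, one group, and several groups.

```python
import re

text = "x=1, y=22, z=333"  # hypothetical sample string

# No group: each list item is the whole match
whole = re.findall(r"\w=\d+", text)

# One group: each list item is the text of that group
values = re.findall(r"\w=(\d+)", text)

# Several groups: each list item is a tuple of group texts
pairs = re.findall(r"(\w)=(\d+)", text)

print(whole)   # ['x=1', 'y=22', 'z=333']
print(values)  # ['1', '22', '333']
print(pairs)   # [('x', '1'), ('y', '22'), ('z', '333')]
```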

re.finditer function

Similar to findall, it finds all substrings matched by the regular expression in the string, but returns them as an iterator of match objects.

re.finditer(pattern, string, flags=0)

pattern: the regular expression to match.

string: the string to be matched.

flags: optional flags that control the matching mode, such as case sensitivity or multi-line matching.

import re

r=re.finditer(r'\d+','This is 111 and That is 222')
# Each item is a match object
for i in r:
    print(i.group())

Run results:

111
222
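Because finditer yields match objects rather than plain strings, the position of each match is available as well. A minimal sketch, reusing the sample string from the example above:

```python
import re

text = "This is 111 and That is 222"

# Collect the matched text together with its start and end index
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(positions)  # [('111', 8, 11), ('222', 24, 27)]
```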

re.split function

Splits the string at the substrings matched by the regular expression and returns the pieces as a list.

re.split(pattern, string[, maxsplit=0, flags=0])

pattern: the regular expression to match.

string: the string to be matched.

maxsplit: the maximum number of splits; maxsplit=1 splits once. The default, 0, means no limit.

flags: optional flags that control the matching mode, such as case sensitivity or multi-line matching.

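import re

# Split on runs of non-word characters
r1=re.split(r'\W+','This is 111 and That is 222')
r2=re.split(r'\W+','This is 111 and That is 222',maxsplit=1)
# Split on runs of digits
r3=re.split(r'\d+','This is 111 and That is 222')
r4=re.split(r'\d+','This is 111 and That is 222',maxsplit=1)
print(r1)
print(r2)
print(r3)
print(r4)

Run results:

['This', 'is', '111', 'and', 'That', 'is', '222']
['This', 'is 111 and That is 222']
['This is ', ' and That is ', '']
['This is ', ' and That is 222']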
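Two details of re.split are worth a quick sketch: a capturing group in the pattern keeps the separators in the result, and a match at the very start or end of the string produces an empty string. The sample strings are assumptions for illustration.

```python
import re

# Split on commas, semicolons, and whitespace in any combination
parts = re.split(r"[,;\s]+", "a, b;c  d")

# With a capturing group, the separators are kept in the list
with_separators = re.split(r"([,;])", "a,b;c")

# A match at the start or end of the string yields an empty string
edges = re.split(r"\d+", "1a2")

print(parts)            # ['a', 'b', 'c', 'd']
print(with_separators)  # ['a', ',', 'b', ';', 'c']
print(edges)            # ['', 'a', '']
```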

re.sub function

The re.sub function replaces the matches of a pattern in a string and returns the new string.

re.sub(pattern, repl, string, count=0, flags=0)

pattern: the regular expression to match.

repl: the replacement string; it can also be a function that receives the match object and returns the replacement string.

string: the original string to search and replace in.

count: the maximum number of matches to replace; the default, 0, means replace all matches.

import re

r='This is 111 and That is 222'
# Delete the numbers in the string
r1=re.sub(r'\d+','',r)
print(r1)
# Delete all non-digit characters
r2=re.sub(r'\D','',r)
print(r2)

Run results:

This is  and That is 
111222
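The repl and count parameters described above can be sketched as follows. The function name double is a hypothetical helper introduced for illustration, and the backreference example (\1, \2) is an addition that refers to groups in the pattern.

```python
import re

text = "This is 111 and That is 222"

def double(match):
    """Illustrative replacement function: double the matched number."""
    return str(int(match.group()) * 2)

# repl as a function: called once per match with the match object
doubled = re.sub(r"\d+", double, text)

# count=1: only the first match is replaced
first_only = re.sub(r"\d+", "#", text, count=1)

# repl as a string with backreferences to groups 1 and 2
swapped = re.sub(r"(\w+) is (\d+)", r"\2 is \1", text)

print(doubled)     # This is 222 and That is 444
print(first_only)  # This is # and That is 222
print(swapped)     # 111 is This and 222 is That
```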

References:

/python/#flags
