I. Functions
Function: re.compile compiles a regular expression and returns a regular expression (Pattern) object, whose match() and search() methods can then be called.
Case: How to tell whether a string contains a valid mobile phone number
import re

phone = '''
weref234
16888888888
as13423423
weq
435435
15812312312e
afa15812312316
13111111111
'''
pattern = re.compile(r'1[3-9]\d{9}')  # compile the regular expression into a Pattern object
result = pattern.search(phone)  # search returns only the first match, or None if nothing matches
print(result)          # <re.Match object; span=(10, 21), match='16888888888'>
print(result.group())  # the matched text
print(result.span())   # the (start, end) indices of the match
Print results:
<re.Match object; span=(10, 21), match='16888888888'>
16888888888
(10, 21)
🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 Special Note 1
result = pattern.search(phone): on success, search returns a Match object. It returns only the first match; if nothing matches, it returns None.
result.group(): returns the text of the first match.
result.span(): returns the (start, end) indices of the first match. Why is it (10, 21)?
The string begins with a newline character '\n' at index 0, weref234 occupies indices 1 to 8, and a second newline sits at index 9, so the match starts at index 10. The end index 21 is not included: the interval is closed on the left and open on the right.
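The half-open interval can be checked by slicing the string with the span. The following is a minimal sketch on a shortened sample string (the shortened text is an assumption made for illustration):

```python
import re

# Shortened sample text; the leading newline mirrors the triple-quoted string.
text = "\nweref234\n16888888888\nas13423423\n"
pattern = re.compile(r"1[3-9]\d{9}")

result = pattern.search(text)
start, end = result.span()

# The end index is exclusive, so slicing with the span reproduces the match.
print(start, end)        # 10 21
print(text[start:end])   # 16888888888
```

Because the end index is exclusive, end - start is also the length of the match (11 characters here).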
🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 Special Note 2
match function: on success, match also returns a Match object. Unlike search, it only tries to match at the very beginning of the string (index 0); if the pattern does not match there, it returns None.
import re

phone = '''
weref234
16888888888
as13423423
weq
435435
15812312312e
afa15812312316
13111111111
'''
pattern = re.compile(r'1[3-9]\d{9}')  # compile the regular expression into a Pattern object
result2 = pattern.match(phone)
print(result2)
This prints None: the string begins with a newline followed by w, not with a phone number, so the pattern does not match at index 0.
phone1 = "17812312345aaa"
pattern = re.compile(r'1[3-9]\d{9}')  # compile the regular expression into a Pattern object
result2 = pattern.match(phone1)  # match starts at the first character (here 1) and returns None if the pattern does not match there
print(result2)
print(result2.group())  # the matched text
print(result2.span())   # the (start, end) indices of the match
Print results:
<re.Match object; span=(0, 11), match='17812312345'>
17812312345
(0, 11)
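To put the two functions side by side, here is a small sketch that runs the same compiled pattern through match, search, and also fullmatch (a third Pattern method not covered above, which requires the whole string to match). The sample string is made up for illustration.

```python
import re

pattern = re.compile(r"1[3-9]\d{9}")
text = "tel: 17812312345aaa"  # illustrative sample, not from the article

at_start = pattern.match(text)             # only tries index 0
anywhere = pattern.search(text)            # scans the whole string
whole = pattern.fullmatch("17812312345")   # must cover the entire string
too_long = pattern.fullmatch("17812312345aaa")

print(at_start)          # None
print(anywhere.group())  # 17812312345
print(whole.group())     # 17812312345
print(too_long)          # None
```

For validating that an input is a phone number and nothing else, fullmatch is usually the closest fit; search is the better choice for finding a number inside longer text.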
II. Regular expressions
Expressions that match a single character
. : matches any single character (excluding newlines)
[]: matches a character set; any one of the characters listed in the brackets can match
\d: matches a digit, i.e. 0-9; it can also be written as [0-9]
\s: matches whitespace characters, i.e. spaces, tabs, and newlines
\S: matches non-whitespace characters
\w: matches word characters, i.e. a-z, A-Z, 0-9, _
\W: matches non-word characters
import re

# . matches any single character (excluding newlines)
one = '123a'
res = re.match('.', one)
print(res.group())  # 1

# []: a character set; any one of the characters listed can match
two = '8'
res1 = re.match('[0-9]', two)  # any digit from 0 to 9
print(res1.group())  # 8
res2 = re.match('[0189]', two)  # one of 0, 1, 8, 9 (commas inside [] would be matched literally)
print(res2.group())  # 8
two_2 = 'Hello Python'
print(re.match('[hH]', two_2).group())  # lowercase or uppercase H

# \d: matches a digit, i.e. 0-9; it can also be written as [0-9]
three = 'Sky1 Launched Successfully'
print(re.match(r'Sky\d', three).group())  # Sky1
print(re.search(r'\d', three).group())    # 1
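The examples above do not exercise \s, \S, \w, and \W. As a rough sketch, they can be compared on one made-up string using re.findall (covered in section III), which collects every match into a list:

```python
import re

sample = "id_7 = 42\t# ok"  # illustrative string containing spaces and a tab

whitespace = re.findall(r"\s", sample)       # each whitespace character
words = re.findall(r"\w+", sample)           # runs of word characters
non_words = re.findall(r"\W+", sample)       # runs of non-word characters
non_space = re.findall(r"\S+", sample)       # runs of non-whitespace characters

print(whitespace)  # [' ', ' ', '\t', ' ']
print(words)       # ['id_7', '42', 'ok']
print(non_words)   # [' = ', '\t# ']
print(non_space)   # ['id_7', '=', '42', '#', 'ok']
```

Note that the underscore counts as a word character, so id_7 stays in one piece.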
Quantifiers
*: matches the previous character 0 or more times, i.e., it can be present or absent
+: matches the previous character 1 or more times, i.e., at least once
? : matches the previous character 0 or 1 times, i.e., either once or not at all
{m}: matches exactly m occurrences of the previous character
{m,}: matches the previous character at least m times
{m,n}: matches the previous character from m to n times
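A quick way to get a feel for these quantifiers is to apply each one to the same input. The sketch below uses the string "aaaa", chosen only for illustration:

```python
import re

text = "aaaa"
patterns = [r"a*", r"a+", r"a?", r"a{2}", r"a{2,}", r"a{1,3}"]

# Map each pattern to the text it matches at the start of "aaaa".
results = {p: re.match(p, text).group() for p in patterns}
for p, matched in results.items():
    print(p, "->", matched)

# * also succeeds when the character is absent; the match is then empty.
empty = re.match(r"a*", "bbb").group()
print(repr(empty))  # ''
```

Every quantifier here takes as many characters as it is allowed to, which is the greedy behavior discussed in section IV.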
Requirement 1: Match a string whose first letter is uppercase, followed by zero or more lowercase letters
print(re.match('[A-Z][a-z]*', 'Mn').group())
print(re.match('[A-Z][a-z]*', 'Msdfsg').group())
print(re.match('[A-Z][a-z][a-z]', 'Msdfs').group())  # exactly two lowercase letters after the capital
Mn
Msdfsg
Msd
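Requirement 2: Match a valid variable name (it must start with a letter or underscore, followed by any number of letters, digits, or underscores)
print(re.match(r'[a-zA-Z_]+[\w]*', 'name1').group())
print(re.match(r'[a-zA-Z_]+[\w]*', '_name1').group())
print(re.match(r'[a-zA-Z_]+[\w]*', '2_name1'))  # starts with a digit, so match returns None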
name1
_name1
None
Requirement 3: Match any number from 0 to 99
print(re.match('[0-9]?[0-9]', '88').group())
print(re.match('[0-9]?[0-9]', '8').group())
print(re.match('[0-9]?[0-9]', '08').group())
print(re.match('[0-9]?[0-9]', '888').group())  # only the first two digits are matched
88
8
08
88
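The last result shows that match only anchors the start of the string: '888' still yields '88'. If the goal is to reject such input, one option is re.fullmatch, which requires the whole string to match. The helper name is_0_to_99 below is hypothetical, introduced only for this sketch.

```python
import re

def is_0_to_99(text):
    """Return True if text consists of one or two digits and nothing else."""
    return re.fullmatch(r"[0-9]?[0-9]", text) is not None

checks = {candidate: is_0_to_99(candidate)
          for candidate in ["88", "8", "08", "888", "8a"]}
for candidate, ok in checks.items():
    print(candidate, ok)
```

An equivalent approach is to keep re.match and add $ at the end of the pattern.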
Requirement 4: Match a password (8 to 20 characters: uppercase or lowercase letters, digits, or underscores)
print(re.match('[a-zA-Z0-9_]{8,20}', '12345678').group())
print(re.match('[a-zA-Z0-9_]{8}', '12345678').group())
12345678
12345678
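Matching Boundaries
^: matches the start of the string (or of each line in multi-line mode)
$: matches the end of the string (or of each line in multi-line mode)
\b: matches a word boundary
|: or; matches the expression on its left or the one on its right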
Requirement 5: Match a 163 e-mail address. The user name contains 6 to 18 characters (digits, letters, or underscores), must start with a letter, and the address must end with .com.
emails = '''
awhaldc@
asdasdfddasdfascvdfgbdfgdsds@
afa_@
awhaldc666@
q112dsdasdas@163.com
aaaa_____@
aaaa____@
'''
# re.M (multi-line mode) lets ^ and $ match at the start and end of every line
print(re.search(r'^[a-zA-Z][\w]{5,17}@163\.com$', emails, re.M).group())
q112dsdasdas@163.com
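Requirement 6: Match word boundaries
print(re.match(r'.*\bbeijing\b', 'I Love beijing too'))
print(re.match(r'.*\bbeijing\b', 'I Love beijing1 too'))  # beijing1 is a different word, so there is no boundary after beijing
print(re.match(r'.*beijing', 'I Love beijing too'))
<re.Match object; span=(0, 14), match='I Love beijing'>
None
<re.Match object; span=(0, 14), match='I Love beijing'>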
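The | operator from the list above has no example yet. Here is a small sketch that combines it with \b; the words and sentences are made up for illustration, and (?:...) is a non-capturing group used so that \b applies to both alternatives.

```python
import re

# Match "cat" or "dog" only when it stands alone as a whole word.
pattern = re.compile(r"\b(?:cat|dog)\b")

hit_dog = pattern.search("my dog barks")
hit_cat = pattern.search("a cat sleeps")
inside_word = pattern.search("concatenate")  # "cat" is inside a longer word
plural = pattern.search("hotdogs")           # "dog" is inside a longer word

print(hit_dog.group())  # dog
print(hit_cat.group())  # cat
print(inside_word)      # None
print(plural)           # None
```

Without the group, r"\bcat|dog\b" would be read as "\bcat" or "dog\b", which is a common source of surprises.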
III. Advanced Usage of the re Module
1. findall: re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string. If the pattern contains groups, each item in the list is a tuple of the group contents.
Requirement 7: Match all mailboxes that meet the following criteria
It is a 163 or 126 e-mail address.
The username of the mailbox contains 6 to 18 characters.
It can be numbers, letters, and underscores,
but it must begin with a letter.
It must end with .com.
import re

emails = '''
awhaldc@163.com
asdasdfddasdfascvdfgbdfgdsds@
afa_@
112dsdasdas@
aaaa_____@126.com
aaaa____@163.com
'''
# findall returns a list of all non-overlapping matches; with two groups, each item is a tuple
# re.M (multi-line mode) lets ^ and $ match at the start and end of every line
matches = re.findall(r'(^[a-zA-Z][\w]{5,17}@(163|126)\.com$)', emails, re.M)
print(matches)
for email in matches:
    print(email[0])  # group 1 holds the full address
[('awhaldc@163.com', '163'), ('aaaa_____@126.com', '126'), ('aaaa____@163.com', '163')]
awhaldc@163.com
aaaa_____@126.com
aaaa____@163.com
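If only the full addresses are wanted, or an iterator of Match objects is preferred over a list, a non-capturing group and re.finditer are two options. The following is a sketch with made-up sample addresses, not the article's data:

```python
import re

# Illustrative sample data; the second user name is too short to qualify.
emails = """
awhaldc@163.com
afa_@163.com
aaaa_____@126.com
"""
# (?:...) groups without capturing, so findall returns plain strings.
pattern = re.compile(r"^[a-zA-Z]\w{5,17}@(?:163|126)\.com$", re.M)

addresses = pattern.findall(emails)
print(addresses)  # ['awhaldc@163.com', 'aaaa_____@126.com']

# finditer yields Match objects lazily, giving access to group() and span().
found = [m.group() for m in pattern.finditer(emails)]
print(found == addresses)  # True
```

finditer is the function that returns an iterator; findall always builds the complete list up front.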
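2. sub: replaces every match in the string; the replacement can be a string or a function that is called with each Match object
Requirement 8: Match a number, add 1 to it, and return the resulting string
def add(result):  # result is a Match object
    str_num = result.group()
    num = int(str_num) + 1
    return str(num)

print(re.sub(r'\d+', add, 'a=111'))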
a=112
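Besides a function, re.sub also accepts a plain replacement string, which may refer back to captured groups, and an optional count. A brief sketch on a made-up date string:

```python
import re

text = "2024-01-15 and 2023-12-31"  # illustrative input

# A plain replacement string: mask every run of digits.
masked = re.sub(r"\d+", "#", text)

# Backreferences \1, \2, \3 reuse the captured groups to reorder each date.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# count limits how many replacements are made.
first_only = re.sub(r"\d+", "#", text, count=1)

print(masked)      # #-#-# and #-#-#
print(reordered)   # 15/01/2024 and 31/12/2023
print(first_only)  # #-01-15 and 2023-12-31
```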
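3. split: splits the string at every match and returns the pieces as a list
line = 'hello,world,china.'
print(re.split(r'\W+', line))  # the trailing '.' leaves an empty string at the end
['hello', 'world', 'china', '']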
Requirement 9: Split a string at colons or spaces.
print(re.split(r':| ', 'info:kobe 18 beijing'))
['info', 'kobe', '18', 'beijing']
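Two further options of re.split are worth knowing: maxsplit limits the number of cuts, and a capturing group around the separator keeps the separators in the result. A short sketch on the same string:

```python
import re

record = "info:kobe 18 beijing"

# A character class is an alternative way to write ":| ".
parts = re.split(r"[: ]", record)

# maxsplit=1 cuts only at the first separator.
head_tail = re.split(r"[: ]", record, maxsplit=1)

# A capturing group keeps the separators in the output list.
with_seps = re.split(r"([: ])", record)

print(parts)      # ['info', 'kobe', '18', 'beijing']
print(head_tail)  # ['info', 'kobe 18 beijing']
print(with_seps)  # ['info', ':', 'kobe', ' ', '18', ' ', 'beijing']
```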
IV. Greedy and non-greedy modes
What is greedy mode?
Quantifiers in Python are greedy by default: they always try to match as many characters as possible.
What is non-greedy mode?
Non-greedy mode is the opposite of greedy mode: it always tries to match as few characters as possible. Appending ? to a quantifier (*?, +?, ??, {m,n}?) switches it from greedy to non-greedy.
Requirement 10: Using only a regular expression in non-greedy mode, separate the phone number from the text that describes it
line2 = 'this is my phone:188-1111-6666'
# Non-greedy: (.+?) takes as few characters as possible, so the whole number ends up in group 2
result = re.match(r'(.+?)(\d+-\d+-\d+)', line2)
print(result.group(1))
print(result.group(2))
this is my phone:
188-1111-6666
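To see what the ? changes, the same input can be run through a greedy and a non-greedy version of the pattern. A minimal comparison sketch:

```python
import re

line = "this is my phone:188-1111-6666"

greedy = re.match(r"(.+)(\d+-\d+-\d+)", line)   # .+ grabs as much as it can
lazy = re.match(r"(.+?)(\d+-\d+-\d+)", line)    # .+? grabs as little as it can

print(greedy.group(1))  # this is my phone:18
print(greedy.group(2))  # 8-1111-6666
print(lazy.group(1))    # this is my phone:
print(lazy.group(2))    # 188-1111-6666
```

In the greedy version, .+ first consumes the whole line and then gives characters back one at a time until the rest of the pattern can match, which leaves only a single digit for the first \d+.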