SoFunction
Updated on 2024-10-29

Python Regular Expressions: A Detailed Usage Guide

I. Functions

Purpose: re.compile compiles a regular expression and returns a Pattern object, whose match() and search() methods can then be called.

Example: checking whether a string contains a valid mobile phone number

phone='''
weref234
16888888888
as13423423
weq
435435
15812312312e
afa15812312316
13111111111
'''

import re

pattern=re.compile(r'1[3-9]\d{9}')   # compile the regular expression to get a Pattern object

result=pattern.search(phone)   # search returns only the first match; it returns None if nothing matches
print(result)                  # <re.Match object; span=(10, 21), match='16888888888'>
print(result.group())          # 16888888888
print(result.span())           # (10, 21)

Print results:

<re.Match object; span=(10, 21), match='16888888888'>
16888888888
(10, 21)

🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 Special Note 1

result=pattern.search(phone): when the search succeeds, it returns a Match object. search returns only the first match; if nothing matches, it returns None.
result.group(): returns the text of the first match.
result.span(): returns the (start, end) indices of the first match. Why is it (10, 21)?
The triple-quoted string begins with a newline character '\n' at index 0, weref234 occupies indices 1 to 8, and a second '\n' sits at index 9, so the match starts at index 10. The end index 21 is not included: the span is a half-open interval, closed at the start and open at the end.
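The half-open interval can be checked directly: slicing the string with the two numbers from span() gives back exactly the text returned by group(). A minimal sketch, using a shortened version of the sample string (the names `text`, `start`, and `end` are only illustrative):

```python
import re

# Shortened sample: a leading newline, one junk line, then a phone number.
text = '\nweref234\n16888888888\nas13423423\n'

result = re.compile(r'1[3-9]\d{9}').search(text)
start, end = result.span()

print(repr(text[:start]))   # '\nweref234\n' -> the 10 characters before the match
print(text[start:end])      # 16888888888
print(end - start)          # 11
```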

🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 Special Note 2

match function: when it succeeds, match also returns a Match object. The difference is that match only tries to match at the beginning of the string, that is, at index 0 of the string being searched; if the pattern does not match there, it returns None.

phone='''
weref234
16888888888
as13423423
weq
435435
15812312312e
afa15812312316
13111111111
'''

pattern=re.compile(r'1[3-9]\d{9}')     # compile the regular expression to get a Pattern object
result2=pattern.match(phone)
print(result2)

This prints None: the string does not begin with a phone number (it begins with a newline followed by w), so the pattern does not match at index 0.

phone1="17812312345aaa"
pattern=re.compile(r'1[3-9]\d{9}')     # compile the regular expression to get a Pattern object

result2=pattern.match(phone1)   # match only tries at the first character (here 1) and returns None if the pattern does not match there
print(result2)
print(result2.group())   # returns the matched text
print(result2.span())    # returns the (start, end) indices of the match

Print results:

<re.Match object; span=(0, 11), match='17812312345'>
17812312345
(0, 11)
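To make the difference between the two functions concrete, the sketch below runs one compiled pattern through both match() and search() on the same string. The sample string `text` is made up for illustration.

```python
import re

pattern = re.compile(r'1[3-9]\d{9}')
text = 'tel:17812312345'   # hypothetical sample: the number is not at index 0

at_start = pattern.match(text)    # only tries to match at index 0
anywhere = pattern.search(text)   # scans forward for the first match

print(at_start)           # None
print(anywhere.group())   # 17812312345
print(anywhere.span())    # (4, 15)
```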

II. Regular expressions

Expressions that match a single character

. : matches any single character (excluding newlines)
[]: matches a character set; any one character in the set or range matches
\d: matches a digit, i.e. 0-9, equivalent to [0-9] for ASCII text
\s: matches whitespace characters, i.e. spaces, tabs, newlines, and so on
\S: matches non-whitespace characters
\w: matches word characters, i.e. a-z, A-Z, 0-9, _
\W: matches non-word characters

import re
# . matches any single character (excluding newlines)
one='123a'
res=re.match('.',one)
print(res.group())    # 1

# []: character set; matches any one of the characters in the set
two='8'
res1=re.match('[0-9]',two)   # matches any digit from 0 to 9
print(res1.group())   # 8

res2=re.match('[0189]',two)  # matches any one of 0, 1, 8, 9 (a comma inside [] would itself be matched literally)
print(res2.group())   # 8

two_2='Hello Python'
print(re.match('[hH]',two_2).group())    # matches a lowercase or uppercase H

# \d: matches a digit, i.e. 0-9, equivalent to [0-9] for ASCII text
three='Sky1 Launched Successfully'
print(re.match(r'Sky\d',three).group())   # Sky1
print(re.search(r'\d',three).group())     # 1
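The examples above exercise ., [] and \d. The remaining classes from the list (\s, \S, \w, \W) can be sketched the same way. This uses re.findall, covered in section III, because it returns every match at once; the sample string is made up for illustration.

```python
import re

sample = 'user_1 ok!'   # hypothetical sample string

print(re.findall(r'\s', sample))    # [' ']
print(re.findall(r'\S+', sample))   # ['user_1', 'ok!']
print(re.findall(r'\w+', sample))   # ['user_1', 'ok']
print(re.findall(r'\W', sample))    # [' ', '!']
```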

Quantifiers

*: matches 0 or more occurrences of the preceding character, i.e., it may be present or absent
+: matches 1 or more occurrences of the preceding character, i.e., at least once
?: matches 0 or 1 occurrence of the preceding character, i.e., it is optional
{m}: matches exactly m occurrences of the preceding character
{m,}: matches at least m occurrences of the preceding character
{m,n}: matches from m to n occurrences of the preceding character
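A quick way to see each quantifier in action is to run it against short strings. The sketch below uses made-up inputs; every call uses re.match, so the pattern has to match from index 0.

```python
import re

star_zero = re.match(r'ab*', 'a').group()         # * allows zero b's
star_many = re.match(r'ab*', 'abbb').group()
plus_none = re.match(r'ab+', 'a')                 # + needs at least one b
optional = re.match(r'ab?', 'abbb').group()       # ? takes at most one b
exact = re.match(r'\d{3}', '12345').group()
at_least = re.match(r'\d{2,}', '12345').group()
bounded = re.match(r'\d{2,4}', '12345').group()

print(star_zero)   # a
print(star_many)   # abbb
print(plus_none)   # None
print(optional)    # ab
print(exact)       # 123
print(at_least)    # 12345
print(bounded)     # 1234
```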

Requirement 1: Match a string where the first letter is uppercase and the subsequent letters must be lowercase or none at all

print(re.match('[A-Z][a-z]*','Mn').group())
print(re.match('[A-Z][a-z]*','Msdfsg').group())
print(re.match('[A-Z][a-z][a-z]','Msdfs').group())

Mn
Msdfsg
Msd

Requirement 2: Match a variable name (it must start with a letter or an underscore, and may continue with letters, digits, or underscores)

print(re.match(r'[a-zA-Z_]+[\w]*','name1').group())
print(re.match(r'[a-zA-Z_]+[\w]*','_name1').group())
print(re.match(r'[a-zA-Z_]+[\w]*','2_name1'))   # None: the name starts with a digit

name1
_name1
None

Requirement 3: Match any number between 0-99

print(re.match('[0-9]?[0-9]','88').group())
print(re.match('[0-9]?[0-9]','8').group())
print(re.match('[0-9]?[0-9]','08').group())
print(re.match('[0-9]?[0-9]','888').group())   # only the first two digits are matched

88
8
08
88

Requirement 4: Match a password (8 to 20 characters; uppercase and lowercase letters, digits, and underscores are allowed)

print(re.match('[a-zA-Z0-9_]{8,20}','12345678').group())
print(re.match('[a-zA-Z0-9_]{8}','12345678').group())

12345678
12345678
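One caveat with this password check: re.match only anchors the start of the string, so a string longer than 20 characters still produces a match on its first 20 characters. A minimal sketch of a stricter check using re.fullmatch, which requires the whole string to match; the helper name `is_valid_password` and the sample inputs are hypothetical:

```python
import re

PASSWORD = re.compile(r'[a-zA-Z0-9_]{8,20}')

def is_valid_password(candidate):
    """Return True only if the whole string is 8-20 allowed characters."""
    return PASSWORD.fullmatch(candidate) is not None

too_long = 'a' * 25
print(len(PASSWORD.match(too_long).group()))   # 20 -> match() accepts a prefix
print(is_valid_password(too_long))             # False
print(is_valid_password('12345678'))           # True
print(is_valid_password('1234567!'))           # False
```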

Matching Boundaries

^: matches at the start of the string
$: matches at the end of the string
\b: matches a word boundary
|: alternation; matches either the expression on its left or the one on its right

Requirement 5: Match a 163 e-mail address. The user name contains 6 to 18 characters, which can be digits, letters, or underscores, but it must start with a letter, and the address must end with @163.com.

emails='''
    awhaldc@
asdasdfddasdfascvdfgbdfgdsds@
afa_@
awhaldc666@
q112dsdasdas@
aaaa_____@
aaaa____@
'''
print(re.search(r'^[a-zA-Z][\w]{5,17}@163\.com$',emails,re.M).group())   # re.M lets ^ and $ match at the start and end of each line

q112dsdasdas@
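By default, ^ matches only at the very start of the string and $ only at its end (or just before a trailing newline). With the re.M (re.MULTILINE) flag they also match at the start and end of each line, which is what a multi-line sample needs. A self-contained sketch of the difference; every address below is made up for illustration and is not the original sample data:

```python
import re

# Hypothetical sample data, one address per line.
emails = '''
abc@163.com
short_name_01@163.com
1starts_with_digit@163.com
valid_user@126.com
'''

pattern = r'^[a-zA-Z]\w{5,17}@163\.com$'

without_flag = re.search(pattern, emails)      # ^ can only match at index 0
with_flag = re.search(pattern, emails, re.M)   # ^ and $ match on every line

print(without_flag)        # None
print(with_flag.group())   # short_name_01@163.com
```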

Requirement 6: Matching word boundaries

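print(re.match(r'.*\bbeijing\b','I Love beijing too').group())
print(re.match(r'.*\bbeijing\b','I Love beijing1 too'))   # None: beijing is followed by 1, so there is no word boundary

print(re.match(r'.*beijing','I Love beijing too').group())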

I Love beijing
None
I Love beijing

III. Advanced Usage of the re Module

re.findall(pattern, string)

1. findall: returns all non-overlapping matches of pattern in string as a list. (To get an iterator of Match objects instead, use re.finditer.)

Requirement 7: Match all mailboxes that meet the following criteria:

It is a 163 or 126 e-mail address.
The user name contains 6 to 18 characters.
The characters can be digits, letters, and underscores,
but the user name must begin with a letter.
The address ends with .com.

import re

emails='''
awhaldc@
asdasdfddasdfascvdfgbdfgdsds@
afa_@
112dsdasdas@
aaaa_____@
aaaa____@
'''

# findall returns all non-overlapping matches of pattern in string as a list;
# because the pattern has two groups, each item is a (whole address, domain) tuple.
matches=re.findall(r'(^[a-zA-Z][\w]{5,17}@(163|126)\.com$)',emails,re.M)

print(matches)
for email in matches:
    print(email[0])

[('awhaldc@', '163'), ('aaaa_____@', '126'), ('aaaa____@', '163')]
awhaldc@
aaaa_____@
aaaa____@
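Each item in the result is a tuple because the pattern contains more than one capturing group; in that case findall returns one tuple of groups per match rather than the matched text alone. If the groups are not needed, a non-capturing group (?:...) makes findall return plain strings, and re.finditer yields Match objects when positions are wanted as well. A sketch with made-up addresses (not the original sample data):

```python
import re

# Hypothetical sample data, one address per line.
emails = '''
first_user@163.com
second_user@126.com
third_user@qq.com
'''

pattern = r'^[a-zA-Z]\w{5,17}@(?:163|126)\.com$'

# No capturing groups, so findall returns the full matches as strings.
found = re.findall(pattern, emails, re.M)
print(found)   # ['first_user@163.com', 'second_user@126.com']

# finditer yields Match objects, so span() is available too.
spans = [m.span() for m in re.finditer(pattern, emails, re.M)]
print(spans)   # [(1, 19), (20, 39)]
```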

2. sub: replaces each matched substring; the replacement can be a string or a function that is called with each Match object

Requirement 8: Match a number, add 1 to the matched number, and return the result

def add(result):    # result is a Match object
    str_num=result.group()
    num=int(str_num)+1
    return str(num)


print(re.sub(r'\d+',add,'a=111'))

a=112
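The replacement passed to sub does not have to be a function. A plain string works too, and the optional count argument limits how many matches are replaced. A short sketch with a made-up input string:

```python
import re

text = 'a=111, b=222, c=333'   # hypothetical input

all_zeroed = re.sub(r'\d+', '0', text)
first_only = re.sub(r'\d+', '0', text, count=1)
# A function replacement can compute a new value for each match.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

print(all_zeroed)   # a=0, b=0, c=0
print(first_only)   # a=0, b=222, c=333
print(doubled)      # a=222, b=444, c=666
```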

3. split: splits the string at every place the pattern matches

line='hello,world,china.'
print(re.split(r'\W+',line))   # the trailing '.' is also a separator, so the last item is an empty string

['hello', 'world', 'china', '']

Requirement 9: Cut strings with colons or spaces.

print(re.split(r':| ','info:kobe 18 beijing'))

['info', 'kobe', '18', 'beijing']

IV. Greedy and non-greedy modes

What is greedy mode?
Quantifiers in Python are greedy by default: they always try to match as many characters as possible.
What is non-greedy mode?
The opposite of greedy mode: the quantifier tries to match as few characters as possible. Appending ? to *, ?, +, or {m,n} switches that quantifier from greedy to non-greedy.

Requirement 10: Non-greedy mode. Separate the phone number from its descriptive text, using only regular expressions.

line2='this is my phone:188-1111-6666'
# (.+?) is non-greedy, so it stops as soon as the phone number pattern can match
result=re.match(r'(.+?)(\d+-\d+-\d+)',line2)
print(result.group(1))
print(result.group(2))

this is my phone:
188-1111-6666
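For contrast, here is the same pattern with and without the ? after .+, run on the same string. In the greedy version, .+ grabs as much as it can and gives back only the minimum needed for the rest of the pattern to match, so it swallows the first two digits of the phone number.

```python
import re

line2 = 'this is my phone:188-1111-6666'

greedy = re.match(r'(.+)(\d+-\d+-\d+)', line2)
lazy = re.match(r'(.+?)(\d+-\d+-\d+)', line2)

print(greedy.group(1))   # this is my phone:18
print(greedy.group(2))   # 8-1111-6666
print(lazy.group(1))     # this is my phone:
print(lazy.group(2))     # 188-1111-6666
```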
