Introduction
A regular expression (regex for short) is a powerful tool for processing strings and text. It uses a specific syntax to define rules by which text can be matched, searched, replaced, and so on. Python provides the re module to make regular expressions easy to use. This article explains in detail how to use regular expressions in Python and uses code examples to help beginners understand their basic concepts and applications.
1. Basics of Regular Expressions
At its core, a regular expression uses a special syntax to define a text pattern, which can then be used to match or find strings. With regular expressions, complex string search and processing tasks can be completed quickly. Understanding their most basic rules is the key to using them.
1.1 Commonly used regular expression symbols
Here are some common regular expression notations:
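- `.`: Matches any single character except a newline.
- `^`: Matches the beginning of the string (or of each line, with the re.MULTILINE flag).
- `$`: Matches the end of the string (or of each line, with the re.MULTILINE flag).
- `*`: Matches the preceding element zero or more times.
- `+`: Matches the preceding element one or more times.
- `?`: Matches the preceding element zero times or once. When placed after another quantifier (for example `*?` or `+?`), it makes that quantifier non-greedy, so it matches as few characters as possible.
- `{n}`: Matches the preceding element exactly n times.
- `{n,m}`: Matches the preceding element at least n and at most m times.
- `[abc]`: Matches any one of the characters a, b, or c.
- `[^abc]`: Matches any single character other than a, b, or c.
- `|`: Alternation ("or"); matches either the expression on its left or the one on its right.
- `\d`: Matches a digit. For ASCII text this is equivalent to `[0-9]`; in Python 3 string patterns it also matches other Unicode decimal digits unless the re.ASCII flag is used.
- `\D`: Matches any non-digit character.
- `\w`: Matches a word character (a letter, digit, or underscore). With re.ASCII it is equivalent to `[A-Za-z0-9_]`; otherwise it also matches Unicode letters and digits.
- `\W`: Matches any character that is not a word character.
- `\s`: Matches a whitespace character, such as a space, tab, or newline.
- `\S`: Matches any non-whitespace character.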
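To make the list above concrete, here is a small sketch that runs several of these symbols through `re.findall()` (covered in more detail later), which returns every non-overlapping match as a list. The sample strings are invented for illustration, and each comment shows the list the call returns.

```python
import re

# ? makes the preceding "u" optional
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

# * allows zero or more "b"s, + requires at least one
star = re.findall(r"ab*", "a ab abbb")  # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")  # ['ab', 'abbb']

# {n,m} bounds the repetition count
bounded = re.findall(r"\d{2,3}", "1 12 123")  # ['12', '123']

# Greedy .* grabs as much as possible; non-greedy .*? stops at the first ">"
greedy = re.findall(r"<.*>", "<b>bold</b>")  # ['<b>bold</b>']
lazy = re.findall(r"<.*?>", "<b>bold</b>")  # ['<b>', '</b>']

# . does not match a newline unless re.DOTALL is given
dot = re.findall(r"a.c", "abc a\nc")  # ['abc']
dot_all = re.findall(r"a.c", "abc a\nc", flags=re.DOTALL)  # ['abc', 'a\nc']

# Negated character class and alternation
not_vowel = re.findall(r"[^aeiou\s]", "hi yo")  # ['h', 'y']
either = re.findall(r"cat|dog", "cat, dog, bird")  # ['cat', 'dog']

# \w+ matches runs of letters, digits, and underscores
words = re.findall(r"\w+", "hello, world_1 !")  # ['hello', 'world_1']

# In str patterns \d also matches non-ASCII decimal digits (here U+0663)
unicode_digits = re.findall(r"\d", "\u06633")  # ['\u0663', '3']
ascii_digits = re.findall(r"\d", "\u06633", flags=re.ASCII)  # ['3']

print(optional, star, plus, bounded, greedy, lazy)
```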
1.2 Basic syntax of regular expressions
To use regular expressions, you first need to understand their syntax. For example, the expression `\d{3}-\d{4}` matches a 3-digit number, a hyphen, and a 4-digit number (such as the phone number "123-4567"). In Python, regular expressions should be written as raw strings (string literals prefixed with `r`). Otherwise, Python processes backslash escapes in the string literal before the re module sees the pattern, which can silently change the pattern's meaning.

```python
pattern = r"\d{3}-\d{4}"
```
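A short sketch of why the raw-string prefix matters: in a normal string literal, `\b` is converted to a backspace character before the re module ever sees it, whereas in a raw string it reaches the re module intact and means "word boundary". The sample text is invented for illustration, and `re.fullmatch()` is used here only to check the phone-number pattern against a complete string.

```python
import re

text = "cat catalog"

# Raw string: the regex engine receives \b and treats it as a word boundary
raw_result = re.findall(r"\bcat\b", text)  # ['cat']

# Normal string: "\b" becomes a backspace character (\x08), which is not in text
plain_result = re.findall("\bcat\b", text)  # []

# The two literals really are different strings
print(len(r"\b"), len("\b"))  # 2 1

# The phone-number pattern from above, checked against a whole string
pattern = r"\d{3}-\d{4}"
is_phone = re.fullmatch(pattern, "123-4567") is not None  # True
print(raw_result, plain_result, is_phone)
```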
2. Introduction to Python's re Module
Python's re module provides a variety of regular expression functions, mainly for matching, searching, and replacing. The core functions of the re module include:
- `re.match()`: Matches the regular expression at the beginning of the string.
- `re.search()`: Finds the first matching substring anywhere in the string.
- `re.findall()`: Finds all matching substrings and returns them as a list.
- `re.finditer()`: Finds all matches and returns an iterator of Match objects.
- `re.sub()`: Replaces all matching substrings.
- `re.compile()`: Precompiles a regular expression into a Pattern object for reuse.
The following will explain in detail how to use these functions.
3. re.match(): Match from the beginning of the string
`re.match()` checks whether a string starts with a given pattern. If the match succeeds, it returns a Match object; otherwise, it returns None.
Example
```python
import re

text = "Hello World"
pattern = r"Hello"

# Match from the beginning of the string
match = re.match(pattern, text)
if match:
    print("Match successful:", match.group())
else:
    print("Match failed")
```
Output:
Match successful: Hello
In the example above, `re.match()` matches `Hello` at the beginning of the string and, on success, returns a Match object.
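A minimal sketch of two details worth knowing about `re.match()`: it fails when the pattern appears only later in the string, and it does not require the match to extend to the end of the string (`re.fullmatch()` does). The Match object's `group()`, `start()`, `end()`, and `span()` methods report what matched and where.

```python
import re

text = "Hello World"

# re.match() only looks at the start of the string, so this fails
no_match = re.match(r"World", text)  # None

m = re.match(r"Hello", text)
# A Match object records what matched and where
print(m.group(), m.start(), m.end(), m.span())  # Hello 0 5 (0, 5)

# re.match() is satisfied by a prefix; re.fullmatch() needs the whole string
partial = re.match(r"Hello", text) is not None  # True
whole = re.fullmatch(r"Hello", text) is not None  # False
```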
4. re.search(): Find a match anywhere in the string
`re.search()` scans the entire string, not just the beginning, and returns a Match object for the first matching substring (or None if there is no match).
Example
```python
import re

text = "Say Hello World"
pattern = r"Hello"

# Search the entire string
search = re.search(pattern, text)
if search:
    print("Found match:", search.group())
else:
    print("No match found")
```
Output:
Found match: Hello
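The sketch below, using made-up strings, contrasts `re.search()` with `re.match()` on the same input and shows capture groups: parentheses in the pattern let you read individual parts of the match with `group(n)` or `groups()`.

```python
import re

text = "Say Hello World, Hello again"

# re.search() returns only the first occurrence, wherever it is
first = re.search(r"Hello", text)
print(first.group(), first.start())  # Hello 4

# The same pattern fails with re.match(), because text starts with "Say"
at_start = re.match(r"Hello", text)  # None

# Parentheses create capture groups that can be read from the Match object
phone = re.search(r"(\d{3})-(\d{4})", "Call me at 123-4567 today")
print(phone.group(0), phone.group(1), phone.group(2))  # 123-4567 123 4567
print(phone.groups())  # ('123', '4567')
```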
5. re.findall(): Find all matches
`re.findall()` returns a list of all non-overlapping matching substrings, which makes it suitable for finding multiple matches.
Example
```python
import re

text = "123-4567, 234-5678, 345-6789"
pattern = r"\d{3}-\d{4}"

# Find all matches
matches = re.findall(pattern, text)
print("Matches found:", matches)
# Matches found: ['123-4567', '234-5678', '345-6789']
```
Here, `re.findall()` finds every substring that matches the format `\d{3}-\d{4}`.
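One behavior of `re.findall()` that often surprises beginners: when the pattern contains capture groups (parentheses), it returns the captured groups instead of the whole match. A minimal sketch with made-up numbers:

```python
import re

text = "123-4567, 234-5678"

# No groups: each list element is the whole match
whole = re.findall(r"\d{3}-\d{4}", text)  # ['123-4567', '234-5678']

# One group: each element is that group's text
prefixes = re.findall(r"(\d{3})-\d{4}", text)  # ['123', '234']

# Several groups: each element is a tuple of the groups
pairs = re.findall(r"(\d{3})-(\d{4})", text)  # [('123', '4567'), ('234', '5678')]

# (?:...) groups without capturing, so whole matches come back again
non_capturing = re.findall(r"(?:\d{3})-\d{4}", text)  # ['123-4567', '234-5678']
```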
6. re.finditer(): Return an iterator of matches
`re.finditer()` is similar to `re.findall()`, but it returns an iterator whose elements are Match objects, which suits cases where each match needs to be processed one by one.
Example
```python
import re

text = "abc123def456ghi789"
pattern = r"\d+"

# Find all matches and iterate over them
matches = re.finditer(pattern, text)
for match in matches:
    print("Match found:", match.group())
```
Output:
Match found: 123
Match found: 456
Match found: 789
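Because `re.finditer()` yields full Match objects, each match's position and groups are available inside the loop. This sketch uses named groups (`(?P<name>...)`) on the same string; the group names `letters` and `digits` are arbitrary choices for illustration.

```python
import re

text = "abc123def456ghi789"

# Named groups (?P<name>...) make each part of a match easy to read back
pairs = []
for m in re.finditer(r"(?P<letters>[a-z]+)(?P<digits>\d+)", text):
    print(m.group("letters"), m.group("digits"), m.span())
    pairs.append((m.group("letters"), m.group("digits"), m.span()))
# abc 123 (0, 6)
# def 456 (6, 12)
# ghi 789 (12, 18)

# finditer() is lazy: matches are produced one at a time as you iterate
it = re.finditer(r"\d+", text)
first = next(it).group()  # '123'
```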
7. re.sub(): Replace matches
`re.sub()` replaces the matched parts of a string with specified content, which makes it well suited to cleaning and formatting strings.
Example
```python
import re

text = "Call me at 123-4567 or 987-6543."
pattern = r"\d{3}-\d{4}"

# Replace each phone number with [REDACTED]
new_text = re.sub(pattern, "[REDACTED]", text)
print("Replacement result:", new_text)
```
Output:
Replacement result: Call me at [REDACTED] or [REDACTED].
In this example, `re.sub()` replaces every phone number with [REDACTED].
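`re.sub()` can do more than insert fixed text. A minimal sketch of three common variations on the same sample sentence: a backreference (`\1`) that reuses captured text, the `count` argument that limits the number of replacements, and a function that computes each replacement from its Match object. `re.subn()` is also shown; it returns the number of replacements along with the new string.

```python
import re

text = "Call me at 123-4567 or 987-6543."

# Backreference \1 reinserts the first captured group in the replacement
masked = re.sub(r"(\d{3})-\d{4}", r"\1-XXXX", text)
# 'Call me at 123-XXXX or 987-XXXX.'

# count limits how many replacements are made
first_only = re.sub(r"\d{3}-\d{4}", "[REDACTED]", text, count=1)
# 'Call me at [REDACTED] or 987-6543.'

# A function receives each Match object and returns the replacement string
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22")
# 'a2b44'

# re.subn() returns the new string together with the number of replacements
new_text, n = re.subn(r"\d{3}-\d{4}", "[REDACTED]", text)
# n == 2
```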
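8. re.compile(): Precompile regular expressions
For a regular expression that is used many times, `re.compile()` lets you compile it once and reuse it. It returns a Pattern object whose methods (such as `match()`, `search()`, `findall()`, and `sub()`) perform the same operations as the module-level functions. The re module also caches recently used patterns internally, so the speed gain is often modest; the main benefits are reuse and readability.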
Example
```python
import re

text = "Email: abc@ and xyz@"
pattern = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Use the precompiled Pattern object for matching
matches = pattern.findall(text)
print("Email address found:", matches)
```
Output:
Email address found: ['abc@', 'xyz@']
Here, `re.compile()` compiles a regular expression that matches email addresses; the resulting Pattern object can then be reused as many times as needed.
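A sketch of reusing one compiled pattern for several operations, plus passing a flag once at compile time. The addresses in the sample text are hypothetical placeholders on example domains, chosen only so the pattern has something to match.

```python
import re

# Hypothetical sample text; the addresses are placeholders
text = "Contact: Alice@Example.com, bob@example.org"

email_re = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# The same Pattern object is reused for several operations
found = email_re.findall(text)  # ['Alice@Example.com', 'bob@example.org']
masked = email_re.sub("<email>", text)  # 'Contact: <email>, <email>'
first = email_re.search(text).group()  # 'Alice@Example.com'

# Flags are passed once, at compile time
hello_re = re.compile(r"hello", re.IGNORECASE)
hellos = hello_re.findall("Hello HELLO hello")  # ['Hello', 'HELLO', 'hello']
```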
9. Common application examples of regular expressions
9.1 Verify email address
```python
import re

email = "test@"
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

if re.match(pattern, email):
    print("This is a valid email address")
else:
    print("Invalid email address")
```
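One subtlety with `^...$`: in Python, `$` also matches just before a trailing newline, so a string ending in "\n" can pass the check. A sketch using `re.fullmatch()` avoids this. The helper name `is_valid_email` and the sample inputs are hypothetical, and the pattern is the same simplified one as above, not a complete validator for every legal address.

```python
import re

# Simplified pattern from the example above; not a full RFC 5322 validator
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_valid_email(address):
    """Return True if the whole string matches the simplified email pattern."""
    return EMAIL_RE.fullmatch(address) is not None


# Hypothetical test inputs
samples = ["user@example.com", "user@example", "user name@example.com"]
results = {s: is_valid_email(s) for s in samples}
print(results)

# Caveat: $ also matches just before a trailing newline; fullmatch() does not
with_dollar = re.match(r"^[a-z]+$", "abc\n") is not None  # True
with_full = re.fullmatch(r"[a-z]+", "abc\n") is not None  # False
```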
9.2 Extract phone number
```python
import re

text = "Please call 123-4567 or 987-6543 for more information."
pattern = r"\d{3}-\d{4}"

matches = re.findall(pattern, text)
print("Extracted phone numbers:", matches)
# Extracted phone numbers: ['123-4567', '987-6543']
```
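Real text often mixes phone formats. As a sketch, assuming a hypothetical input where an optional 3-digit area code appears either in parentheses or followed by a hyphen, a non-capturing optional group can cover all three forms while `re.findall()` still returns whole matches.

```python
import re

# Hypothetical input mixing three phone formats
text = "Call (555) 123-4567 or 987-6543 or 555-111-2222."

# Optional non-capturing prefix: "(555) " or "555-", then the 7-digit number
pattern = r"(?:\(\d{3}\)\s*|\d{3}-)?\d{3}-\d{4}"
phones = re.findall(pattern, text)
print(phones)  # ['(555) 123-4567', '987-6543', '555-111-2222']
```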
9.3 Replace sensitive words
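```python
import re

text = "This is a bad example of a bad word."
pattern = r"bad"

clean_text = re.sub(pattern, "[censored]", text)
print("After replacing sensitive words:", clean_text)
# After replacing sensitive words: This is a [censored] example of a [censored] word.
```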
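The plain pattern `bad` is case-sensitive and also matches inside longer words such as "badges". A sketch of a stricter filter, using `\b` word boundaries and re.IGNORECASE; the word list is hypothetical, and `re.escape()` is applied so that any special characters in the listed words are matched literally.

```python
import re

text = "Bad badges are not bad."

# Plain "bad" is case-sensitive and also hits the inside of "badges"
naive = re.sub(r"bad", "[censored]", text)
# 'Bad [censored]ges are not [censored].'

# \b limits matches to whole words; IGNORECASE also catches "Bad"
whole_words = re.sub(r"\bbad\b", "[censored]", text, flags=re.IGNORECASE)
# '[censored] badges are not [censored].'

# Hypothetical word list; re.escape() protects special characters in the words
banned = ["bad", "ugly"]
banned_re = re.compile(r"\b(?:" + "|".join(map(re.escape, banned)) + r")\b", re.IGNORECASE)
cleaned = banned_re.sub("[censored]", "Bad, ugly, and badly done.")
# '[censored], [censored], and badly done.'
```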
10. Summary
Regular expressions are a powerful tool for text processing that can handle complex string matching tasks concisely and efficiently. With the `match()`, `search()`, `findall()`, `finditer()`, `sub()`, and `compile()` functions of Python's re module, you can manipulate strings easily. Once you master the basic syntax and these common functions, even beginners can apply regular expressions flexibly to all kinds of string-matching problems in real applications.