Python Regular Expressions Guide

Introduction

A regular expression (regex for short) is a powerful tool for handling strings and text. It uses a dedicated syntax to define patterns against which text can be matched, searched, and replaced. Python provides the re module to make regular expressions easy to use. This article explains how to use regular expressions in Python and uses code examples to help beginners understand their basic concepts and applications.

1. Basics of Regular Expressions

At their core, regular expressions describe text patterns in a special syntax that can be used to match or find strings. With them, complex string search and processing tasks can be completed quickly. Understanding the most basic rules is the key to using them well.

1.1 Commonly used regular expression symbols

Here are some commonly used regular expression symbols:

.: Matches any single character except a newline (unless the re.DOTALL flag is set).
^: Matches the beginning of the string.
$: Matches the end of the string (or just before a trailing newline).
*: Matches the preceding element zero or more times.
+: Matches the preceding element one or more times.
?: Matches the preceding element zero or one time. Placed after another quantifier (as in *? or +?), it makes that quantifier non-greedy.
{n}: Matches the preceding element exactly n times.
{n,m}: Matches the preceding element at least n and at most m times.
[abc]: Matches any one of the characters a, b, or c.
[^abc]: Matches any single character other than a, b, or c.
|: Alternation ("or"): matches the expression on either side.
\d: Matches any digit; equivalent to [0-9] for ASCII text.
\D: Matches any non-digit character.
\w: Matches a letter, digit, or underscore; equivalent to [A-Za-z0-9_] for ASCII text.
\W: Matches any character that is not a letter, digit, or underscore.
\s: Matches whitespace characters, such as spaces, tabs, and newlines.
\S: Matches non-whitespace characters.
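The symbols listed above can be tried out directly with re.findall(). The following is a minimal sketch; the sample strings and variable names are chosen only for illustration and do not come from any particular application.

```python
import re

words = "cat bat rat"

# Character sets and negated sets
in_set = re.findall(r"[cb]at", words)        # c or b, followed by "at"
not_in_set = re.findall(r"[^cb]at", words)   # any other character, followed by "at"

# Alternation
either = re.findall(r"cat|rat", words)

# Counted repetition: runs of two to three digits
counted = re.findall(r"\d{2,3}", "1 22 333 4444")

# Optional element: the "u" may appear zero or one time
optional = re.findall(r"colou?r", "color colour")

# Greedy versus non-greedy repetition
html = "<b>x</b><b>y</b>"
greedy = re.findall(r"<b>.*</b>", html)    # .* grabs as much as possible
lazy = re.findall(r"<b>.*?</b>", html)     # .*? grabs as little as possible

print(in_set)      # ['cat', 'bat']
print(not_in_set)  # ['rat']
print(either)      # ['cat', 'rat']
print(counted)     # ['22', '333', '444']
print(optional)    # ['color', 'colour']
print(greedy)      # ['<b>x</b><b>y</b>']
print(lazy)        # ['<b>x</b>', '<b>y</b>']
```

Note that \d{2,3} takes only the first three digits of 4444, and the single digit left over is too short to match on its own.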

1.2 Basic syntax of regular expressions

To use regular expressions, you first need to understand their syntax. For example, the expression \d{3}-\d{4} matches three digits, a hyphen, and four digits (such as the phone number "123-4567"). In Python, patterns should be written as raw strings (that is, with the prefix r before the opening quote); otherwise Python may interpret backslash sequences as string escapes before the re module ever sees them.

pattern = r"\d{3}-\d{4}"
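To see why the r prefix matters, consider \b, which means "backspace" in an ordinary Python string but "word boundary" in a regular expression. The sketch below uses an illustrative sample sentence; the variable names are hypothetical.

```python
import re

plain = "\bcat\b"    # Python turns each \b into a single backspace character
raw = r"\bcat\b"     # the backslashes survive, so re sees two word boundaries

plain_len = len(plain)
raw_len = len(raw)

# The plain pattern looks for literal backspace characters and finds none.
plain_result = re.search(plain, "a cat sat")
# The raw pattern matches the whole word "cat".
raw_result = re.search(raw, "a cat sat")

print(plain_len, raw_len)   # 5 7
print(plain_result)         # None
print(raw_result.group())   # cat
```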

2. Introduction to Python's re Module

Python's re module provides functions for matching, searching, replacing, and other regular expression operations. Its core functions include:

re.match(): Match a regular expression from the beginning of the string.
re.search(): Find the first matching substring anywhere in the string.
re.findall(): Find all matching substrings and return them as a list.
re.finditer(): Find all matching substrings and return an iterator.
re.sub(): Replace all matching substrings.
re.compile(): Precompile a regular expression for repeated use.

The following sections explain in detail how to use these functions.

3. re.match(): Match from the beginning of the string

re.match() checks whether a string starts with a given pattern. If the match succeeds, it returns a Match object; otherwise it returns None.

Example

import re

text = "Hello World"
pattern = r"Hello"

# Match from the beginning of the string
match = re.match(pattern, text)
if match:
    print("Match successful:", match.group())
else:
    print("Match failed")

Output:

Match successful: Hello

In the example above, re.match() tries to match Hello starting at the beginning of the string and returns a Match object when the match succeeds.
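A Match object carries more than the matched text. As a small sketch (the two-group pattern here is an illustrative assumption, not part of the example above), group() returns the whole match or a numbered capture group, and span() returns the start and end positions:

```python
import re

# Two capture groups separated by a space; the pattern is for illustration only.
match = re.match(r"(\w+) (\w+)", "Hello World")

whole = match.group()     # the entire matched text
first = match.group(1)    # text captured by the first pair of parentheses
second = match.group(2)   # text captured by the second pair
span = match.span()       # (start, end) offsets of the whole match

print(whole)    # Hello World
print(first)    # Hello
print(second)   # World
print(span)     # (0, 11)
```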

4. re.search(): Find a match in a string

re.search() finds the first matching substring anywhere in the string, not just at the beginning.

Example

import re

text = "Say Hello World"
pattern = r"Hello"

# Search the entire string
search = re.search(pattern, text)
if search:
    print("Match found:", search.group())
else:
    print("No match found")

Output:

Match found: Hello
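The difference between the two functions is easiest to see by running both against the same text. In this brief sketch, re.match() fails because the string does not begin with Hello, while re.search() finds it further along:

```python
import re

text = "Say Hello World"

at_start = re.match(r"Hello", text)    # only tries position 0
anywhere = re.search(r"Hello", text)   # scans forward through the string

print(at_start)           # None
print(anywhere.group())   # Hello
print(anywhere.start())   # 4
```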

5. re.findall(): Find all matches

re.findall() returns a list of all matching substrings, which makes it suitable for collecting multiple matches at once.

Example

import re

text = "123-4567, 234-5678, 345-6789"
pattern = r"\d{3}-\d{4}"

# Find all matches
matches = re.findall(pattern, text)
print("Matches found:", matches)

Here, re.findall() finds every substring that conforms to the format \d{3}-\d{4}.
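One detail worth knowing is that the shape of the result changes when the pattern contains capture groups. The sketch below reuses the phone-number text; the grouped variant of the pattern is an illustrative addition.

```python
import re

text = "123-4567, 234-5678, 345-6789"

# No groups: each list item is the full matched substring.
whole = re.findall(r"\d{3}-\d{4}", text)

# Two groups: each list item is a tuple of the captured parts.
parts = re.findall(r"(\d{3})-(\d{4})", text)

print(whole)   # ['123-4567', '234-5678', '345-6789']
print(parts)   # [('123', '4567'), ('234', '5678'), ('345', '6789')]
```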

6. re.finditer(): Return an iterator over matches

re.finditer() is similar to re.findall(), but it returns an iterator in which each element is a Match object. It is suitable when each match needs to be processed one by one.

Example

import re

text = "abc123def456ghi789"
pattern = r"\d+"

# Find all matches and iterate over them
matches = re.finditer(pattern, text)
for match in matches:
    print("Match found:", match.group())

Output:

Match found: 123
Match found: 456
Match found: 789
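Because each element is a Match object, re.finditer() also exposes where every match sits in the text, which re.findall() does not. A short sketch using the same sample string:

```python
import re

text = "abc123def456ghi789"

# Collect each matched number together with its (start, end) position.
positions = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(positions)   # [('123', (3, 6)), ('456', (9, 12)), ('789', (15, 18))]
```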

7. re.sub(): Replace matches

re.sub() replaces each matching part with the specified content, which makes it well suited to cleaning and formatting strings.

Example

import re

text = "Call me at 123-4567 or 987-6543."
pattern = r"\d{3}-\d{4}"

# Replace phone numbers with [REDACTED]
new_text = re.sub(pattern, "[REDACTED]", text)
print("Replacement result:", new_text)

Output:

Replacement result: Call me at [REDACTED] or [REDACTED].

In this example, re.sub() replaces all phone numbers with [REDACTED].
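re.sub() can also reuse part of the match in the replacement and limit how many replacements are made. The following sketch shows both; the masking format ***-1234 is just an illustrative choice.

```python
import re

text = "Call me at 123-4567 or 987-6543."

# \1 in the replacement refers to the first capture group (the last four digits).
masked = re.sub(r"\d{3}-(\d{4})", r"***-\1", text)

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(r"\d{3}-\d{4}", "[REDACTED]", text, count=1)

print(masked)       # Call me at ***-4567 or ***-6543.
print(first_only)   # Call me at [REDACTED] or 987-6543.
```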

8. re.compile(): Precompile regular expressions

For a regular expression that is used many times, re.compile() can improve efficiency. It precompiles the pattern and returns a Pattern object, which offers the same operations (match, search, findall, and so on) as methods.

Example

import re

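# example.com is a placeholder domain used for illustration
text = "Email: abc@example.com and xyz@example.com"
pattern = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Use the precompiled object for matching
matches = pattern.findall(text)
print("Email addresses found:", matches)

Output:

Email addresses found: ['abc@example.com', 'xyz@example.com']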

Here, re.compile() compiles a regular expression that matches email addresses; the resulting Pattern object can then be reused as many times as needed.
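A compiled Pattern object exposes the same operations as the module-level functions, so one object can serve several purposes. The sketch below uses the phone-number pattern from earlier sections; the variable name phone is hypothetical.

```python
import re

# Compile once, then call methods on the Pattern object.
phone = re.compile(r"\d{3}-\d{4}")
text = "Call 123-4567 or 987-6543."

at_start = phone.match(text)              # the text starts with "Call", so no match
first = phone.search(text).group()        # first phone number anywhere in the text
every = phone.findall(text)               # all phone numbers
redacted = phone.sub("[REDACTED]", text)  # replace every phone number

print(at_start)   # None
print(first)      # 123-4567
print(every)      # ['123-4567', '987-6543']
print(redacted)   # Call [REDACTED] or [REDACTED].
```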

9. Common application examples of regular expressions

9.1 Verify email address

import re

# example.com is a placeholder domain used for illustration
email = "test@example.com"
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
if re.match(pattern, email):
    print("This is a valid email address")
else:
    print("Invalid Email Address")
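An alternative worth considering for validation is re.fullmatch(), which requires the whole string to match and so needs no ^ or $ anchors. The sketch below wraps it in a hypothetical helper named is_valid_email; the addresses use the placeholder domain example.com, and the pattern is the same simplified check as above, not a complete email validator.

```python
import re

# Same simplified pattern as above, without the ^ and $ anchors.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_valid_email(address):
    """Return True if the entire string looks like an email address."""
    return EMAIL_RE.fullmatch(address) is not None


ok = is_valid_email("test@example.com")
missing_domain = is_valid_email("test@")
trailing_newline = is_valid_email("test@example.com\n")

print(ok)                 # True
print(missing_domain)     # False
print(trailing_newline)   # False
```

The last case is where the two approaches differ: with re.match() and a $ anchor, a string ending in a newline would still be accepted, because $ also matches just before a trailing newline.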

9.2 Extract phone number

import re

text = "Please call 123-4567 or 987-6543 for more information."
pattern = r"\d{3}-\d{4}"
matches = re.findall(pattern, text)
print("Extracted Phone Number:", matches)

9.3 Replace sensitive words

import re

text = "This is a bad example of a bad word."
pattern = r"bad"
clean_text = re.sub(pattern, "[censored]", text)
print("After replacing sensitive words:", clean_text)
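A plain pattern such as bad also matches inside longer words and misses capitalized forms. One possible refinement, sketched here with an illustrative sentence, is to add \b word boundaries and the re.IGNORECASE flag:

```python
import re

text = "A bad example: Bad words, but not badge."

# Naive version: case-sensitive and matches inside "badge".
naive = re.sub(r"bad", "[censored]", text)

# Whole words only, ignoring case.
clean = re.sub(r"\bbad\b", "[censored]", text, flags=re.IGNORECASE)

print(naive)   # A [censored] example: Bad words, but not [censored]ge.
print(clean)   # A [censored] example: [censored] words, but not badge.
```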

10. Summary

Regular expressions are powerful tools for processing text that can handle complex string matching and processing tasks concisely and efficiently. In Python, the match, search, findall, finditer, sub, and other functions of the re module make it easy to manipulate strings. Once they have mastered the basic syntax and the common functions, beginners can also apply regular expressions flexibly to a wide range of string matching problems in real applications.