SoFunction
Updated on 2025-03-02

Detailed explanation of seven usage examples of Python regular expressions

As a concept, regular expressions are not unique to Python. However, there are still some slight differences in the actual use of regular expressions in Python.

This article is part of a series of articles on Python regular expressions. In the first post in this series, we will focus on how to use regular expressions in Python and highlight some unique features in Python.

We will introduce some of the methods for searching and finding strings in Python. Then we will talk about how to use grouping to handle the sub-items of the match objects we find.

The module we use for regular expressions in Python is called ‘re’.

>>> import re 

1. Raw strings in Python

The Python compiler uses '\' (backslash) to represent escaped characters in string constants.

If the backslash is followed by a string of special characters that the compiler can recognize, the entire escape sequence will be replaced with the corresponding special characters (for example, '\n' will be replaced by the compiler with a newline).

But this creates a problem with using regular expressions in Python, because backslashes are also used in the ‘re’ module to escape special characters (such as * and +) in regular expressions.

The mix of these two methods means that sometimes you have to escape the escape character itself (when special characters can be recognized by both Python and regular expression compilers), but at other times you don't have to do so (if special characters can only be recognized by Python compilers).

Rather than working out how many backslashes we need, we can use a raw string instead.

A raw string is created simply by putting the character ‘r’ directly before the opening quote of a normal string. When a string is raw, the Python compiler will not attempt to make any replacements in it. Essentially, you are telling the compiler not to interfere with your string at all.

>>> string = 'This is a\nnormal string'
>>> rawString = r'and this is a\nraw string'
>>> print(string)
This is a
normal string
>>> print(rawString)
and this is a\nraw string

That is a raw string.
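To make the backslash counting concrete, here is a minimal sketch that matches one literal backslash with and without a raw string. The variable names (path, normal_pattern, raw_pattern) are made up for this illustration and do not come from the article.

```python
import re

# The text C:\temp, which contains exactly one backslash character.
path = 'C:\\temp'

# In a normal string, each backslash must be escaped for Python AND for the
# regex engine, so one literal backslash takes four backslashes.
normal_pattern = '\\\\'

# In a raw string, only the regex-level escape is needed.
raw_pattern = r'\\'

# Both spellings produce the same two-character pattern.
same_pattern = (normal_pattern == raw_pattern)

# Count how many literal backslashes the pattern finds in the text.
backslash_count = len(re.findall(raw_pattern, path))
print(same_pattern, backslash_count)
```

The two patterns are the same string; the raw form is simply easier to read and harder to get wrong.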

Use regular expressions for searching in Python

The ‘re’ module provides several methods for searching input strings. The methods we will discuss are:

• re.match()
• re.search()
• re.findall()
Each method receives a regular expression and a string to search for matches. Let's look at each of these methods in more detail to figure out how they work and how they differ.

2. Find using re.match – Match at the Start

Let's first look at the match() method. The match() method finds a match only if the pattern matches at the beginning of the searched string.

For example, calling the match() method on the string 'dog cat dog' and looking for the pattern 'dog' will match:

>>> re.match(r'dog', 'dog cat dog')
<_sre.SRE_Match object at 0xb743e720>
>>> match = re.match(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'

We will discuss more about the group() method later. Now we just need to know that we called it with 0 as its parameter, and the group() method returns the matching pattern found.

I've also skipped the returned SRE_Match object for now, and we'll discuss it soon.

However, if we call the match() method on the same string and look for the pattern 'cat', no match is found.

>>> re.match(r'cat', 'dog cat dog')
>>> 
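When no match is found, re.match() returns None (which is why the interpreter printed nothing above), so calling group() on the result directly would raise an AttributeError. A minimal sketch of guarding against that follows; the helper name matched_prefix is hypothetical and not part of the ‘re’ module.

```python
import re

def matched_prefix(pattern, text):
    """Return the text matched at the start of `text`, or None if nothing matches there."""
    match = re.match(pattern, text)
    if match is None:
        # re.match() gives None when the pattern does not match at position 0.
        return None
    return match.group(0)

dog_prefix = matched_prefix(r'dog', 'dog cat dog')
cat_prefix = matched_prefix(r'cat', 'dog cat dog')
print(dog_prefix, cat_prefix)
```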

3. Find using re.search – Match Any Location

The search() method is similar to match(), but the search() method does not restrict us from finding matches only from the beginning of the string, so looking for 'cat' in our example string will find a match:

>>> match = re.search(r'cat', 'dog cat dog')
>>> match.group(0)
'cat'

However, the search() method stops searching as soon as it finds a match, so searching our example string for 'dog' with search() finds only the first occurrence.

>>> match = re.search(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
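The difference between the two methods can be seen side by side in the following sketch. It runs both on the same example string for three patterns; the pattern 'bird' and the results dictionary are additions made for this illustration.

```python
import re

text = 'dog cat dog'
results = {}

for pattern in (r'dog', r'cat', r'bird'):
    at_start = re.match(pattern, text) is not None    # anchored at position 0
    anywhere = re.search(pattern, text) is not None   # scans the whole string
    results[pattern] = (at_start, anywhere)

for pattern, (at_start, anywhere) in results.items():
    print(pattern, at_start, anywhere)
```

'dog' is found by both methods, 'cat' only by search(), and 'bird' by neither.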

4. Using re.findall – All matching objects

The search method I have used most in Python so far is findall(). When we call findall(), we simply get a list of all the matched strings instead of a match object (we will discuss the match object in more detail in the next step), which I find even simpler to work with. Calling findall() on the sample string, we get:

>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
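Because findall() returns an ordinary list, the usual list operations apply to its result. The short sketch below uses patterns chosen for illustration (r'\w+' and 'bird' are not from the article) to show counting matches and the empty list returned when nothing matches.

```python
import re

text = 'dog cat dog'

# Every run of word characters, in order of appearance.
words = re.findall(r'\w+', text)

# Counting occurrences is just the length of the returned list.
dog_count = len(re.findall(r'dog', text))

# No match yields an empty list rather than None.
missing = re.findall(r'bird', text)

print(words, dog_count, missing)
```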

5. Use match.start and match.end Methods

So, what exactly is the "match object" that the search() and match() methods returned to us earlier?

Rather than simply returning the matching part of the string, search() and match() return a "match object", which is a wrapper class around the matched substring.

Previously you saw that I could get the matching substring by calling the group() method (we will see in the next section that the match object is actually very useful when dealing with grouping problems), but the match object also contains more information about matching substrings.

For example, the match object can tell us where the matched content starts and ends in the original string:

>>> match = re.search(r'dog', 'dog cat dog')
>>> match.start()
0
>>> match.end()
3

Knowing this information is sometimes very useful.
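One place these positions come in handy is slicing the original string around the match. The following is a small sketch of that idea; the variable names before, matched and after are illustrative.

```python
import re

text = 'dog cat dog'
match = re.search(r'cat', text)

# start() is the index of the first matched character,
# end() is the index just past the last matched character.
start, end = match.start(), match.end()

before = text[:start]       # everything ahead of the match
matched = text[start:end]   # the same text that match.group(0) returns
after = text[end:]          # everything following the match

print(repr(before), repr(matched), repr(after))
```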

6. Group by Number Using match.group

As I mentioned before, match objects are very handy when handling groups.

Grouping is the ability to locate specific substrings of the entire regular expression. We can define a group as part of the entire regular expression, and then locate the corresponding matching content of this part separately.

Let's take a look at how it works:

>>> contactInfo = 'Doe, John: 555-1212' 

The string I just created is similar to a fragment taken from someone's address book. We can match this line with a regular expression like this:

>>> re.search(r'\w+, \w+: \S+', contactInfo)
<_sre.SRE_Match object at 0xb74e1ad8>

By surrounding a specific part of a regular expression with parentheses (characters ‘(’ and ‘)’), we can group the content and then process these subgroups separately.

>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)

These groups can be retrieved with the group() method of the match object. They are addressed by their position in the pattern (starting from 1):

>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'

The reason group numbering starts from 1 is that group 0 is reserved for the entire match (we saw this earlier when learning about the match() and search() methods).

>>> match.group(0)
'Doe, John: 555-1212'
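Besides fetching one group at a time, the match object can hand back several groups at once. The sketch below uses groups() and a multi-argument call to group(); both are standard match-object methods that the article itself does not cover, and the display_name variable is made up for illustration.

```python
import re

contactInfo = 'Doe, John: 555-1212'
match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)

# groups() returns every numbered group (1, 2, 3, ...) as a tuple.
all_groups = match.groups()

# group() accepts several group numbers and returns them in the order asked for.
first, last = match.group(2, 1)
display_name = first + ' ' + last

print(all_groups, display_name)
```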

7. Group by Name Using match.group

Sometimes, especially when a regular expression has many groups, referring to groups by their order of appearance becomes impractical. Python also allows you to give a group a name with the following syntax:

>>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)

We can still use the group() method to get the content of a group, but this time we pass the group name we specified instead of the group number we used before.

>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'

This greatly enhances the clarity and readability of the code. You can imagine that when regular expressions become more and more complex, it will become increasingly difficult to understand what a grouping is capturing. Naming your group will clearly tell you and your readers your intentions.
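As a further illustration of named groups, the match object can return all of them at once as a dictionary through groupdict(), a standard method not discussed in the article. The sketch also checks that a named group stays reachable by its number.

```python
import re

contactInfo = 'Doe, John: 555-1212'
match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)

# groupdict() maps each group name to the text that group captured.
contact = match.groupdict()

# Naming a group does not remove its number: 'last' is also group 1.
same_group = (match.group('last') == match.group(1))

print(contact, same_group)
```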

Although the findall() method does not return match objects, it can also use grouping. In that case, findall() returns a list of tuples, where the Nth element of each tuple corresponds to the Nth group in the regular expression.

>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]

However, group names are of no use in the results of findall(): it returns plain tuples, so the captured values can only be accessed by position.
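The sketch below shows this behaviour on a two-line address book (the second contact is invented for the example). It also shows one possible workaround, re.finditer(), which is a standard ‘re’ function that yields match objects and is not covered in this article.

```python
import re

# Hypothetical two-entry address book; the second entry is made up.
contacts = 'Doe, John: 555-1212\nRoe, Jane: 555-3434'
pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'

# findall() accepts the named groups but still returns positional tuples.
rows = re.findall(pattern, contacts)

# finditer() yields one match object per match, so names can be used again.
phones = [m.group('phone') for m in re.finditer(pattern, contacts)]

print(rows)
print(phones)
```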

In this article we introduced some basics of using regular expressions in Python. We learned about raw strings (and how they can save you some headaches when using regular expressions). We also learned how to use the match(), search(), and findall() methods for basic queries, and how to use grouping to handle the subcomponents of match objects.

As always, if you want to see more about this topic, the official Python documentation for the re module is a very good resource.

In future articles, we will discuss the use of regular expressions in Python in more depth. We will take a more comprehensive look at match objects, learn how to use regular expressions to make substitutions in strings, and even use them to parse Python data structures from text files.

This article was translated by Bole Online - The Soul of the Left Hand, from thegeekstuff