SoFunction
Updated on 2024-10-28

Python Regular Expression Usage Examples

As a concept, regular expressions are not unique to Python. However, there are some minor differences in the actual use of regular expressions in Python.

This article is part of a series of articles on regular expressions in Python. In the first article of this series, we will focus on how to use regular expressions in Python and highlight some of the unique features in Python.

We'll cover some of the ways to search strings in Python. Then we'll discuss how to use grouping to work with the sub-parts of the matches we find.

The Python module for regular expressions that we are interested in is called 're'.

>>> import re

1. Raw strings in Python

The Python compiler uses '\' (backslash) to indicate escape characters in string constants.

If the backslash is followed by a character sequence that the compiler recognizes, the entire escape sequence is replaced with the corresponding special character (e.g., '\n' is replaced by the compiler with a newline character).

However, this poses a problem for using regular expressions in Python, as backslashes are also used in the 're' module to escape special characters in regular expressions (such as * and +).

The mix of these two approaches means that sometimes you have to escape the escape character itself (when the special character can be recognized by both the Python and regular expression compilers), but at other times you don't have to (if the special character can only be recognized by the Python compiler).

Instead of working out exactly how many backslashes are needed, we can use a raw string.

A raw string is created simply by prefixing the opening quote of a normal string with the character 'r'. When a string is raw, the Python compiler won't try to make any substitutions in it. Essentially, you're telling the compiler not to interfere with your string at all.

>>> string = 'This is a\nnormal string'
>>> rawString = r'and this is a\nraw string'
>>> print(string)
This is a
normal string
>>> print(rawString)
and this is a\nraw string

The raw string is printed exactly as it was written, with the backslash and the 'n' left intact.
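To make the backslash problem concrete, here is a small sketch comparing normal and raw string literals when they are used as patterns. The variable names are made up for illustration only.

```python
import re

# A normal string with a doubled backslash and a raw string hold the same value.
escaped = '\\d+'
raw = r'\d+'
same_pattern = (escaped == raw)

# In a normal string '\b' is a single backspace character, while in a raw
# string it stays as a backslash followed by 'b', which the re module
# interprets as a word boundary.
normal_b = '\bdog\b'
raw_b = r'\bdog\b'

found_normal = re.search(normal_b, 'dog cat dog')  # looks for backspace characters
found_raw = re.search(raw_b, 'dog cat dog')        # looks for the whole word 'dog'

print(same_pattern)
print(len(normal_b), len(raw_b))
print(found_normal is None, found_raw.group(0))
```

The normal-string version silently searches for the wrong thing, which is why raw strings are the safer habit for patterns.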

Finding with Regular Expressions in Python

The 're' module provides several methods to perform exact queries on the input string. The methods we will be discussing are:

• re.match()
• re.search()
• re.findall()

Each method takes a regular expression and a string to find a match for. Let's look at each of these methods in more detail to figure out how they work and what makes them different.

2. Using match() - matching at the start of a string

Let's take a look at the match() method first. match() only finds a match if the pattern matches at the beginning of the string being searched.
As an example, calling the match() method on the string 'dog cat dog' and looking for the pattern 'dog' will match:

>>> re.match(r'dog', 'dog cat dog')
<_sre.SRE_Match object at 0xb743e720>
>>> match = re.match(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'

We'll discuss the group() method in more detail later. For now, all we need to know is that we called it with 0 as its argument, and that group() then returns the text that the pattern matched.
I've also skipped over the SRE_Match object that was returned, which we'll get to shortly as well.
However, if we call the match() method on the same string, looking for the pattern 'cat', no match will be found:

>>> re.match(r'cat', 'dog cat dog')
>>>
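When nothing matches, re.match() returns None, which is why the interpreter prints nothing above. Calling group() on None would raise an AttributeError, so code usually checks the result first. A minimal sketch, where starts_with is a hypothetical helper name and not part of the re module:

```python
import re

def starts_with(pattern, text):
    """Return the matched prefix, or None when text does not start with pattern."""
    m = re.match(pattern, text)
    # re.match() returns None on failure, so test the result before calling group().
    if m is None:
        return None
    return m.group(0)

print(starts_with(r'dog', 'dog cat dog'))
print(starts_with(r'cat', 'dog cat dog'))
```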

3. Using search() - matching at any position

The search() method is similar to match(), but the search() method doesn't restrict us to looking for matches only from the beginning of the string, so looking for 'cat' in our example string will find a match:

>>> match = re.search(r'cat', 'dog cat dog')
>>> match.group(0)
'cat'

However, the search() method stops looking after it finds a match, so in our example string the search() method finds only the first occurrence of 'dog'.

>>> match = re.search(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
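The difference between the two methods is easiest to see when both are run with the same pattern on the same input. A short sketch, with illustrative variable names:

```python
import re

text = 'dog cat dog'

# match() only tries the pattern at the very start of the string.
at_start = re.match(r'cat', text)

# search() scans forward and returns the first place where the pattern matches.
anywhere = re.search(r'cat', text)

print(at_start)
print(anywhere.group(0))
```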

4. Using findall() - all matches

The search method I use most in Python is findall(). Calling findall() simply returns a list of all the substrings that match the pattern, rather than a match object (we'll talk more about the match object next), which I find simpler to work with. Calling findall() on the example string we get:

>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
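Because findall() always returns a list, a pattern that does not occur yields an empty list rather than None, so the result can be counted or looped over without a separate check. A brief sketch (the 'bird' pattern is just an example of something absent from the string):

```python
import re

text = 'dog cat dog'

dogs = re.findall(r'dog', text)    # every non-overlapping occurrence
birds = re.findall(r'bird', text)  # no occurrence at all

print(len(dogs))
print(birds)
```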

5. Using the match.start() and match.end() methods

So what exactly is the match object that the search() and match() methods returned to us earlier?
Rather than simply returning the matching part of the string, search() and match() return a match object, which is a wrapper class around the matching substring.
Earlier you saw that I could get the matching substring by calling the group() method (and as we'll see in the next section, the match object is very useful when dealing with groups), but the match object also contains more information about the matching substring.
For example, the match object can tell us where the match begins and ends in the original string:

>>> match = re.search(r'dog', 'dog cat dog')
>>> match.start()
0
>>> match.end()
3

Knowing this information is sometimes very useful.
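One common use of these positions is slicing the original string. The sketch below looks for 'cat' so that the offsets are not zero; note that the value returned by end() is exclusive, which makes it fit directly into a slice.

```python
import re

text = 'dog cat dog'
m = re.search(r'cat', text)

start, end = m.start(), m.end()

# end() is one past the last matched character, so this slice is the match itself.
matched_text = text[start:end]

# span() returns both positions as a single tuple.
positions = m.span()

print(start, end)
print(matched_text)
print(positions)
```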

6. Using groups by number

As I mentioned before, match objects are very handy when dealing with groups.
Grouping is the ability to address a specific part of a regular expression. We can define a group as part of the whole regular expression, and then retrieve the part of the match that corresponds to it on its own.
Let's see how it works:

>>> contactInfo = 'Doe, John: 555-1212'

The string I just created resembles a snippet from someone's address book. We can match this line with a regular expression like this:

>>> re.search(r'\w+, \w+: \S+', contactInfo)
<_sre.SRE_Match object at 0xb74e1ad8>

By enclosing specific parts of the regular expression in parentheses (the characters '(' and ')'), we can group the contents and then treat these subgroups separately.

>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)

These groups can be obtained by using the group() method of the match object. They are addressed by the numerical order in which they appear from left to right in the regular expression (starting with 1):

>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'

The groups start at 1 because group 0 is reserved for the entire match (as we saw when we looked at the match() and search() methods earlier).

>>> match.group(0)
'Doe, John: 555-1212'
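When all of the groups are needed at once, the match object's groups() method returns them as a tuple, and group() also accepts several group numbers in a single call. A small sketch using the same contact string (the variable names are illustrative):

```python
import re

contact_info = 'Doe, John: 555-1212'
m = re.search(r'(\w+), (\w+): (\S+)', contact_info)

# groups() returns groups 1..N as a tuple, which unpacks neatly.
last, first, phone = m.groups()

# group() with several arguments returns a tuple in the order requested.
first_then_last = m.group(2, 1)

print(m.groups())
print(first_then_last)
```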

7. Using groups by name

Sometimes, especially when a regular expression has many groups, it becomes impractical to address groups by the order in which they appear. Python also allows you to give a group a name with the following syntax:

>>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)

We can still use the group() method to get the contents of a group, but this time we pass the name we gave the group instead of the numeric position we used before.

>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'

This greatly enhances the clarity and readability of the code. As you can imagine, as regular expressions become more complex, it becomes harder and harder to figure out what each group captures. Naming your groups clearly tells you and your readers what you intend to do.
Even though the findall() method does not return a match object, it can use groups too. In that case, findall() returns a list of tuples, where the Nth element of each tuple corresponds to the Nth group in the regular expression.

>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]

However, group names are of no use with the findall() method: it returns plain tuples, so the results can only be accessed by position.
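The contrast can be sketched as follows. With a match object, groupdict() returns every named group as a dictionary, and a named group can still be read by its number. With findall(), the same pattern only produces positional tuples. The variable names here are illustrative.

```python
import re

contact_info = 'Doe, John: 555-1212'
pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'

m = re.search(pattern, contact_info)

# groupdict() maps each group name to the text it captured.
fields = m.groupdict()

# Named groups keep their numeric positions as well.
same_group = (m.group(1) == m.group('last'))

# findall() accepts the named pattern but returns positional tuples only.
rows = re.findall(pattern, contact_info)

print(fields['phone'])
print(same_group)
print(rows)
```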

In this article we covered some of the basics of using regular expressions in Python. We learned about raw strings (and some of the headaches they can save you when using regular expressions). We also learned how to perform basic searches using the match(), search(), and findall() methods, and how to use grouping to work with the sub-parts of a match.

As always, the official Python documentation for the re module is a great resource if you want to see more on this topic.

In future posts, we will discuss the use of regular expressions in Python in more depth. We'll take a more comprehensive look at match objects, learn how to use them to do substitutions in strings, and even use them to parse Python data structures from text files.