Introduction: Regular expressions are the "nuclear weapon" for dealing with strings: they are not only fast but also powerful. This article does not go into regular expression syntax. It only briefly introduces the commonly used regular expression functions in Python and how to use them, so that you can look them up quickly.
01 re Overview
The re module is a built-in Python module that provides all of Python's regular expression functionality. It is installed by default in the Lib folder of the Python root directory (e.g. .\Python\Python37\Lib). It provides three main categories of string operations:
- String finding/matching
- String replacement
- String splitting
Since this is a string-oriented module, the string type matters. In the re module, the pattern string and the text being searched can be either Unicode strings (the usual str type) or 8-bit byte strings (bytes, where each byte can be written as two hexadecimal digits, e.g. \xe5), but both must be of the same type.
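As a small sketch of the same-type rule, the snippet below searches a str with a str pattern and a bytes object with a bytes pattern, then shows what happens when the two are mixed. The sample text and variable names are illustrative assumptions, not part of the original article.

```python
import re

text_str = 'this is a re test'
text_bytes = text_str.encode('utf-8')  # the same text as 8-bit bytes

# str pattern on str text, bytes pattern on bytes text
str_match = re.search(r'[a-z]+', text_str)
bytes_match = re.search(rb'[a-z]+', text_bytes)

print(str_match.group())    # this
print(bytes_match.group())  # b'this'

# Mixing a bytes pattern with a str text raises TypeError
try:
    re.search(rb'[a-z]+', text_str)
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print('TypeError:', exc)
```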
02 String Find/Match
Pre-compile: compile
Before introducing the find and match functions, you first need to know about re's compile function, which compiles a pattern string into a regular expression object for fast subsequent matching and reuse.
import re
pattern = re.compile(r'[a-z]{2,5}')
type(pattern)  # <class 're.Pattern'>
This example creates a regular expression object (re.Pattern) named pattern, which matches a run of 2 to 5 lowercase letters. The other regular expression functions can then be called as methods of pattern.
Match: match
The match function matches from the beginning of the text string. If the match succeeds, it returns a match object; you can then call its group() method to get the matched text, or its span() method to get the start and end indices of the match. Otherwise it returns None.
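import re
pattern = re.compile(r'[a-z]{2,5}')
text1 = 'this is a re test'
res = pattern.match(text1)
print(res)  # <re.Match object; span=(0, 4), match='this'>
if res:
    print(res.group())  # this
    print(res.span())  # (0, 4)
# The first character is uppercase, so nothing matches at the start
text2 = 'Yes, this is a re test'
print(pattern.match(text2))  # None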
The match function also has a variant, fullmatch, which returns a match object if and only if the pattern matches the entire text string; otherwise it returns None.
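To make the difference concrete, here is a minimal sketch that reuses the 2-to-5 lowercase letter pattern from the match example. The test strings are chosen only for illustration.

```python
import re

pattern = re.compile(r'[a-z]{2,5}')

whole = pattern.fullmatch('this')        # the entire string fits the pattern
partial = pattern.fullmatch('this is')   # ' is' is left over, so no full match
prefix = pattern.match('this is')        # match only needs a matching prefix

print(whole.group())   # this
print(partial)         # None
print(prefix.group())  # this
```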
Search: search
match only matches at the beginning of the text string. To match at any position, call the search method instead. Like match, it returns a match object as soon as a match succeeds at any position; you can call span() to get the start and end indices and group() to get the matched text.
import re
pattern = re.compile(r'\s[a-z]{2}')
text1 = 'this is a re test'
res = pattern.search(text1)
print(res)  # <re.Match object; span=(4, 7), match=' is'>
if res:
    print(res.group())  # ' is' (including the leading space)
    print(res.span())  # (4, 7)
pattern2 = re.compile(r'\s[a-z]{5}')
text2 = 'Yes, this is a re test'
print(pattern2.search(text2))  # None
match and search both return a single result. The only difference is that the former matches only at the beginning of the string, while the latter matches at any position. On success, both return a match object.
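The contrast is easiest to see when both methods are run with the same pattern on the same text. The following is a small sketch with an illustrative sample string:

```python
import re

pattern = re.compile(r'[a-z]{2,5}')
text = 'Yes, this is a re test'

at_start = pattern.match(text)   # 'Y' is uppercase, so nothing matches at index 0
anywhere = pattern.search(text)  # scanning forward finds 'es' at index 1

print(at_start)          # None
print(anywhere.group())  # es
print(anywhere.span())   # (1, 3)
```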
Full search: findall/finditer
findall is probably the most commonly used regular expression function. It finds all matches; for example, when extracting information in a web crawler, it is a very convenient way to pull out every matching field.
import re
pattern = re.compile(r'\s[a-z]{2,5}')
text1 = 'this is a re test'
res = pattern.findall(text1)
print(res)  # [' is', ' re', ' test']
findall returns a list; when there is no match, it returns an empty list. To avoid using too much memory when there are a large number of matches, you can call finditer instead, which returns an iterator. Each element it yields is a match object, on which you can call the group and span methods as before.
import re
pattern = re.compile(r'\s[a-z]{2,5}')
text1 = 'this is a re test'
res = pattern.finditer(text1)
for r in res:
    print(r.group())
"""
 is
 re
 test
"""
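One detail worth knowing, not covered in the examples above: when the pattern contains a capture group, findall returns the group's text rather than the whole match. The sketch below compares the two forms and uses finditer to read both the text and the span of the group. The variable names are illustrative.

```python
import re

text = 'this is a re test'

# No group: each item is the whole match, leading space included
whole_matches = re.findall(r'\s[a-z]{2,5}', text)
# One group: each item is only the text captured by the group
group_matches = re.findall(r'\s([a-z]{2,5})', text)

# finditer yields match objects lazily, so group(1) and span(1) are both available
group_spans = [(m.group(1), m.span(1)) for m in re.finditer(r'\s([a-z]{2,5})', text)]

print(whole_matches)  # [' is', ' re', ' test']
print(group_matches)  # ['is', 're', 'test']
print(group_spans)    # [('is', (5, 7)), ('re', (10, 12)), ('test', (13, 17))]
```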
When the pattern string is simple or only needs to be used once, all of the above methods can also be called directly as module-level functions of re, without compiling first. In this case, the first parameter of each function is the pattern string.
import re
text = 'this is re test'
re.findall('[a-z]+', text)  # ['this', 'is', 're', 'test']
03 String replacement/splitting
Replace: sub/subn
When you need to replace parts of a text string conditionally, you can call re.sub (you can also compile first and call the method on the pattern object). Its parameters are, in order, the pattern string, the replacement, and the text string; optional parameters limit the number of replacements (count) and set the matching mode (flags). By using groups in the pattern string, you can reformat the matched text (similar to the string format method), referring to the groups in the replacement as \1, \2, and so on.
import re
text = 'today is 2020-03-05'
print(re.sub('-', '', text))  # today is 20200305
print(re.sub('-', '', text, count=1))  # today is 202003-05
print(re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text))  # today is 03/05/2020
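The optional count and flags parameters can be passed by keyword. The following sketch, with a made-up sample sentence, uses flags=re.IGNORECASE for case-insensitive matching and count=1 to stop after the first replacement:

```python
import re

text = 'Today is 2020-03-05, and TODAY is a re test'

# flags=re.IGNORECASE makes 'today' match regardless of case
all_replaced = re.sub(r'today', 'tomorrow', text, flags=re.IGNORECASE)
# count=1 stops after the first replacement
first_only = re.sub(r'today', 'tomorrow', text, count=1, flags=re.IGNORECASE)

print(all_replaced)  # tomorrow is 2020-03-05, and tomorrow is a re test
print(first_only)    # tomorrow is 2020-03-05, and TODAY is a re test
```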
A variant of sub is subn, which returns a 2-element tuple: the first element is the result of the replacement and the second is the number of replacements made.
import re
text = 'today is 2020-03-05'
print(re.subn('-', '', text))  # ('today is 20200305', 2)
Split: split
Regular expressions can also be used to split strings. re.split is an enhanced version of the str.split() method: it splits on a pattern rather than a fixed separator and returns the pieces as a list.
import re
text = 'today is a re test, what do you mind?'
print(re.split(',', text))  # ['today is a re test', ' what do you mind?']
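Splitting on a literal comma could also be done with str.split; the advantage of re.split shows when the separator is a pattern. As a small sketch, the code below splits on a comma with optional trailing whitespace or on any run of whitespace, and then uses the maxsplit parameter to limit the number of splits.

```python
import re

text = 'today is a re test, what do you mind?'

# Split on a comma (plus any following whitespace) or on any run of whitespace
words = re.split(r',\s*|\s+', text)
# maxsplit=2 performs at most two splits; the rest of the string stays intact
limited = re.split(r'\s+', text, maxsplit=2)

print(words)    # ['today', 'is', 'a', 're', 'test', 'what', 'do', 'you', 'mind?']
print(limited)  # ['today', 'is', 'a re test, what do you mind?']
```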
04 Summary
The re module in Python provides the common regular expression methods. Each can be called either as a module-level function (e.g. re.findall(pattern, text)) or as a method of a compiled pattern object (e.g. pattern.findall(text)).
- Common matching functions: match/fullmatch
- Common search functions: search/findall/finditer
- Commonly used substitution functions: sub/subn
- Commonly used split functions: split
- There are many other methods, but they are less common; see the official documentation for details.
- In addition, Python has a third-party regular expression library, regex, to choose from.