Preface
In interface automation testing, we often use regular expressions to extract data when handling values that depend on other interfaces.
A regular expression (often abbreviated as regex, regexp, or RE in code) is a special sequence of characters that lets you check whether a string matches a given pattern. Many text editors use regular expressions to find and replace text that matches a pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
I. Regular expression syntax
1.1 Matching a single character
Single character: each item in the table below matches exactly one character. For example, \d matches a digit and \D matches a non-digit.
Besides this syntax, a pattern can also contain literal characters, one or several, which match themselves.
Character | Description
. | Matches any single character (except \n)
[2a] | Matches one of the characters listed inside the brackets; here, either 2 or a
\d | Matches a digit, i.e. 0-9
\D | Matches a non-digit
\s | Matches a whitespace character: space, tab, newline, and so on
\S | Matches a non-whitespace character
\w | Matches a word character, i.e. a-z, A-Z, 0-9, _ (letters, digits, underscore)
\W | Matches a non-word character
The examples below use re.findall(pattern, string), which finds every match in the string and returns them as a list. It is covered in more detail in the re module section later.
    import re

    # .: matches any single character except \n
    re1 = r'.'
    res1 = re.findall(re1, '\nj8?0\nbth\nihb')
    print(res1)  # Run result: ['j', '8', '?', '0', 'b', 't', 'h', 'i', 'h', 'b']

    # []: matches one of the listed characters
    re2 = r"[abc]"
    res2 = re.findall(re2, '1iugfiSHOIFUOFGIDHFGFD2345a6a78b99cc')
    print(res2)  # Run result: ['a', 'a', 'b', 'c', 'c']

    # \d: matches a digit
    re3 = r"\d"
    res3 = re.findall(re3, "dfghjkl32212dfghjk")
    print(res3)  # Run result: ['3', '2', '2', '1', '2']

    # \D: matches a non-digit
    re4 = r"\D"
    res4 = re.findall(re4, "d212dk?\n$%3;]a")
    print(res4)  # Run result: ['d', 'd', 'k', '?', '\n', '$', '%', ';', ']', 'a']

    # \s: matches a whitespace character (space, tab, newline, ...)
    re5 = r"\s"
    res5 = re.findall(re5, "a s d a 9999")
    print(res5)  # Run result: [' ', ' ', ' ', ' ']

    # \S: matches a non-whitespace character
    re6 = r"\S"
    res6 = re.findall(re6, "a s d a 9999")
    print(res6)  # Run result: ['a', 's', 'd', 'a', '9', '9', '9', '9']

    # \w: matches a word character (letter, digit, underscore)
    re7 = r"\w"
    res7 = re.findall(re7, "ce12sd@#a as_#$")
    print(res7)  # Run result: ['c', 'e', '1', '2', 's', 'd', 'a', 'a', 's', '_']

    # \W: matches a non-word character (not a letter, digit, or underscore)
    re8 = r"\W"
    res8 = re.findall(re8, "ce12sd@#a as_#$")
    print(res8)  # Run result: ['@', '#', ' ', '#', '$']

    # Literal characters match themselves
    re9 = r"python"
    res9 = re.findall(re9, "cepy1thon12spython123@@python")
    print(res9)  # Run result: ['python', 'python']
1.2 Indicating quantity
To match a character more than once, add a quantifier after it. The rules are as follows:
Character | Description
* | Matches the previous character 0 or more times, i.e. it may be present or absent
+ | Matches the previous character 1 or more times, i.e. at least once
? | Matches the previous character 0 or 1 times, i.e. either absent or present exactly once
{m} | Matches the previous character exactly m times
{m,} | Matches the previous character at least m times
{m,n} | Matches the previous character from m to n times
Examples are as follows:
    import re

    # *: the previous character occurs 0 or more times
    re21 = r"\d*"  # the rule: zero or more digits
    res21 = re.findall(re21, "343aa1112df345g1h6699")
    # At a non-digit such as a, the rule matches 0 times, so the result is an empty string
    print(res21)  # Run result: ['343', '', '', '1112', '', '', '345', '', '1', '', '6699', '']

    # ?: the previous character occurs 0 times or once
    re22 = r"\d?"
    res22 = re.findall(re22, "3@43*a111")
    print(res22)  # Run result: ['3', '', '4', '3', '', '', '1', '1', '1', '']

    # {m}: the previous character occurs exactly m times
    # Cell phone number: the 1st digit is 1, the 2nd digit is one of the listed digits, followed by 9 more digits
    re23 = r"1[3456789]\d{9}"
    res23 = re.findall(re23, "sas13566778899fgh256912345678jkghj12788990000aaa113588889999")
    print(res23)  # Run result: ['13566778899', '13588889999']

    # {m,}: the previous character occurs at least m times
    re24 = r"\d{7,}"
    res24 = re.findall(re24, "sas12356fgh1234567jkghj12788990000aaa113588889999")
    print(res24)  # Run result: ['1234567', '12788990000', '113588889999']

    # {m,n}: the previous character occurs from m to n times
    re25 = r"\d{3,5}"
    res25 = re.findall(re25, "aaaaa123456ghj333yyy77iii88jj909768876")
    print(res25)  # Run result: ['12345', '333', '90976', '8876']
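The quantifier table lists +, but the examples above do not exercise it. The following is a minimal sketch contrasting + with * on the same input; the sample string is made up for illustration.

```python
import re

text = "ab12cd345"  # hypothetical sample string

# +: one or more digits, so no empty matches are produced
plus_result = re.findall(r"\d+", text)
print(plus_result)  # ['12', '345']

# *: zero or more digits, so an empty match appears at every non-digit position
star_result = re.findall(r"\d*", text)
print(star_result)  # ['', '', '12', '', '', '345', '']
```

When the goal is to extract runs of digits, + is usually the better fit because it never returns empty strings.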
1.2.1 Match grouping
Character | Description
| (pipe) | Matches either the expression on its left or the one on its right
(ab) | Treats the characters in parentheses as a group
Examples are as follows:
    import re

    # |: define several rules at once; a match only has to satisfy one of them
    re31 = r"13566778899|13534563456|14788990000"
    res31 = re.findall(re31, "sas13566778899fgh13534563456jkghj14788990000")
    print(res31)  # Run result: ['13566778899', '13534563456', '14788990000']

    # (): grouping; from the text that matches the whole rule, extract only the part in parentheses
    re32 = r"aa(\d{3})bb"  # the whole rule must match, but the result keeps only the group, i.e. \d{3}
    res32 = re.findall(re32, "ggghjkaa123bbhhaa672bbjhjjaa@45bb")
    print(res32)  # Run result: ['123', '672']
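The grouping example above uses a single group. As a small sketch of two related cases, the code below shows what re.findall returns when a pattern has two groups, and how | can be placed inside a group. The sample string and variable names are assumptions for illustration only.

```python
import re

text = "id=7;name=tom;id=42"  # hypothetical sample string

# Two groups: findall returns a list of (group 1, group 2) tuples
pairs = re.findall(r"(\w+)=(\w+)", text)
print(pairs)  # [('id', '7'), ('name', 'tom'), ('id', '42')]

# | inside a group: match either key name, but only when followed by =
keys = re.findall(r"(id|name)=", text)
print(keys)  # ['id', 'name', 'id']
```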
1.3 Representation of boundaries
Character | Description
^ | Matches the beginning of the string, and only the beginning
$ | Matches the end of the string, and only the end
\b | Matches a word boundary (word characters: letters, digits, underscore)
\B | Matches a position that is not a word boundary
Examples are as follows:
    import re

    # ^: matches the beginning of the string
    re41 = r"^python"  # the string must start with python
    res41 = re.findall(re41, "python999python")  # only the python at the very beginning matches
    res411 = re.findall(re41, "1python999python")  # the string starts with 1, so nothing matches
    print(res41)  # Run result: ['python']
    print(res411)  # Run result: []

    # $: matches the end of the string
    re42 = r"python$"  # the string must end with python
    res42 = re.findall(re42, "python999python")
    print(res42)  # Run result: ['python']

    # \b: matches a word boundary (word characters: letters, digits, underscore)
    re43 = r"\bpython"  # matches python only when the character before it is not a word character
    res43 = re.findall(re43, "1python 999 python")  # the first python is preceded by 1, a word character, so it is skipped
    print(res43)  # Run result: ['python']

    # \B: matches a non-word boundary
    re44 = r"\Bpython"  # matches python only when the character before it is a word character
    res44 = re.findall(re44, "1python999python")
    print(res44)  # Run result: ['python', 'python']
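Boundaries are often combined. The sketch below, using made-up sample strings, shows \b on both sides of a word to match it only when it stands alone, and ^ together with $ to require that the entire string fits a rule (here the phone number rule from the quantifier examples).

```python
import re

text = "python pythonic 3python python_"  # hypothetical sample string

# \b on both sides: python must stand alone as a whole word
whole_words = re.findall(r"\bpython\b", text)
print(whole_words)  # ['python']

# ^ and $ together: the entire string must match the rule
phone_rule = r"^1[3456789]\d{9}$"
valid = re.findall(phone_rule, "13566778899")
invalid = re.findall(phone_rule, "x13566778899")
print(valid)    # ['13566778899']
print(invalid)  # []
```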
II. Greedy mode
Quantifiers in Python are greedy by default: they try to match as many characters as possible. Non-greedy mode matches as few characters as possible. Adding a question mark (?) after a quantifier turns off greedy mode.
The following example matches two or more digits. In greedy mode, matching continues for as long as the next character is still a digit: in 34656fya, for instance, 34656 satisfies the rule, so the match runs through to the final 6. With greedy mode turned off, matching stops as soon as two digits have been found, so the same text yields 34 and 65.
    import re

    # Default greedy mode
    test = 'aa123aaaa34656fyaa12a123d'
    res = re.findall(r'\d{2,}', test)
    print(res)  # Run result: ['123', '34656', '12', '123']

    # Greedy mode turned off
    res2 = re.findall(r'\d{2,}?', test)
    print(res2)  # Run result: ['12', '34', '65', '12', '12']
III. The re module
In Python, regular expressions are used through the re module. Most of its functions take two arguments:
- Parameter 1: Matching rules
- Parameter 2: String to be matched
3.1 findall()
Finds all substrings that match the pattern and returns them as a list.
    import re

    test = 'aa123aaaa34656fyaa12a123d'
    res = re.findall(r'\d{2,}', test)
    print(res)  # Run result: ['123', '34656', '12', '123']
3.2 search()
Finds the first substring that matches the pattern. The return value is a match object; call group() on it to extract the matched text.
    import re

    s = "123abc123aaa123bbb888ccc"
    res2 = re.search(r'123', s)
    print(res2)  # Run result: <re.Match object; span=(0, 3), match='123'>

    # Extract the matched text with group(); the return type is str
    print(res2.group())  # Run result: 123
In the returned match object, span is the subscript range of the matched data and match is the matched value.
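The span shown in the printed match object can also be read programmatically. The following is a small sketch using the match object's span(), start(), and end() methods; it also shows that re.search returns None when nothing matches, so the result is worth checking before calling group(). The search patterns here are chosen only for illustration.

```python
import re

s = "123abc123aaa123bbb888ccc"

found = re.search(r"abc", s)
span = found.span()                       # (start index, end index) of the match
sliced = s[found.start():found.end()]     # slicing with the span gives the matched text
print(span)    # (3, 6)
print(sliced)  # abc

# No match: search returns None, so guard before calling group()
missing = re.search(r"xyz", s)
text = missing.group() if missing else None
print(text)    # None
```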
Description of group() parameters:
- No parameter: get all the content matched
- With a number: returns the content of the specified group (pass 1 for the first group, 2 for the second group, and so on).
    import re

    s = "123abc123aaa123bbb888ccc"
    re4 = r"aaa(\d{3})bbb(\d{3})ccc"  # the groups use the () syntax described earlier
    res4 = re.search(re4, s)
    print(res4)  # Run result: <re.Match object; span=(9, 24), match='aaa123bbb888ccc'>

    # group() with no argument: returns everything that was matched
    # group(n): returns the content of group n (1 for the first group, 2 for the second, and so on)
    print(res4.group())  # Run result: aaa123bbb888ccc
    print(res4.group(1))  # Run result: 123
    print(res4.group(2))  # Run result: 888
3.3 match()
Matches from the beginning of the string. If the match succeeds, a match object is returned; if the beginning of the string does not satisfy the pattern, matching does not continue further along and None is returned. match() and search() both return a single match. The difference is that match() only tries the beginning of the string, while search() scans the whole string and returns the first match it finds.
    import re

    s = "a123abc123aaa1234bbb888ccc"

    # match: only matches at the beginning of the string; returns None if the beginning does not match
    res1 = re.match(r"a123", s)
    res2 = re.match(r"a1234", s)
    print(res1)  # Run result: <re.Match object; span=(0, 4), match='a123'>
    print(res2)  # Run result: None
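To make the difference concrete, here is a minimal sketch that runs the same pattern through both re.match and re.search on one made-up string.

```python
import re

s = "a123abc123"  # hypothetical sample string

# The string starts with a, so match() fails for the pattern 123
match_result = re.match(r"123", s)
print(match_result)  # None

# search() scans the whole string and returns the first occurrence
search_result = re.search(r"123", s)
print(search_result.span())  # (1, 4)
```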
3.4 sub()
Search and replace: replaces the matches found in a string.
sub() parameter description:
- Parameter 1: the pattern to search for
- Parameter 2: the replacement string
- Parameter 3: the string to operate on
- Parameter 4: the maximum number of replacements, optional (by default every match is replaced)
    import re

    s = "a123abc123aaa123bbb888ccc"
    # Parameter 1: the pattern to search for
    # Parameter 2: the replacement string
    # Parameter 3: the string to operate on
    # Parameter 4: the maximum number of replacements, optional (by default every match is replaced)
    res5 = re.sub(r'123', "666", s, count=4)
    print(res5)  # Run result: a666abc666aaa666bbb888ccc
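In the example above the count is larger than the number of matches, so it has no visible effect. The sketch below shows a count that actually limits the replacements, and a pattern-based replacement with the default of replacing everything. The replacement strings are arbitrary choices for illustration.

```python
import re

s = "a123abc123aaa123bbb888ccc"

# count=1: only the first match is replaced
first_only = re.sub(r"123", "666", s, count=1)
print(first_only)  # a666abc123aaa123bbb888ccc

# No count: every run of three digits is replaced
all_digits = re.sub(r"\d{3}", "#", s)
print(all_digits)  # a#abc#aaa#bbb#ccc
```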
IV. Parameterization of use cases
In interface automation testing, our test data is stored in Excel. If some parameters are hard-coded, they may stop working when the scenario or environment changes. Switching environments then means either preparing test data in the new environment that our scripts can run against, or editing the Excel data to fit the new environment, and either way the maintenance cost is high. We should therefore parameterize the test data of our automation scripts as much as possible to reduce maintenance costs.
Start with the simple version of parameterization, using login as an example: the account, password, and similar login information can be extracted into a configuration file. To change the data or switch environments, you then only need to edit the configuration file in one place.
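As a rough sketch of this simple version, the code below reads login data from configuration text using the standard library's configparser. The section name, key names, and values are hypothetical, and the text is read from a string only so the example is self-contained; a real project would read a file.

```python
import configparser

# Hypothetical configuration content; in practice this would live in a file
CONFIG_TEXT = """
[login]
mobile_phone = 13566778899
pwd = example-password
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Build the login request data from the configuration instead of hard-coding it
login_data = {
    "mobile_phone": parser.get("login", "mobile_phone"),
    "pwd": parser.get("login", "pwd"),
}
print(login_data)
```

Switching environments then only requires editing the configuration values; the test code itself stays unchanged.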
But what if many different values need to be parameterized? Adding a separate check-and-replace for each parameter makes the code verbose and hard to maintain. This is where the re module helps. Here is an example:
    import re
    # conf is the project's configuration reader; its import path is missing in the source:
    # from <module path omitted> import conf


    class TestData:
        """Used to temporarily save some data to be replaced."""
        pass


    def replace_data(data):
        r = r"#(.+?)#"  # note the group () in this pattern
        # Keep going while there is still data to be replaced
        while re.search(r, data):
            res = re.search(r, data)  # match the first placeholder to be replaced
            item = res.group()  # the whole placeholder, e.g. #phone#
            key = res.group(1)  # the data item inside the placeholder, e.g. phone
            try:
                # Look up the data item in the configuration file and replace the placeholder
                data = data.replace(item, conf.get_str("test_data", key))
            except Exception:
                # Not found in the configuration file: fall back to the temporarily saved data
                data = data.replace(item, getattr(TestData, key))
        return data
Note that the regular expression here uses ? to turn off greedy mode. The test data may contain two or more placeholders to parameterize, and in greedy mode the pattern would swallow them all as a single piece of data, as the example below shows:
    import re

    data = '{"mobile_phone":"#phone#","pwd":"#pwd#","user":#user#}'

    # Greedy: everything between the first # and the last # is captured as one item
    r1 = "#(.+)#"
    res1 = re.findall(r1, data)
    print(res1)  # Run result: ['phone#","pwd":"#pwd#","user":#user'] (a single string inside the single quotes)
    print(len(res1))  # Run result: 1

    # Non-greedy: each placeholder is captured separately
    r2 = "#(.+?)#"
    res2 = re.findall(r2, data)
    print(res2)  # Run result: ['phone', 'pwd', 'user']
    print(len(res2))  # Run result: 3
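For readers who want to try the replacement end to end, here is a self-contained sketch under stated assumptions. The CONFIG dictionary is a hypothetical stand-in for the configuration file, the values are invented, and the loop is swapped for re.sub with a replacement function, which is a different technique from the while loop in the article's replace_data but produces the same kind of result.

```python
import re

# Hypothetical stand-in for the values kept in the configuration file
CONFIG = {"phone": "13566778899", "pwd": "example-password"}


class TestData:
    """Temporarily holds values saved from earlier interface responses."""
    user = "tester01"  # hypothetical value saved by an earlier test case


PLACEHOLDER = re.compile(r"#(.+?)#")  # non-greedy, so each #key# is matched separately


def replace_data(data):
    """Replace every #key# placeholder with its configured or saved value."""
    def lookup(match):
        key = match.group(1)
        if key in CONFIG:
            return CONFIG[key]
        # Fall back to the temporarily saved data
        return str(getattr(TestData, key))
    return PLACEHOLDER.sub(lookup, data)


raw = '{"mobile_phone":"#phone#","pwd":"#pwd#","user":"#user#"}'
result = replace_data(raw)
print(result)
```

Passing a function to re.sub handles all placeholders in one pass, at the cost of being slightly less explicit than the loop.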
The TestData class is used to temporarily save data, mainly data returned by interfaces. Some test data changes dynamically and may depend on a particular interface, and later test cases need it. We can save the value returned by the interface on this class as a class attribute; when a later test case needs it, we read the class attribute and substitute it into the test data. Tip: set an attribute with setattr(object, attribute name, attribute value) and read it with getattr(object, attribute name).
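A minimal sketch of that tip follows. The response dictionary and the attribute names member_id and token are hypothetical; they only stand in for whatever an earlier interface call returns.

```python
class TestData:
    """Temporarily holds values saved from interface responses."""
    pass


# Hypothetical response from an earlier interface call
response = {"id": 1001, "token": "abc123"}

# Save the values as class attributes so later test cases can use them
setattr(TestData, "member_id", response["id"])
setattr(TestData, "token", response["token"])

# Later: read the saved values back
member_id = getattr(TestData, "member_id")
token = getattr(TestData, "token")
print(member_id, token)  # 1001 abc123

# A third argument to getattr supplies a default for attributes never saved
not_saved = getattr(TestData, "not_saved", None)
print(not_saved)  # None
```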