SoFunction
Updated on 2025-03-04

How to build complex rules using combinations in Python

Writing a long regular expression by hand is tedious, and the result is hard to read and debug. With just two functions, you can instead build complex rules by combining simple regular expressions:

For example, write the rules as a string, where {name} refers to a rule defined on an earlier line:

# rules definition
rules = r'''
    protocol = http|https
    login_name = [^:@\r\n\t ]+
    login_pass = [^@\r\n\t ]+
    login = {login_name}(:{login_pass})?
    host = [^:/@\r\n\t ]+
    port = \d+
    optional_port = (?:[:]{port})?
    path = /[^\r\n\t ]*
    url = {protocol}://({login}[@])?{host}{optional_port}{path}?
'''

Then call the regex_build function, which expands the rules above into a dictionary, and print the result:

# expand patterns in a dictionary
m = regex_build(rules, capture = True)

# list generated patterns
for k, v in m.items():
    print(k, '=', v)

Result:

protocol = (?P<protocol>http|https)
login_name = (?P<login_name>[^:@\r\n\t ]+)
login_pass = (?P<login_pass>[^@\r\n\t ]+)
login = (?P<login>(?P<login_name>[^:@\r\n\t ]+)(:(?P<login_pass>[^@\r\n\t ]+))?)
host = (?P<host>[^:/@\r\n\t ]+)
port = (?P<port>\d+)
optional_port = (?P<optional_port>(?:[:](?P<port>\d+))?)
path = (?P<path>/[^\r\n\t ]*)
url = (?P<url>(?P<protocol>http|https)://((?P<login>(?P<login_name>[^:@\r\n\t ]+)(:(?P<login_pass>[^@\r\n\t ]+))?)[@])?(?P<host>[^:/@\r\n\t ]+)(?P<optional_port>(?:[:](?P<port>\d+))?)(?P<path>/[^\r\n\t ]*)?)

Patterns this complex are hard to write and debug by hand. When you build them by combination, you can test each small rule on its own first and assemble them only when you need them, which makes mistakes much less likely. The listing above is the result of that assembly and substitution.
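For instance, the sub-rules can be checked one at a time with re.fullmatch before they are combined. The sketch below copies three patterns from the rule text above; the sample strings (such as the host "example.com") are illustrative assumptions, not part of the original article.

```python
import re

# Sub-rules copied from the rule text; each is tested on its own
# before being combined into the larger "url" rule.
host = r'[^:/@\r\n\t ]+'
port = r'\d+'
login_name = r'[^:@\r\n\t ]+'

# fullmatch requires the whole string to match the sub-rule
assert re.fullmatch(host, 'example.com')
assert not re.fullmatch(host, 'example.com:8080')  # ':' is excluded from host names
assert re.fullmatch(port, '8080')
assert not re.fullmatch(port, '80a')
assert re.fullmatch(login_name, 'name')
assert not re.fullmatch(login_name, 'name:pass')   # ':' separates name and password
print('all sub-rules behave as expected')
```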

Now use the url rule to match a string:

# Use the rule "url" to match
pattern = m['url']
s = re.match(pattern, 'https://name:pass@:8080/haha')

# Print the complete match result
print('matched: "%s"'%s.group(0))
print()

# Print group matching results
for name in ('url', 'login_name', 'login_pass', 'host', 'port', 'path'):
    print('subgroup:', name, '=', s.group(name))

Output:

match text with pattern "url"
matched: "https://name:pass@:8080/haha"

subgroup: url = https://name:pass@:8080/haha
subgroup: login_name = name
subgroup: login_pass = pass
subgroup: host =
subgroup: port = 8080
subgroup: path = /haha

You can retrieve the complete match, or look up the match for an individual component by its rule name.
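If you want every component at once, Match.groupdict() returns all named groups in a single dictionary, with None for groups that did not take part in the match. The sketch below reuses the expanded url pattern from the result listing above and, as an assumption, sample URLs with the hypothetical host example.com.

```python
import re

# The expanded "url" pattern from the result listing (capture = True)
URL = (r'(?P<url>(?P<protocol>http|https)://((?P<login>(?P<login_name>[^:@\r\n\t ]+)'
       r'(:(?P<login_pass>[^@\r\n\t ]+))?)[@])?(?P<host>[^:/@\r\n\t ]+)'
       r'(?P<optional_port>(?:[:](?P<port>\d+))?)(?P<path>/[^\r\n\t ]*)?)')

# Hypothetical sample URL with login, port and path
parts = re.match(URL, 'https://name:pass@example.com:8080/haha').groupdict()
print(parts['host'], parts['port'], parts['path'])  # example.com 8080 /haha

# Without login or port, those groups come back as None
bare = re.match(URL, 'http://example.com/').groupdict()
print(bare['login'], bare['port'])  # None None
```

Because every rule became a named group, the dictionary keys are exactly the rule names.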

Writing complex regular expressions this way becomes much easier.

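In Python regular expressions, braces such as {3} or {2,5} are quantifiers: they hold numbers and specify how many times the preceding item repeats. A placeholder like {name} holds a rule name instead, so the two do not conflict, and regex_expand leaves numeric quantifiers such as {3} and {2,5} untouched. This makes the notation safe.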
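To illustrate the point, here is a deliberately simplified, hypothetical expander (expand and PLACEHOLDER are names made up for this sketch, not part of the article's code). It substitutes braces only when they contain an identifier, so numeric quantifiers pass through unchanged. Unlike the article's regex_expand, it does not treat backslash escapes specially.

```python
import re

# Matches {identifier} but not {3}, {2,5} or {,5}
PLACEHOLDER = re.compile(r'\{([A-Za-z_]\w*)\}')

def expand(macros, pattern):
    # Replace each {name} with its rule wrapped in a non-capturing group;
    # an unknown name raises KeyError
    return PLACEHOLDER.sub(lambda m: '(?:' + macros[m.group(1)] + ')', pattern)

macros = {'port': r'\d+'}
pattern = expand(macros, r'[0-9]{3}:{port}')
print(pattern)                                  # [0-9]{3}:(?:\d+)
print(bool(re.fullmatch(pattern, '123:8080')))  # True
```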

Implementation code:

import re

# Replace text in pattern like {name} with predefined rules in macros
def regex_expand(macros, pattern, guarded = True):
    output = []
    pos = 0
    size = len(pattern)
    while pos < size:
        ch = pattern[pos]
        if ch == '\\':
            # copy an escape sequence such as \{ or \d unchanged
            output.append(pattern[pos:pos + 2])
            pos += 2
            continue
        elif ch != '{':
            output.append(ch)
            pos += 1
            continue
        p2 = pattern.find('}', pos)
        if p2 < 0:
            # no closing brace: keep the '{' as a literal character
            output.append(ch)
            pos += 1
            continue
        p3 = p2 + 1
        name = pattern[pos + 1:p2].strip('\r\n\t ')
        if name == '':
            output.append(pattern[pos:p3])
            pos = p3
            continue
        elif name[0].isdigit() or name[0] == ',':
            # numeric quantifier such as {3}, {2,5} or {,5}: keep as-is
            output.append(pattern[pos:p3])
            pos = p3
            continue
        elif ('<' in name) or ('>' in name):
            raise ValueError('invalid pattern name "%s"'%name)
        if name not in macros:
            raise ValueError('{%s} is undefined'%name)
        if guarded:
            # wrap in a non-capturing group so operators like | stay local
            output.append('(?:' + macros[name] + ')')
        else:
            output.append(macros[name])
        pos = p3
    return ''.join(output)

# Given rule text, build rule dictionary
def regex_build(code, macros = None, capture = True):
    defined = {}
    if macros is not None:
        for k, v in macros.items():
            defined[k] = v
    line_num = 0
    for line in code.split('\n'):
        line_num += 1
        line = line.strip('\r\n\t ')
        # skip blank lines and comment lines
        if (not line) or line.startswith('#'):
            continue
        pos = line.find('=')
        if pos < 0:
            raise ValueError('%d: not a valid rule'%line_num)
        head = line[:pos].strip('\r\n\t ')
        body = line[pos + 1:].strip('\r\n\t ')
        if (not head):
            raise ValueError('%d: empty rule name'%line_num)
        elif head[0].isdigit():
            raise ValueError('%d: invalid rule name "%s"'%(line_num, head))
        elif ('<' in head) or ('>' in head):
            raise ValueError('%d: invalid rule name "%s"'%(line_num, head))
        try:
            pattern = regex_expand(defined, body, guarded = not capture)
        except ValueError as e:
            raise ValueError('%d: %s'%(line_num, str(e)))
        try:
            # make sure the expanded pattern is a valid regular expression
            re.compile(pattern)
        except re.error:
            raise ValueError('%d: invalid pattern "%s"'%(line_num, pattern))
        if not capture:
            defined[head] = pattern
        else:
            defined[head] = '(?P<%s>%s)'%(head, pattern)
    return defined

# Define a set of combination rules
rules = r'''
    protocol = http|https
    login_name = [^:@\r\n\t ]+
    login_pass = [^@\r\n\t ]+
    login = {login_name}(:{login_pass})?
    host = [^:/@\r\n\t ]+
    port = \d+
    optional_port = (?:[:]{port})?
    path = /[^\r\n\t ]*
    url = {protocol}://({login}[@])?{host}{optional_port}{path}?
'''

# Expand the above rules into a dictionary
m = regex_build(rules, capture = True)

# Output dictionary content
for k, v in m.items():
    print(k, '=', v)

print()

# Use the final rule "url" to match text
pattern = m['url']
s = re.match(pattern, 'https://name:pass@:8080/haha')

# Print the complete match
print('matched: "%s"'%s.group(0))
print()

# Print group matches by name
for name in ('url', 'login_name', 'login_pass', 'host', 'port', 'path'):
    print('subgroup:', name, '=', s.group(name))
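In the implementation above, regex_build calls regex_expand with guarded = not capture, so when capture is False each referenced rule is wrapped in a non-capturing group (?:...) before substitution. The short check below uses plain re (the target string "x" is just an illustrative placeholder) to show why: without the wrapper, the | inside protocol splits the entire pattern.

```python
import re

protocol = 'http|https'

# Plain text substitution: the alternation swallows the rest of the
# pattern, leaving the two alternatives "http" and "https://x"
unguarded = protocol + '://x'
# Guarded substitution: the alternation stays inside (?:...)
guarded = '(?:' + protocol + ')' + '://x'

print(re.fullmatch(unguarded, 'http://x'))      # None
print(bool(re.fullmatch(guarded, 'http://x')))  # True
```

With capture = True the wrapper is not needed, because every rule is already enclosed in its own (?P<name>...) group.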

That's all: the main logic is only 84 lines of code.
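One limitation to keep in mind with capture = True: every rule becomes a named group, and Python's re module does not allow two groups with the same name in one pattern. A rule that references the same sub-rule twice, such as the hypothetical port_range = {port}-{port}, therefore fails to compile, and regex_build reports it as an invalid pattern. The sketch below reproduces this with plain re; building with capture = False avoids the problem because substituted rules are then non-capturing.

```python
import re

port = r'(?P<port>\d+)'  # the "port" rule as expanded with capture = True

# Hypothetical rule "port_range = {port}-{port}": two groups named "port"
port_range = port + '-' + port

try:
    re.compile(port_range)
    compiled = True
except re.error as e:
    compiled = False
    print('re.error:', e)

# The capture = False form has no named groups, so it compiles fine
plain = r'(?:\d+)' + '-' + r'(?:\d+)'
print(bool(re.fullmatch(plain, '80-8080')))  # True
```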
