SoFunction
Updated on 2025-04-08

Learning notes on Lua's string library and its powerful pattern matching

The Lua interpreter itself has very limited string-handling capabilities; its powerful string operations come from the string library. Lua's string functions are exported in the string module. In Lua 5.1 they are also available as methods of the string type, so you can write either string.upper(s) or s:upper(), whichever you prefer.
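
As a small illustration of the two calling styles, the sketch below calls the same library function both ways. The sample string and the variable names are only illustrative.

```lua
local s = "Hello"

-- Module style: the function is looked up in the string table.
local viaModule = string.upper(s)

-- Method style: s:upper() is shorthand for string.upper(s).
local viaMethod = s:upper()

print(viaModule)      --> HELLO
print(viaMethod)      --> HELLO

-- A string literal needs parentheses before a method call.
print(("ab"):rep(3))  --> ababab
```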

string.len(s) returns the length of s.
string.rep(s, n) returns a string consisting of s repeated n times.
string.lower(s) returns a copy of s with uppercase letters converted to lowercase; string.upper(s) does the opposite.
Both lower and upper follow the current locale. If you want to sort an array of strings case-insensitively, you might write:

table.sort(a, function(x, y)
     return string.lower(x) < string.lower(y)
end)

string.sub(s, i, j) extracts the substring of s from position i to position j (the closed interval [i, j]). Negative indices count from the end of the string: -1 is the last character, -2 the second to last, and so on. This makes it convenient to extract the last few characters of a string. For example:

s = "[hello,world]"
print(string.sub(s, 2, -2)) --> hello,world

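Remember that strings in Lua are immutable: string.sub, like every other string function, returns a new string and leaves the original unchanged.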
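
A minimal sketch of what immutability means in practice. The variable names are illustrative; the point is only that the library functions return new strings, and "changing" a string means rebinding the variable.

```lua
local s = "hello world"

-- Each call returns a new string; s itself is untouched.
local head = string.sub(s, 1, 5)
local shout = string.upper(s)

print(s)      --> hello world
print(head)   --> hello
print(shout)  --> HELLO WORLD

-- To "modify" s, assign the result back to the variable.
s = string.upper(s)
print(s)      --> HELLO WORLD
```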

string.char and string.byte convert between characters and their numeric codes. For example:

i = 97
print(string.char(i, i+1, i+2)) --> abc
print(string.byte("abc"))       --> 97
print(string.byte("abc", -2))   --> 98

In Lua 5.1, string.byte accepts a third parameter and returns the codes of all characters between positions i and j. For example, this converts a string s into an array of character codes:

t = {string.byte(s, 1, -1)}

To convert the array back into a string:

s = string.char(unpack(t))   -- unpack is table.unpack in Lua 5.2 and later
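
The round trip can be sketched end to end as follows. The `table.unpack or unpack` line is an addition not found in the original notes; it is there only so the snippet runs on both Lua 5.1 and later versions.

```lua
-- unpack is a global function in Lua 5.1 and table.unpack from 5.2 on.
local unpack = table.unpack or unpack

local s = "Lua"

-- Collect the numeric code of every character into an array.
local codes = {string.byte(s, 1, -1)}
print(#codes)     --> 3
print(codes[1])   --> 76

-- Rebuild the string from the codes.
local back = string.char(unpack(codes))
print(back)       --> Lua
print(back == s)  --> true
```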

string.format is a powerful string formatting function, similar to printf in C; it is not described further here.

The most powerful functions in Lua's string library are the pattern matching functions: find, match, gsub, and gmatch. Unlike other scripting languages, Lua uses neither POSIX regular expressions nor Perl regular expressions. The reason is size: implementing either would make Lua considerably larger, and Lua was designed to be a compact language embedded in applications. Lua implements its own pattern matching in fewer than 500 lines of code. Although it is not as powerful as standard regular expressions (whose implementations typically need more than 4,000 lines of code), it is powerful enough.

string.find searches the given string for the target pattern and returns the start and end positions of the first match, or nil if there is no match. For example:

s = "hello,world"
i, j = string.find(s, "hello")
print(string.sub(s, i, j)) --> hello

You can also pass a starting position for the search as a third argument, which is useful for finding every occurrence of a pattern, such as the positions of all newlines in s:

local t = {}
local i = 0
while true do
     i = string.find(s, "\n", i+1)   -- continue searching after the previous match
     if i == nil then break end
     t[#t+1] = i
end

string.match likewise looks for a pattern in the given string. The difference is that it returns the part of the string that matched rather than its position:

print(string.match("hello,world","hello")) --> hello

For a fixed pattern like "hello", this function is of little use. With variable patterns, however, it shows its power:

date = "now is 2014/10/6 17:58"
d = string.match(date, "%d+/%d+/%d+")
print(d)   --> 2014/10/6

string.gsub takes three parameters: the subject string, the pattern, and the replacement string. It replaces every match of the pattern with the replacement string and returns the resulting string together with the number of substitutions made.

s = string.gsub("Lua is cute", "cute", "great")
print(s) --> Lua is great
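
The second return value is easy to overlook. The sketch below, with made-up sample strings, captures both results and also shows how an extra pair of parentheses keeps only the first one.

```lua
-- gsub returns the new string and the number of substitutions.
local result, count = string.gsub("hello world", "o", "0")
print(result)  --> hell0 w0rld
print(count)   --> 2

-- Wrapping the call in parentheses discards the count.
local only = (string.gsub("hello world", "l", "L"))
print(only)    --> heLLo worLd
```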

string.gmatch returns an iterator that iterates over all matches of the pattern in the given string.
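
A minimal sketch of gmatch in a for loop, using illustrative sample strings. The second loop uses captures (parenthesized parts of the pattern), in which case the iterator yields the captures instead of the whole match.

```lua
-- Collect every run of letters in the string.
local words = {}
for w in string.gmatch("one two  three", "%a+") do
    words[#words + 1] = w
end
print(#words)                    --> 3
print(table.concat(words, ","))  --> one,two,three

-- With captures, each iteration yields the captured parts.
local settings = {}
for k, v in string.gmatch("a=1, b=2", "(%w+)=(%w+)") do
    settings[k] = v
end
print(settings.a)  --> 1
print(settings.b)  --> 2
```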

Patterns:

Character classes:

. all characters
%a letters
%c control characters
%d digits
%l lower-case letters
%p punctuation characters
%s space characters
%u upper-case letters
%w alphanumeric characters
%x hexadecimal digits
%z the character whose representation is 0

The upper-case version of each class is its complement; for example, %A matches any character that is not a letter.
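
A few character classes and their complements in use, as a sketch with made-up input strings:

```lua
-- %d+ matches the first run of digits.
local digits = string.match("room 42b", "%d+")
print(digits)       --> 42

-- %A is the complement of %a: remove everything that is not a letter.
local lettersOnly, removed = string.gsub("hello, world!", "%A", "")
print(lettersOnly)  --> helloworld
print(removed)      --> 3

-- %S is the complement of %s: the first run of non-space characters.
local word = string.match("  padded  ", "%S+")
print(word)         --> padded
```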
Magic characters:

( ) . % + - * ? [ ] ^ $

Escape a magic character with %. For example, '%%' matches a literal '%', and '%.' matches a literal dot.
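
The difference between a magic character and its escaped form can be sketched like this (the sample strings are illustrative):

```lua
-- An unescaped dot matches any character, so the match is at position 1.
local anyPos = string.find("a.b", ".")
print(anyPos)   --> 1

-- An escaped dot matches only a literal '.', found at position 2.
local dotPos = string.find("a.b", "%.")
print(dotPos)   --> 2

-- '%%' in a pattern matches a literal percent sign.
local text = (string.gsub("100%", "%%", " percent"))
print(text)     --> 100 percent
```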

Character sets (char-sets): use a character set to define your own character class.

1. Combine character classes and single characters inside []
[%w_] matches alphanumeric characters and underscores.
[01] matches binary digits.
2. To include a range of characters in the set, write - between the first and last characters of the range
[0-9] is equivalent to %d
[0-9a-fA-F] is equivalent to %x
3. To get the complement of a character set, put ^ at the beginning
[^0-7] matches any character that is not an octal digit
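
The three rules above can be tried out as follows. The inputs are invented for illustration, and the vowel count is just one more use of a hand-written set.

```lua
-- Rule 1: a class and a single character combined in one set.
local ident = string.match("foo_bar42 = 1", "[%w_]+")
print(ident)     --> foo_bar42

-- Rule 2: ranges inside a set.
local hex = string.match("#1aF3", "[0-9a-fA-F]+")
print(hex)       --> 1aF3

-- Rule 3: a complemented set finds the first non-octal character.
local nonOctal = string.match("0179", "[^0-7]")
print(nonOctal)  --> 9

-- Counting matches of a set via gsub's second result.
local _, vowels = string.gsub("hello world", "[aeiou]", "")
print(vowels)    --> 3
```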

Repetition and optional modifiers

+ 1 or more repetitions, longest match
* 0 or more repetitions, longest match
- 0 or more repetitions, shortest match
? optional (0 or 1 occurrence)
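
The difference between the longest-match * and the shortest-match - is easiest to see side by side. This is a sketch on an invented tag-like string, not a recommendation for parsing markup.

```lua
local s = "<b>bold</b> and <i>italic</i>"

-- '*' is greedy: it runs from the first '<' to the last '>'.
local greedy = string.match(s, "<.*>")
print(greedy)    --> <b>bold</b> and <i>italic</i>

-- '-' is lazy: it stops at the first '>' that completes the match.
local lazy = string.match(s, "<.->")
print(lazy)      --> <b>

-- '?' makes the sign optional.
local signed = string.match("-12", "[+-]?%d+")
local unsigned = string.match("12", "[+-]?%d+")
print(signed)    --> -12
print(unsigned)  --> 12
```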

Captures

The capture mechanism lets a pattern extract the parts of the subject string that match parts of the pattern. You mark the parts of the pattern you want to capture by enclosing them in parentheses, for example:

pair = "name = anna"
key, value = string.match(pair, "(%a+)%s*=%s*(%a+)")
print(key, value) --> name anna

Captures can also be used within the pattern itself: in "([\"'])(.-)%1", %1 matches a copy of the first capture, so the closing quote must be the same character as the opening one.
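
A short sketch of that back-reference pattern at work. The sample sentence is illustrative; note that the apostrophe inside the double-quoted text does not end the match, because %1 requires the same quote character that opened it.

```lua
local s = [[then he said: "it's all right"!]]

-- First capture: the opening quote. Second capture: the quoted text.
local quote, quoted = string.match(s, "([\"'])(.-)%1")
print(quote)   --> "
print(quoted)  --> it's all right
```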

Replacements

As we have seen, the third argument of string.gsub can be a string; it can also be a function or a table. If it is a function, it is called with the captured content as its arguments, and its return value is used as the replacement string. If it is a table, the first capture is used as a key and the corresponding table value is used as the replacement string; if the key is not in the table, no replacement is done. For example:

function expand(s)
     -- the extra parentheses keep only gsub's first result (the new string)
     return (string.gsub(s, "$(%w+)", _G))
end
name = "Lua"; status = "great"
print(expand("$name is $status, isn't it?")) --> Lua is great, isn't it?
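
The function form and the missing-key behavior can be sketched as follows. The local table vars and the sample strings are hypothetical; a local table is used here instead of _G so the snippet does not depend on global variables.

```lua
-- Function replacement: each match is passed to the function,
-- and the returned string replaces it.
local doubled = string.gsub("3 apples and 10 pears", "%d+", function(n)
    return tostring(tonumber(n) * 2)
end)
print(doubled)   --> 6 apples and 20 pears

-- Table replacement: the capture is the lookup key.
-- A key that is missing from the table leaves the match unchanged.
local vars = {name = "Lua", status = "great"}
local expanded = string.gsub("$name is $status, $unknown stays", "$(%w+)", vars)
print(expanded)  --> Lua is great, $unknown stays
```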

(over)