SoFunction
Updated on 2025-04-07

Lua performance optimization tips (IV): About strings

As with tables, understanding how Lua implements strings lets you use them more efficiently.

Lua's implementation of strings differs in two important ways from what most other scripting languages do. First, all strings in Lua are internalized [1]: Lua keeps a single copy of any given string. Whenever a new string appears, Lua checks whether it already has a copy of that string and, if so, reuses it. Internalization makes operations such as string comparison and table indexing very fast, because two equal strings are the same object and comparing them amounts to comparing two pointers. The price is slower string creation, since every new string must first be looked up.
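Lua's own interning happens inside the C runtime and cannot be observed from Lua code, so the sketch below reproduces the same idea at the Lua level for tables, which the runtime does not intern. The helper `intern_symbol` and the `pool` table are hypothetical names for illustration only. Each name maps to exactly one canonical table, so symbols can be compared by identity instead of by content.

```lua
-- Hypothetical helper, not part of Lua: interning applied to tables.
-- The pool has weak values, so a symbol that nothing else references
-- may be collected and recreated later.
local pool = setmetatable({}, { __mode = "v" })

local function intern_symbol(name)
  local sym = pool[name]
  if sym == nil then          -- first time we see this name: create it
    sym = { name = name }
    pool[name] = sym
  end
  return sym                  -- later calls reuse the existing copy
end

local a = intern_symbol("hello")
local b = intern_symbol("hel" .. "lo")
print(rawequal(a, b))         -- true: one object per name
```

As with Lua strings, the lookup in `pool` makes creation slower, while comparison becomes a single identity check.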

Second, variables in Lua never hold strings; they only hold references to them. This speeds up many string operations. In Perl, for example, when you write something like $x = $y where $y holds a string, the assignment copies the string's contents from the buffer of $y into the buffer of $x. If the string is long, this copy is expensive. In Lua, the same assignment just copies a pointer.
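The reference semantics can be observed indirectly through memory use. The sketch below, which assumes only the standard `string.rep` and `collectgarbage` functions, stores the same 1 MB string in 1,000 table slots. If each assignment copied the characters, the heap would grow by about 1 GB; because only references are stored, it grows by a few kilobytes (the table's own slots).

```lua
-- Assigning a string copies a reference, never the characters.
local big = string.rep("x", 1024 * 1024)   -- one 1 MB string

collectgarbage("collect")                  -- start from a clean heap
local before = collectgarbage("count")     -- heap size in KB

local copies = {}
for i = 1, 1000 do
  copies[i] = big     -- like Perl's $x = $y, but only a pointer is copied
end

local grown_kb = collectgarbage("count") - before
print(string.format("heap grew by about %.0f KB", grown_kb))
```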

However, storing references instead of per-variable buffers slows down one particular form of string concatenation. In Perl, the operations $s = $s . "x" and $s .= "x" are quite different. In the first, you get a copy of $s and append "x" to that copy; in the second, "x" is simply appended to the internal buffer that $s already maintains. So the cost of the second form does not depend on the length of the string (assuming the buffer has room for the appended text). Inside a loop, the difference between the two is the difference between a linear and a quadratic algorithm. For example, the following loop takes about five minutes to read a 5MB file:

$x = "";
while (<>)
{
    $x = $x . $_;
}

If we change

$x = $x . $_

to

$x .= $_

the time drops to just 0.1 seconds!
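In Lua, the only form available is the copying one: `s = s .. piece` always builds a new string. The sketch below makes the quadratic cost concrete by tallying the bytes each step copies. The tally is a model of the work, assuming each concatenation copies the whole result; Lua itself does not report it. For n pieces of k bytes, the total is k + 2k + ... + nk = k·n(n+1)/2, which grows with the square of n.

```lua
-- Naive concatenation, the Lua analogue of Perl's $x = $x . $_
local function concat_naive(pieces)
  local s, copied = "", 0
  for _, piece in ipairs(pieces) do
    s = s .. piece          -- new string: old contents plus piece
    copied = copied + #s    -- every byte of the new string was copied
  end
  return s, copied
end

local pieces = {}
for i = 1, 1000 do pieces[i] = "abcd" end  -- 1,000 pieces of 4 bytes

local s, copied = concat_naive(pieces)
print(#s, copied)   -- 4000 bytes of output, 2002000 bytes copied
```

Producing 4,000 bytes of output costs about two million bytes of copying here; doubling the number of pieces roughly quadruples that figure.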

Lua has no equivalent of Perl's faster .= form, because its variables have no buffers associated with them. Instead, we need an explicit buffer: a table holding the string pieces does the job. The following loop reads the same 5MB file in 0.28 seconds. That is not as fast as Perl, but it is quite good:

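local t = {}
for line in io.lines() do      -- read standard input one line at a time
    t[#t + 1] = line           -- append the piece to the buffer table
end
local s = table.concat(t, "\n")  -- join all pieces once, at the end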
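The table-of-pieces pattern can be wrapped in a small helper when it is used in many places. `Builder` below is a hypothetical name, not part of Lua's standard library; it is a sketch that keeps its own count `n` so each append is constant-time, and defers all copying to a single `table.concat` call.

```lua
-- Hypothetical string builder built on the table-buffer technique.
local Builder = {}
Builder.__index = Builder

function Builder.new()
  return setmetatable({ n = 0 }, Builder)
end

function Builder:add(piece)
  self.n = self.n + 1
  self[self.n] = piece        -- store the piece; nothing is copied yet
  return self
end

function Builder:result(sep)
  -- one pass over all pieces; an empty builder yields ""
  return table.concat(self, sep or "", 1, self.n)
end

local b = Builder.new()
for line in ("alpha\nbeta\ngamma\n"):gmatch("[^\n]+") do
  b:add(line)
end
print(b:result("\n"))
```

Note that `io.lines` strips the newline from each line, so joining with "\n" (in the loop above or with this builder) produces a result without a trailing newline; append one yourself if the original text needs it.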

[1] "Internalized" is the term used in the original text; the technique is more commonly known as string interning.