8 super simple ways to generate random strings to share

Preface

This isiczaA great post on *answer, the quality is very high. Translate it and learn together

The question is: Is there any fastest and easiest way to generate random strings containing only English letters in Go language?

icza has given 8 solutions. The simplest method is not the fastest method. They each have their own advantages and disadvantages. The performance test results are attached at the end:

1. Runes

A simple answer is to declare a rune array, select rune characters through random numbers, and splice them into results

package approach1

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func randStr(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letters[(len(letters))]
    }
    return string(b)
}

func TestApproach1(t *) {
    (().UnixNano())
    (randStr(10))
}

func BenchmarkApproach1(b *) {
    (().UnixNano())
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

2. Bytes

If the randomly selected characters only contain English letters, we can directly use bytes, because in UTF-8 encoding mode, English characters and Bytes are one-to-one (Go uses UTF-8 encoding)

So you can

var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

Use this instead

var letters = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

Or better

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

Now we have made great progress, we have turned it into a constant. In Go, there is only a string constant, but there is no slice constant. Additional gains, expressionslen(letters)Also becomes a constant (ifsis a constant, thenlen(s)Will be constant too)

We didn't pay any code, nowlettersYou can access the bytes in it through the subscript, which is exactly what we need.

package approach2

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func randStr(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letters[(len(letters))]
    }
    return string(b)
}

func TestApproach2(t *) {
    (().UnixNano())

    (randStr(10))
}

func BenchmarkApproach2(b *) {
    (().UnixNano())
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

3. Remainder

The above solution is passed()To get a random letter, this method is called at the bottom(), and then calledRand.Int31n()

Compared to the function that generates 63 random bitsrand.Int63()To be honest,Rand.Int31n()Very slow.

We can simply callrand.Int63()Then dividelen(letterBytes), use its remainder to generate letters

package approach3

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func randStr(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letters[rand.Int63() % int64(len(letters))]
    }
    return string(b)
}

func TestApproach3(t *) {
    (().UnixNano())

    (randStr(10))
}

func BenchmarkApproach3(b *) {
    (().UnixNano())
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

This algorithm works fine and is very fast, but it sacrifices part of the accuracy, and the probability of letters appearing is not exactly the same (assumingrand.Int63()The 63-bit number is generated with equal probability). Since there are only 52 letters in total, which is much smaller than 1<<63 - 1, the distortion is very small, so in fact this is absolutely fine.

Explanation: Suppose you want a random number of 0~5. If you use a 3-bit bit, the probability of 3-bit bits will appear 0~7, so the probability of 0 and 1 is twice that of 2, 3, and 4. The probability of using 5-bit bits, 0 and 1 is6/32, the probability of 2, 3, and 4 appearing is5/32. It's closer now, right? By constantly increasing the bits, the smaller the gap will become. When you have 63 bits, the difference is no longer counted.

4. Masking mask

On the basis of the previous scheme, we maintain a uniform distribution by using only the lowest n bits of the random number, n denoting the number of all characters. For example, we have 52 letters and we need 6 digits (52 = 110100b). So we just usedrand.Int63()The last 6 digits. And, in order to keep all characters evenly distributed, we decided to accept only0..len(letterBytes)-1The number of 0~51. (Translator's note: There is no inaccuracy problem with the third plan here)

The lowest digits are greater than or equal tolen(letterBytes)The probability is generally smaller than0.5(Average is 0.25), which means that this happens, just try again. After retrying n times, the probability that we still need to discard this number is much less than the n power of 0.5 (this is the upper bound, and it will actually be lower than this value). Taking the 52 letters in this article as an example, the probability that the lowest 6 digits need to be discarded is only(64-52)/64=0.19. This means that the probability of repeating 10 times and still not having a number is 1*10^-8.

package approach4

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

const (
    // 6 bits to represent a letters index
    letterIdBits = 6
    // All 1-bits as many as letterIdBits
    letterIdMask = 1 <<letterIdBits - 1
)

func randStr(n int) string {
    b := make([]byte, n)
    for i := range b {
        if idx := int(rand.Int63() & letterIdMask); idx < len(letters) {
            b[i] = letters[idx]
            i++
        }
    }
    return string(b)
}

func TestApproach4(t *) {
    (().UnixNano())

    (randStr(10))
}

func BenchmarkApproach4(b *) {
    (().UnixNano())
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

5. Masking Improved

The solution in Section 4 is only usedrand.Int63()The last 6 bits of the 64 random bytes returned by the method. This is really a waste, becauserand.Int63()It is the most time-consuming part of our algorithm.

If we have 52 letters, 6 digits can generate a random string. So 63 random bytes can be used63/6=10Second-rate.

Translator's note: Cache is used, cachedrand.Int63()The content returned by the method is used 10 times, but it is no longer coroutine-safe.

package approach5

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

const (
    // 6 bits to represent a letter index
    letterIdBits = 6
    // All 1-bits as many as letterIdBits
    letterIdMask = 1<<letterIdBits - 1
    letterIdMax  = 63 / letterIdBits
)

func randStr(n int) string {
    b := make([]byte, n)
    // A rand.Int63() generates 63 random bits, enough for letterIdMax letters!
    for i, cache, remain := n-1, rand.Int63(), letterIdMax; i >= 0; {
        if remain == 0 {
            cache, remain = rand.Int63(), letterIdMax
        }
        if idx := int(cache & letterIdMask); idx < len(letters) {
            b[i] = letters[idx]
            i--
        }
        cache >>= letterIdBits
        remain--
    }
    return string(b)
}

func TestApproach5(t *) {
    (().UnixNano())

    (randStr(10))
}

func BenchmarkApproach5(b *) {
    (().UnixNano())
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

6. Source

The fifth solution is very good, and there are not many points that can be improved. We can but not be complicated.

Let's find the points that can be improved: the source of random numbers

crypto/randPackage providedRead(b []byte)The function can obtain the number of random bits required through this function, and only requires one call. But it doesn't improve performance, becausecrypto/randA cryptographically secure pseudo-random number is implemented, so the speed is relatively slow.

So let's stick with itmath/randBag,useAs the source of random bits,It's a statementInt63() int64Interface: It is the only way we need and use in the latest solutions.

So we don't really need it，The bag is enough for us

package approach6

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

var src = (().UnixNano())

const (
    // 6 bits to represent a letter index
    letterIdBits = 6
    // All 1-bits as many as letterIdBits
    letterIdMask = 1<<letterIdBits - 1
    letterIdMax  = 63 / letterIdBits
)

func randStr(n int) string {
    b := make([]byte, n)
    // A rand.Int63() generates 63 random bits, enough for letterIdMax letters!
    for i, cache, remain := n-1, src.Int63(), letterIdMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdMax
        }
        if idx := int(cache & letterIdMask); idx < len(letters) {
            b[i] = letters[idx]
            i--
        }
        cache >>= letterIdBits
        remain--
    }
    return string(b)
}

func TestApproach6(t *) {
    (randStr(10))
}

func BenchmarkApproach6(b *) {
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

Note that we did not use seed to initialize rand, instead we initialize it

Another thing to note,math/randThe documentation states

The default Source is coroutine-safe

So the default Source is better than()CreatedSourceBe slow. It is of course slow if you don’t need to deal with coroutine concurrency scenarios.

7. Use

Previous solutions all returned strings constructed through slices. The last conversion is copied once, because the string is immutable. If the conversion is not copied, it is impossible to ensure that the string can remain unchanged after the conversion is completed.

Go1.10 introduced, which is a new type, and similar to that used to construct strings. Low-level use[]byteTo construct content, it is exactly what we are doing now, and we can finally use it()Method to get the final string value. But the cool thing about it is that it does this without having to do the copy just mentioned. It dares to do this because it is constructed under the[]byteIt has never been exposed, so it is still guaranteed that no one can unintentionally or maliciously modify the immutable string that has been generated.

So our next idea is not to build a random string in slice, but to use , after the building is finished we can get and return the result without copying. This may help in terms of speed, and certainly in terms of memory usage and allocation (Translator's note: you'll see clearly in benchmark).

package approach7

import (
    "fmt"
    "math/rand"
    "strings"
    "testing"
    "time"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

var src = (().UnixNano())

const (
    // 6 bits to represent a letter index
    letterIdBits = 6
    // All 1-bits as many as letterIdBits
    letterIdMask = 1<<letterIdBits - 1
    letterIdMax  = 63 / letterIdBits
)

func randStr(n int) string {
    sb := {}
    (n)
    // A rand.Int63() generates 63 random bits, enough for letterIdMax letters!
    for i, cache, remain := n-1, src.Int63(), letterIdMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdMax
        }
        if idx := int(cache & letterIdMask); idx < len(letters) {
            (letters[idx])
            i--
        }
        cache >>= letterIdBits
        remain--
    }
    return ()
}

func TestApproach7(t *) {
    (randStr(10))
}

func BenchmarkApproach7(b *) {
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

After constructing the builder, we immediately called it()Method, so that it allocates a large enough underlying slice to avoid allocation in subsequent operations

8. “Mimicing” with package unsafe

Imitation using unsafe package

Just like our sixth section, we use it[]byteto build strings. Switching to a slice might be some too heavy, we just want to avoid copying it.

useunsafePackage to avoid final copy

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)((&))
}

We can also complete this process by ourselves. So the idea is that we passunsafePackage returns a string to avoid copying

package approach8

import (
    "fmt"
    "math/rand"
    "testing"
    "time"
    "unsafe"
)

const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

var src = (().UnixNano())

const (
    // 6 bits to represent a letter index
    letterIdBits = 6
    // All 1-bits as many as letterIdBits
    letterIdMask = 1<<letterIdBits - 1
    letterIdMax  = 63 / letterIdBits
)

func randStr(n int) string {
    b := make([]byte, n)
    // A rand.Int63() generates 63 random bits, enough for letterIdMax letters!
    for i, cache, remain := n-1, src.Int63(), letterIdMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdMax
        }
        if idx := int(cache & letterIdMask); idx < len(letters) {
            b[i] = letters[idx]
            i--
        }
        cache >>= letterIdBits
        remain--
    }
    return *(*string)((&b))
}

func TestApproach8(t *) {
    (randStr(10))
}

func BenchmarkApproach8(b *) {
    for i := 0; i < ; i++ {
        _ = randStr(10)
    }
}

Benchmark

go test ./... -bench=. -benchmem

Data tested by the original author

(Translator's note: The third column represents how many nanoseconds it takes to operate at one time)

BenchmarkRunes-4 2000000 723 ns/op 96 B/op 2 allocs/op
BenchmarkBytes-4 3000000 550 ns/op 32 B/op 2 allocs/op
BenchmarkBytesRmndr-4 3000000 438 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMask-4 3000000 534 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImpr-4 10000000 176 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImprSrc-4 10000000 139 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImprSrcSB-4 10000000 134 ns/op 16 B/op 1 allocs/op
BenchmarkBytesMaskImprSrcUnsafe-4 10000000 115 ns/op 16 B/op 1 allocs/op

Translator test data

BenchmarkApproach1-12 3849038 299.5 ns/op 64 B/op 2 allocs/op
BenchmarkApproach2-12 5545350 216.4 ns/op 32 B/op 2 allocs/op
BenchmarkApproach3-12 7003654 169.7 ns/op 32 B/op 2 allocs/op
BenchmarkApproach4-12 7164259 168.7 ns/op 32 B/op 2 allocs/op
BenchmarkApproach5-12 13205474 89.06 ns/op 32 B/op 2 allocs/op
BenchmarkApproach6-12 13665636 84.41 ns/op 32 B/op 2 allocs/op
BenchmarkApproach7-12 17213431 70.37 ns/op 16 B/op 1 allocs/op
BenchmarkApproach8-12 19756956 61.41 ns/op 16 B/op 1 allocs/op

The data that has been released now has changed some changes when it comes to the original author, but it does not prevent us from seeing the trends of various methods:

Just switching rune to byte, you will get a significant performance improvement (greater than 20%)
userand.Int63()replace()Also achieved a significant improvement (greater than 20 percent)
Using Masking does not improve performance. On the contrary, the performance has been reduced from where the original author has been
But oncerand.Int63()After all the returned characters, the performance has been improved by 3 times.
useAlternative, performance improvement by 21%
use, We increased the speed by 3.5%, and reduced the original two memory allocations to one time!
useunsafePackage instead, performance improvement by 14%

Comparing the eighth plan with the first plan, the eighth plan is 6.3 times faster than the first plan, using only one-sixth of the memory, and the number of allocations is only half of the original one.

This is the end of this article about 8 super simple ways to generate random strings for Go. For more related contents for generating random strings, please search for my previous articles or continue browsing the following related articles. I hope everyone will support me in the future!