Preface
This isiczaA great post on *answer, the quality is very high. Translate it and learn together
The question is: Is there any fastest and easiest way to generate random strings containing only English letters in Go language?
icza has given 8 solutions. The simplest method is not the fastest method. They each have their own advantages and disadvantages. The performance test results are attached at the end:
1. Runes
A simple answer is to declare a rune array, select rune characters through random numbers, and splice them into results
package approach1 import ( "fmt" "math/rand" "testing" "time" ) var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ") func randStr(n int) string { b := make([]rune, n) for i := range b { b[i] = letters[(len(letters))] } return string(b) } func TestApproach1(t *) { (().UnixNano()) (randStr(10)) } func BenchmarkApproach1(b *) { (().UnixNano()) for i := 0; i < ; i++ { _ = randStr(10) } }
2. Bytes
If the randomly selected characters only contain English letters, we can directly use bytes, because in UTF-8 encoding mode, English characters and Bytes are one-to-one (Go uses UTF-8 encoding)
So you can
var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
Use this instead
var letters = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
Or better
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Now we have made great progress, we have turned it into a constant. In Go, there is only a string constant, but there is no slice constant. Additional gains, expressionslen(letters)
Also becomes a constant (ifs
is a constant, thenlen(s)
Will be constant too)
We didn't pay any code, nowletters
You can access the bytes in it through the subscript, which is exactly what we need.
package approach2 import ( "fmt" "math/rand" "testing" "time" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" func randStr(n int) string { b := make([]byte, n) for i := range b { b[i] = letters[(len(letters))] } return string(b) } func TestApproach2(t *) { (().UnixNano()) (randStr(10)) } func BenchmarkApproach2(b *) { (().UnixNano()) for i := 0; i < ; i++ { _ = randStr(10) } }
3. Remainder
The above solution is passed()
To get a random letter, this method is called at the bottom()
, and then calledRand.Int31n()
Compared to the function that generates 63 random bitsrand.Int63()
To be honest,Rand.Int31n()
Very slow.
We can simply callrand.Int63()
Then dividelen(letterBytes)
, use its remainder to generate letters
package approach3 import ( "fmt" "math/rand" "testing" "time" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" func randStr(n int) string { b := make([]byte, n) for i := range b { b[i] = letters[rand.Int63() % int64(len(letters))] } return string(b) } func TestApproach3(t *) { (().UnixNano()) (randStr(10)) } func BenchmarkApproach3(b *) { (().UnixNano()) for i := 0; i < ; i++ { _ = randStr(10) } }
This algorithm works fine and is very fast, but it sacrifices part of the accuracy, and the probability of letters appearing is not exactly the same (assumingrand.Int63()
The 63-bit number is generated with equal probability). Since there are only 52 letters in total, which is much smaller than 1<<63 - 1, the distortion is very small, so in fact this is absolutely fine.
Explanation: Suppose you want a random number of 0~5. If you use a 3-bit bit, the probability of 3-bit bits will appear 0~7, so the probability of 0 and 1 is twice that of 2, 3, and 4. The probability of using 5-bit bits, 0 and 1 is6/32
, the probability of 2, 3, and 4 appearing is5/32
. It's closer now, right? By constantly increasing the bits, the smaller the gap will become. When you have 63 bits, the difference is no longer counted.
4. Masking mask
On the basis of the previous scheme, we maintain a uniform distribution by using only the lowest n bits of the random number, n denoting the number of all characters. For example, we have 52 letters and we need 6 digits (52 = 110100b). So we just usedrand.Int63()
The last 6 digits. And, in order to keep all characters evenly distributed, we decided to accept only0..len(letterBytes)-1
The number of 0~51. (Translator's note: There is no inaccuracy problem with the third plan here)
The lowest digits are greater than or equal tolen(letterBytes)
The probability is generally smaller than0.5
(Average is 0.25), which means that this happens, just try again. After retrying n times, the probability that we still need to discard this number is much less than the n power of 0.5 (this is the upper bound, and it will actually be lower than this value). Taking the 52 letters in this article as an example, the probability that the lowest 6 digits need to be discarded is only(64-52)/64=0.19
. This means that the probability of repeating 10 times and still not having a number is 1*10^-8.
package approach4 import ( "fmt" "math/rand" "testing" "time" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" const ( // 6 bits to represent a letters index letterIdBits = 6 // All 1-bits as many as letterIdBits letterIdMask = 1 <<letterIdBits - 1 ) func randStr(n int) string { b := make([]byte, n) for i := range b { if idx := int(rand.Int63() & letterIdMask); idx < len(letters) { b[i] = letters[idx] i++ } } return string(b) } func TestApproach4(t *) { (().UnixNano()) (randStr(10)) } func BenchmarkApproach4(b *) { (().UnixNano()) for i := 0; i < ; i++ { _ = randStr(10) } }
5. Masking Improved
The solution in Section 4 is only usedrand.Int63()
The last 6 bits of the 64 random bytes returned by the method. This is really a waste, becauserand.Int63()
It is the most time-consuming part of our algorithm.
If we have 52 letters, 6 digits can generate a random string. So 63 random bytes can be used63/6=10
Second-rate.
Translator's note: Cache is used, cachedrand.Int63()
The content returned by the method is used 10 times, but it is no longer coroutine-safe.
package approach5 import ( "fmt" "math/rand" "testing" "time" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" const ( // 6 bits to represent a letter index letterIdBits = 6 // All 1-bits as many as letterIdBits letterIdMask = 1<<letterIdBits - 1 letterIdMax = 63 / letterIdBits ) func randStr(n int) string { b := make([]byte, n) // A rand.Int63() generates 63 random bits, enough for letterIdMax letters! for i, cache, remain := n-1, rand.Int63(), letterIdMax; i >= 0; { if remain == 0 { cache, remain = rand.Int63(), letterIdMax } if idx := int(cache & letterIdMask); idx < len(letters) { b[i] = letters[idx] i-- } cache >>= letterIdBits remain-- } return string(b) } func TestApproach5(t *) { (().UnixNano()) (randStr(10)) } func BenchmarkApproach5(b *) { (().UnixNano()) for i := 0; i < ; i++ { _ = randStr(10) } }
6. Source
The fifth solution is very good, and there are not many points that can be improved. We can but not be complicated.
Let's find the points that can be improved: the source of random numbers
crypto/rand
Package providedRead(b []byte)
The function can obtain the number of random bits required through this function, and only requires one call. But it doesn't improve performance, becausecrypto/rand
A cryptographically secure pseudo-random number is implemented, so the speed is relatively slow.
So let's stick with itmath/rand
Bag,use
As the source of random bits,
It's a statement
Int63() int64
Interface: It is the only way we need and use in the latest solutions.
So we don't really need it,
The bag is enough for us
package approach6 import ( "fmt" "math/rand" "testing" "time" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" var src = (().UnixNano()) const ( // 6 bits to represent a letter index letterIdBits = 6 // All 1-bits as many as letterIdBits letterIdMask = 1<<letterIdBits - 1 letterIdMax = 63 / letterIdBits ) func randStr(n int) string { b := make([]byte, n) // A rand.Int63() generates 63 random bits, enough for letterIdMax letters! for i, cache, remain := n-1, src.Int63(), letterIdMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdMax } if idx := int(cache & letterIdMask); idx < len(letters) { b[i] = letters[idx] i-- } cache >>= letterIdBits remain-- } return string(b) } func TestApproach6(t *) { (randStr(10)) } func BenchmarkApproach6(b *) { for i := 0; i < ; i++ { _ = randStr(10) } }
Note that we did not use seed to initialize rand, instead we initialize it
Another thing to note,math/rand
The documentation states
The default Source is coroutine-safe
So the default Source is better than()
CreatedSource
Be slow. It is of course slow if you don’t need to deal with coroutine concurrency scenarios.
7. Use
Previous solutions all returned strings constructed through slices. The last conversion is copied once, because the string is immutable. If the conversion is not copied, it is impossible to ensure that the string can remain unchanged after the conversion is completed.
Go1.10 introduced, which is a new type, and similar to that used to construct strings. Low-level use[]byte
To construct content, it is exactly what we are doing now, and we can finally use it()
Method to get the final string value. But the cool thing about it is that it does this without having to do the copy just mentioned. It dares to do this because it is constructed under the[]byte
It has never been exposed, so it is still guaranteed that no one can unintentionally or maliciously modify the immutable string that has been generated.
So our next idea is not to build a random string in slice, but to use , after the building is finished we can get and return the result without copying. This may help in terms of speed, and certainly in terms of memory usage and allocation (Translator's note: you'll see clearly in benchmark).
package approach7 import ( "fmt" "math/rand" "strings" "testing" "time" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" var src = (().UnixNano()) const ( // 6 bits to represent a letter index letterIdBits = 6 // All 1-bits as many as letterIdBits letterIdMask = 1<<letterIdBits - 1 letterIdMax = 63 / letterIdBits ) func randStr(n int) string { sb := {} (n) // A rand.Int63() generates 63 random bits, enough for letterIdMax letters! for i, cache, remain := n-1, src.Int63(), letterIdMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdMax } if idx := int(cache & letterIdMask); idx < len(letters) { (letters[idx]) i-- } cache >>= letterIdBits remain-- } return () } func TestApproach7(t *) { (randStr(10)) } func BenchmarkApproach7(b *) { for i := 0; i < ; i++ { _ = randStr(10) } }
After constructing the builder, we immediately called it()
Method, so that it allocates a large enough underlying slice to avoid allocation in subsequent operations
8. “Mimicing” with package unsafe
Imitation using unsafe package
Just like our sixth section, we use it[]byte
to build strings. Switching to a slice might be some too heavy, we just want to avoid copying it.
useunsafe
Package to avoid final copy
// String returns the accumulated string. func (b *Builder) String() string { return *(*string)((&)) }
We can also complete this process by ourselves. So the idea is that we passunsafe
Package returns a string to avoid copying
package approach8 import ( "fmt" "math/rand" "testing" "time" "unsafe" ) const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" var src = (().UnixNano()) const ( // 6 bits to represent a letter index letterIdBits = 6 // All 1-bits as many as letterIdBits letterIdMask = 1<<letterIdBits - 1 letterIdMax = 63 / letterIdBits ) func randStr(n int) string { b := make([]byte, n) // A rand.Int63() generates 63 random bits, enough for letterIdMax letters! for i, cache, remain := n-1, src.Int63(), letterIdMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdMax } if idx := int(cache & letterIdMask); idx < len(letters) { b[i] = letters[idx] i-- } cache >>= letterIdBits remain-- } return *(*string)((&b)) } func TestApproach8(t *) { (randStr(10)) } func BenchmarkApproach8(b *) { for i := 0; i < ; i++ { _ = randStr(10) } }
Benchmark
go test ./... -bench=. -benchmem
Data tested by the original author
(Translator's note: The third column represents how many nanoseconds it takes to operate at one time)
BenchmarkRunes-4 2000000 723 ns/op 96 B/op 2 allocs/op
BenchmarkBytes-4 3000000 550 ns/op 32 B/op 2 allocs/op
BenchmarkBytesRmndr-4 3000000 438 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMask-4 3000000 534 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImpr-4 10000000 176 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImprSrc-4 10000000 139 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImprSrcSB-4 10000000 134 ns/op 16 B/op 1 allocs/op
BenchmarkBytesMaskImprSrcUnsafe-4 10000000 115 ns/op 16 B/op 1 allocs/op
Translator test data
BenchmarkApproach1-12 3849038 299.5 ns/op 64 B/op 2 allocs/op
BenchmarkApproach2-12 5545350 216.4 ns/op 32 B/op 2 allocs/op
BenchmarkApproach3-12 7003654 169.7 ns/op 32 B/op 2 allocs/op
BenchmarkApproach4-12 7164259 168.7 ns/op 32 B/op 2 allocs/op
BenchmarkApproach5-12 13205474 89.06 ns/op 32 B/op 2 allocs/op
BenchmarkApproach6-12 13665636 84.41 ns/op 32 B/op 2 allocs/op
BenchmarkApproach7-12 17213431 70.37 ns/op 16 B/op 1 allocs/op
BenchmarkApproach8-12 19756956 61.41 ns/op 16 B/op 1 allocs/op
The data that has been released now has changed some changes when it comes to the original author, but it does not prevent us from seeing the trends of various methods:
- Just switching rune to byte, you will get a significant performance improvement (greater than 20%)
- use
rand.Int63()
replace()
Also achieved a significant improvement (greater than 20 percent) - Using Masking does not improve performance. On the contrary, the performance has been reduced from where the original author has been
- But once
rand.Int63()
After all the returned characters, the performance has been improved by 3 times. - use
Alternative
, performance improvement by 21%
- use
, We increased the speed by 3.5%, and reduced the original two memory allocations to one time!
- use
unsafe
Package instead, performance improvement by 14%
Comparing the eighth plan with the first plan, the eighth plan is 6.3 times faster than the first plan, using only one-sixth of the memory, and the number of allocations is only half of the original one.
This is the end of this article about 8 super simple ways to generate random strings for Go. For more related contents for generating random strings, please search for my previous articles or continue browsing the following related articles. I hope everyone will support me in the future!