1. Summary:
In Golang, there are two methods for obtaining random numbers, one is pseudo-random based on math/rand, and the other is true-random based on crypto/rand. Among them, due to its pseudo-random principle, math/rand often has repeated random numbers, resulting in more repetitive problems when random business is needed. In the actual project that the author has experienced, he encountered the repetitive problem when obtaining random numbers under high frequency and high concurrency. On the road to solving this problem, he tried several methods and finally found a better solution.
2. Research on math/rand random numbers:
The common method to obtain random numbers based on math/rand is:
import ( "math/rand" "time" ) r := ((().UnixNano())) result := r.Int31n(100)
The above code principle is to generate a seed through nanosecond time, and then obtain a random number through the seed. Commonly used developers should know that math/rand seeds have their fixedness. When the same number is used as a seed, the instances obtained through New are actually the same. Even if their instance addresses are different, the values obtained through the same random number method will be the same.
This situation is because math/rand will conduct a more complex algorithm process based on the random seeds, thereby obtaining a random number pool of length 607. Since the algorithm process is fixed and there is no random situation, the random number pool obtained by the same seed is completely consistent.
At this time, someone may think that taking a different nanosecond time to generate an instance can solve the problem? But such an idea is problematic in actual project practice. There are two main reasons:
1. One of the major advantages of Golang is its fast speed. When using().UnixNano() multiple times at a time to obtain nanosecond time, you will find that the nanosecond time is not different, but there will be duplication, which makes the random seeds the same at high frequency random.
2. One of the major features of Golang is high concurrency, but in actual operation, it is impossible to ensure that two concurrent coroutines will not obtain time at the same time, and duplication problems will also occur.
There are also corresponding solutions to these two problems:
1. For the first reason, you can use () to ensure that the program has passed one time before getting the nanosecond time, so that the time obtained will not be repeated.
2. For the second reason, the lock can be used to lock the process of obtaining time to ensure that the concurrency is not executed at the same time.
However, the practice in actual projects is not ideal.
In the first solution, since the time slices processed by the CPU, the skip time is not 1 nanosecond, and there will be slight differences between different CPUs. The author tested it on the i7-13700K CPU, and the time difference before and after skipping is about 0.1ms.
Under the second solution, if the first solution is not combined, it is impossible to guarantee whether the time before and after locking is completely different, so the sleep process needs to be retained. However, it can be calculated that if the concurrency amount reaches 10,000, the maximum waiting time for obtaining seeds may reach 1s. In addition, there are generally other business logic, which is difficult to ensure efficient demand.
There are implicit problems in the above solutions. So, another solution is to change the solution. Since the random number pool is a fixed length and cannot use sleep or lock, why not let the program only retain one random instance and update the random instance when the number of acquisitions reaches a certain value? We use the following simple example to test the feasibility:
import ( "math/rand" "time" "fmt" ) t1 := ().UnixNano() (t1) r := ((t1)) for i := 0; i < 600; i++ { r.Int31n(100) } t2 := ().UnixNano() (t2)
In the above code, we generate a random instance through t1 time, and let the instance randomly 600 times, and get the time t2 again. The execution results will find a bigger problem, the t1 time and t2 time are exactly the same! If t2 is used as the seed at this time, the next random 600 times will be exactly the same as the time of t1. If we let the delay or lock between different instances, the problem will return to the same problem in the previous solution.
Therefore, the following conclusions can be drawn. Commonly used math/rand cannot really solve the duplication problem when it is based on random access to the current time as seed.
3. Research on crypto/rand random numbers:
The common method to obtain random numbers based on crypto/rand is:
import ( "crypto/rand" "math/big" ) r, _ := (, (100)) result := r.Int64()
The random number obtained by crypto/rand can be regarded as true random because it obtains the source of the seed. Simply put, in all types of systems, there is a specific file specifically used to store true random values. The source of these values is various information, thermal noise, etc. generated by the hardware on the circuit. Since they are uncontrollable and unpredictable, they can be regarded as generating true random numbers.
In the above code, a file that stores true random numbers from the system will be obtained by calling the system-level API. For the system, when the random number is retrieved, it will be destroyed and will not be used twice.
However, crypto/rand has a disadvantage, its acquisition efficiency is very low, because it requires access to system files or calling system APIs, which will cause considerable time-consuming. Moreover, the random value obtained from the system will be one-time and cannot be used for the second time. Inefficient acquisition methods obviously cannot fully meet the needs of maintaining high-frequency and high concurrency procedures.
Fourth, the final solution:
Based on the previous related content, we can find a way to learn from each other's strengths and weaknesses, and get a solution: use crypto/rand to get a random number pool, use this random number as the seed of math/rand to obtain a random number pool for the value. When the value reaches a certain number of times, obtain a new seed to generate a new instance.
The method of obtaining a seed number is as follows:
import ( "crypto/rand" "encoding/binary" ) var seed int64 (, , &seed)
Using the above method, even if the seed number is obtained at high frequency or concurrently, it can ensure that the seeds obtained each time are different values, so that the repetition problems caused by the same seeds can be avoided.
The final solution is determined, and the combined code is as follows:
import ( crand "crypto/rand" "encoding/binary" mrand "math/rand" ) var seed int64 (, , &seed) source := (seed) r := (source) result := r.Int31n(100)
In the above code, an important issue still needs to be paid attention to, which is that it takes a long time and further improvements are needed when encapsulating the random process:
1. Add a statistical threshold, +1 after each random number, and update the random instance when the number of times reaches a certain value;
2. Lock the process of updating random instances to avoid repeated updates during concurrency;
After improvement, you can test its speed by yourself. In the project practice that the author has participated in, the average time it takes about 25 to 30ns to obtain a random number at a time with a statistical threshold of 300 and a high number of acquisitions. It not only meets the needs that are close to true randomness, but also meets the needs of high efficiency.
The above is the detailed content to solve the problem of random number repetition under high frequency and high concurrency in Go language. For more information about Go high frequency and high concurrency random number repetition, please pay attention to my other related articles!