I recently came across the question "String size of 20 character" on the Go Forum. "hollowaykeanho" gave relevant answers there, but I felt that the proposed ways of taking a substring were not ideal, so I ran a series of experiments and arrived at a method for extracting substrings efficiently. This article walks through that process step by step.
Slicing by bytes
This is the first solution given by "hollowaykeanho", and I think it is also the first one most people think of: use Go's built-in slice syntax to take part of the string:
    s := "abcdef"
    fmt.Println(s[1:4]) // prints "bcd"
We quickly notice that this slices by byte. For single-byte ASCII strings nothing could be better. Chinese characters, however, usually occupy multiple bytes, typically 3 bytes each in UTF-8, so the following program produces garbled output:
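    s := "Go 语言" // "语言" means "language"; each character takes 3 bytes in UTF-8
    fmt.Println(s[1:4]) // the slice ends in the middle of "语", so the output is garbled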
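To make the difference concrete, here is a minimal sketch that slices by byte and then checks whether the result is still valid UTF-8. The helper name byteCut is hypothetical and exists only for this illustration; utf8.ValidString comes from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteCut is a hypothetical helper for illustration: it returns s[start:end]
// and reports whether the result is still valid UTF-8.
func byteCut(s string, start, end int) (string, bool) {
	sub := s[start:end]
	return sub, utf8.ValidString(sub)
}

func main() {
	// Pure ASCII: every character is one byte, so byte slicing is safe.
	ascii, okASCII := byteCut("abcdef", 1, 4)
	fmt.Printf("%q valid=%v\n", ascii, okASCII)

	// Mixed text: byte 3 is only the first of the three bytes of "语",
	// so the slice ends in the middle of a character.
	mixed, okMixed := byteCut("Go 语言", 1, 4)
	fmt.Printf("%q valid=%v\n", mixed, okMixed)
}
```

The second call reports an invalid result because the cut lands inside a multi-byte character; cutting at a character boundary such as byteCut("Go 语言", 3, 6) would be fine, but finding that boundary is exactly the problem.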
Trump card - []rune type conversion
The second solution given by "hollowaykeanho" is to convert the string to []rune, slice it with the usual slice syntax, and convert the result back to a string.
    s := "Go 语言"
    rs := []rune(s)
    fmt.Println(string(rs[1:4])) // prints "o 语"
Now we get the correct result, which is the biggest improvement. However, I have always been cautious about type conversions and worried about their performance, so I looked for alternatives in search engines and the major forums. This was all I could find; it seemed to be the only solution.
I wrote a benchmark to evaluate its performance:
    package benchmark

    import (
    	"testing"
    	"unicode/utf8"
    )

    var benchmarkSubString = "Go is a statically strongly typed, compiled, concurrent, and garbage collection programming language developed by Google. For the convenience of searching and recognition, it is sometimes called Golang."
    var benchmarkSubStringLength = 20

    // SubStrRunes returns the first length characters of s via a []rune conversion.
    func SubStrRunes(s string, length int) string {
    	if utf8.RuneCountInString(s) > length {
    		rs := []rune(s)
    		return string(rs[:length])
    	}
    	return s
    }

    func BenchmarkSubStrRunes(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		SubStrRunes(benchmarkSubString, benchmarkSubStringLength)
    	}
    }
I got results that surprised me a little:
    goos: darwin
    goarch: amd64
    pkg: /thinkeridea/go-extend/exunicode/exutf8/benchmark
    BenchmarkSubStrRunes-8    872253    1363 ns/op    336 B/op    2 allocs/op
    PASS
    ok      /thinkeridea/go-extend/exunicode/exutf8/benchmark    2.120s
It takes about 1.3 microseconds to take the first 20 characters of a 69-character string, far more than I expected. The cause is the type conversion: converting to []rune allocates memory, converting back produces a new string, and both conversions have to decode or encode every character.
A lifeline - utf8.DecodeRuneInString
I wanted to get rid of the extra computation and memory allocation caused by the type conversion. I went through the strings package carefully and found no suitable tool. Then I thought of the utf8 package, which provides tools for working with multi-byte characters. To be honest, I was not familiar with it and had never used it directly. I read all of its documentation and found that utf8.DecodeRuneInString decodes a single character and returns the number of bytes that character occupies. I tried this experiment:
    package benchmark

    import (
    	"testing"
    	"unicode/utf8"
    )

    var benchmarkSubString = "Go is a statically strongly typed, compiled, concurrent, and garbage collection programming language developed by Google. For the convenience of searching and recognition, it is sometimes called Golang."
    var benchmarkSubStringLength = 20

    // SubStrDecodeRuneInString returns the first length characters of s
    // by decoding one character at a time and summing their byte widths.
    func SubStrDecodeRuneInString(s string, length int) string {
    	var size, n int
    	for i := 0; i < length && n < len(s); i++ {
    		_, size = utf8.DecodeRuneInString(s[n:])
    		n += size
    	}
    	return s[:n]
    }

    func BenchmarkSubStrDecodeRuneInString(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		SubStrDecodeRuneInString(benchmarkSubString, benchmarkSubStringLength)
    	}
    }
After running it I got results that surprised me:
    goos: darwin
    goarch: amd64
    pkg: /thinkeridea/go-extend/exunicode/exutf8/benchmark
    BenchmarkSubStrDecodeRuneInString-8    10774401    105 ns/op    0 B/op    0 allocs/op
    PASS
    ok      /thinkeridea/go-extend/exunicode/exutf8/benchmark    1.250s
It is about 13 times faster than the []rune conversion and eliminates the memory allocations entirely. I was thrilled and could not wait to reply to "hollowaykeanho" that I had found a better method, along with the benchmark to back it up.
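For readers who have not used utf8.DecodeRuneInString either, the following sketch shows what it returns: the decoded character and its width in bytes. The helper runeSizes is a hypothetical name introduced only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeSizes is a hypothetical helper for illustration: it decodes s one
// character at a time and records how many bytes each character occupies.
func runeSizes(s string) []int {
	var sizes []int
	for len(s) > 0 {
		_, size := utf8.DecodeRuneInString(s)
		sizes = append(sizes, size)
		s = s[size:] // advance past the character just decoded
	}
	return sizes
}

func main() {
	// Decode only the first character of the string.
	r, size := utf8.DecodeRuneInString("语言")
	fmt.Printf("first character %c occupies %d bytes\n", r, size)

	// ASCII characters are 1 byte wide, the Chinese characters 3 bytes.
	fmt.Println(runeSizes("Go 语言"))
}
```

Because the function only inspects the leading bytes of its argument and returns a width, summing those widths yields a byte offset that can be used to slice the original string without copying it.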
Still excited, I kept browsing other interesting questions on the forum. While reading the answers to one of them (I forget which one -_-||), I stumbled upon another idea.
Good medicine is not necessarily bitter - range string iteration
Many people seem to forget that range iterates over a string by character, not by byte. Each iteration yields the byte index at which a character starts and the character itself. I immediately tried to use this feature and wrote the following test:
    package benchmark

    import (
    	"testing"
    )

    var benchmarkSubString = "Go is a statically strongly typed, compiled, concurrent, and garbage collection programming language developed by Google. For the convenience of searching and recognition, it is sometimes called Golang."
    var benchmarkSubStringLength = 20

    // SubStrRange returns the first length characters of s using range iteration.
    func SubStrRange(s string, length int) string {
    	var n int
    	for i := range s {
    		// i is the byte index where the (n+1)-th character starts.
    		if n == length {
    			return s[:i]
    		}
    		n++
    	}
    	// s has at most length characters, so return it whole
    	// (slicing at the last index would drop the final character).
    	return s
    }

    func BenchmarkSubStrRange(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		SubStrRange(benchmarkSubString, benchmarkSubStringLength)
    	}
    }
I ran it, and it did not disappoint me; it felt almost like magic.
    goos: darwin
    goarch: amd64
    pkg: /thinkeridea/go-extend/exunicode/exutf8/benchmark
    BenchmarkSubStrRange-8    12354991    91.3 ns/op    0 B/op    0 allocs/op
    PASS
    ok      /thinkeridea/go-extend/exunicode/exutf8/benchmark    1.233s
It was only about 13% faster than the previous version, but it is so simple and easy to understand that it seemed to be the good medicine I had been looking for.
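To illustrate the range behaviour this relies on, here is a small sketch that records the byte index reported for each character. The helper runeStarts is a hypothetical name used only for this example.

```go
package main

import "fmt"

// runeStarts is a hypothetical helper for illustration: it returns the byte
// index at which each character of s begins, as reported by range.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	// range yields (byte index, character), skipping over continuation bytes.
	for i, r := range "Go 语言" {
		fmt.Printf("byte index %d: %c\n", i, r)
	}
	fmt.Println(runeStarts("Go 语言"))
}
```

The indexes jump from 3 to 6 because "语" occupies three bytes, so the index seen when the counter reaches the desired length is a safe place to cut the string.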
If you think the story ends here, it does not; for me, the exploration had only just begun.
The final step - building my own wheel
After swallowing the sweet medicine that range turned out to be, I calmed down. I wanted to build a wheel that was both easier to use and more efficient.
Looking closely at the two optimized versions, I saw that both boil down to finding the byte index at which the specified number of characters ends. If I could provide a function for that, users would get a very simple substring implementation such as s[:strIndex(20)]. Once the idea took root, I could not shake it off, and I spent two days thinking hard about how to provide an easy-to-use interface.
I later created exutf8.RuneIndexInString and a byte-slice counterpart. They compute the index at which the specified number of characters ends, in a string and in a byte slice respectively.
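The idea of "find the byte index where the n-th character ends" can be sketched with the standard library alone. This is not the go-extend implementation: the name runeIndexSketch and the boolean second return value are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeIndexSketch is a hypothetical illustration, not the go-extend code.
// It returns the byte index just past the first n characters of s, and
// whether s actually contains at least n characters (an assumed signature).
func runeIndexSketch(s string, n int) (int, bool) {
	idx := 0
	for i := 0; i < n; i++ {
		if idx >= len(s) {
			// Ran out of input before reaching n characters.
			return len(s), false
		}
		_, size := utf8.DecodeRuneInString(s[idx:])
		idx += size
	}
	return idx, true
}

func main() {
	s := "Go 语言"
	idx, ok := runeIndexSketch(s, 4)
	fmt.Println(idx, ok, s[:idx]) // slicing at idx keeps whole characters
}
```

With such an index in hand, taking a prefix is a plain slice expression on the original string, so no new string has to be allocated.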
I used exutf8.RuneIndexInString to implement a substring benchmark:
    package benchmark

    import (
    	"testing"

    	"/thinkeridea/go-extend/exunicode/exutf8"
    )

    var benchmarkSubString = "Go is a statically strongly typed, compiled, concurrent, and garbage collection programming language developed by Google. For the convenience of searching and recognition, it is sometimes called Golang."
    var benchmarkSubStringLength = 20

    // SubStrRuneIndexInString returns the first length characters of s,
    // slicing at the byte index computed by exutf8.RuneIndexInString.
    func SubStrRuneIndexInString(s string, length int) string {
    	n, _ := exutf8.RuneIndexInString(s, length)
    	return s[:n]
    }

    func BenchmarkSubStrRuneIndexInString(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		SubStrRuneIndexInString(benchmarkSubString, benchmarkSubStringLength)
    	}
    }
I ran it and was very pleased with the results:
    goos: darwin
    goarch: amd64
    pkg: /thinkeridea/go-extend/exunicode/exutf8/benchmark
    BenchmarkSubStrRuneIndexInString-8    13546849    82.4 ns/op    0 B/op    0 allocs/op
    PASS
    ok      /thinkeridea/go-extend/exunicode/exutf8/benchmark    1.213s
It is about 10% faster than the range version. I was delighted to gain yet another improvement, which proved that the approach works.
It is efficient enough, but not convenient enough. I need two lines of code to take a substring, and if I want the characters between positions 10 and 20, I need four lines. That is not an interface users will find easy to use. I looked at the sub_string functions in other languages and decided I should design a similar interface.
RuneSubString, together with a counterpart for byte slices, is what I wrote after careful consideration:
func RuneSubString(s string, start, length int) string
It has three parameters:
- s: The input string.
- start: The position at which to start. If start is non-negative, the returned string begins at position start of the string, counting from 0. For example, in the string "abcdef", the character at position 0 is "a", the character at position 2 is "c", and so on. If start is negative, the returned string begins at the start-th character counting back from the end of the string. If the string is shorter than start, an empty string is returned.
- length: The number of characters to take. If length is positive, the returned string contains at most length characters beginning at start (depending on the length of the string). If length is negative, that many characters are omitted from the end of the string (if start is negative, the starting position is counted from the end of the string). If start lies outside the string, an empty string is returned. If length is 0, the returned substring runs from start to the end of the string.
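The parameter rules above can be sketched as follows. This is only an illustration of the described semantics, not the go-extend implementation: the name runeSubStringSketch is hypothetical, it uses a []rune conversion for clarity rather than speed, and two edge cases the description leaves open are handled by assumption (a negative start that reaches past the beginning is clamped to 0, and an empty string is returned when a negative length removes everything after start).

```go
package main

import "fmt"

// runeSubStringSketch is a hypothetical illustration of the start/length
// rules described in the text. It favours clarity over performance.
func runeSubStringSketch(s string, start, length int) string {
	rs := []rune(s)
	n := len(rs)

	if start < 0 {
		start += n // count back from the end
		if start < 0 {
			start = 0 // assumption: clamp to the beginning
		}
	}
	if start > n {
		return "" // start lies outside the string
	}

	end := n // length == 0: run to the end of the string
	switch {
	case length > 0:
		if start+length < n {
			end = start + length
		}
	case length < 0:
		end = n + length // omit characters from the end
	}
	if end < start {
		return "" // assumption: nothing left to return
	}
	return string(rs[start:end])
}

func main() {
	fmt.Println(runeSubStringSketch("abcdef", 2, 3))  // three characters from position 2
	fmt.Println(runeSubStringSketch("abcdef", -2, 0)) // last two characters
	fmt.Println(runeSubStringSketch("abcdef", 1, -1)) // drop first and last character
	fmt.Println(runeSubStringSketch("Go 语言", 3, 1))   // one multi-byte character
}
```

A single call with a start and a length replaces the two-step "find the index, then slice" pattern, which is the usability gain this interface is after.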
I also provided aliases for them. Out of habit, most people look for solutions to this kind of problem in a strings package, so I created aliases that are easier to discover.
Finally, I needed one more benchmark to confirm its performance:
    package benchmark

    import (
    	"testing"

    	"/thinkeridea/go-extend/exunicode/exutf8"
    )

    var benchmarkSubString = "Go is a statically strongly typed, compiled, concurrent, and garbage collection programming language developed by Google. For the convenience of searching and recognition, it is sometimes called Golang."
    var benchmarkSubStringLength = 20

    // SubStrRuneSubString returns the first length characters of s via exutf8.RuneSubString.
    func SubStrRuneSubString(s string, length int) string {
    	return exutf8.RuneSubString(s, 0, length)
    }

    func BenchmarkSubStrRuneSubString(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		SubStrRuneSubString(benchmarkSubString, benchmarkSubStringLength)
    	}
    }
Running it did not disappoint me:
    goos: darwin
    goarch: amd64
    pkg: /thinkeridea/go-extend/exunicode/exutf8/benchmark
    BenchmarkSubStrRuneSubString-8    13309082    83.9 ns/op    0 B/op    0 allocs/op
    PASS
    ok      /thinkeridea/go-extend/exunicode/exutf8/benchmark    1.215s
Although it is slightly slower than exutf8.RuneIndexInString, it offers an interface that is easy to use, and I think it is the most practical solution. If you want the last bit of performance, you can still use exutf8.RuneIndexInString, which remains the fastest option.
Summary
When you come across a piece of questionable code, even a very simple one, it is worth digging into it again and again. Far from being tedious, the process pays off handsomely.
From the initial []rune conversion to finally building my own wheel, I not only achieved a 16-fold performance improvement but also learned the utf8 package, deepened my understanding of how range traverses strings, and added several practical, efficient solutions to the go-extend repository so that more go-extend users can benefit.
go-extend is a repository of practical and efficient functions. If you have a useful function or a general, efficient solution, please do not hesitate to send me a pull request. You can also use the repository to speed up development and improve performance.
That is all for this article. I hope it helps with your studies, and thank you for your support.