01 Introduction
In Golang, a value of type string is read-only and cannot be modified. If a modification is required, the usual practice is to slice and splice the original string to generate a new string, but this involves memory allocation and data copying, which has a performance cost. In this article, we introduce how to use strings efficiently in Golang.
02 The data structure of the string
In Golang, the value of a string is stored in a contiguous block of memory, which we can regard as an array of bytes. At runtime a string is represented by the structure stringStruct, which has two fields: str, a pointer, and len, an integer. The field str points to the first byte of the byte array, and the field len holds the length of the string in bytes.
type stringStruct struct {
	str unsafe.Pointer // pointer to the first byte of the data
	len int            // length in bytes
}
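The two-field layout described above can be observed from ordinary code. The following is a minimal sketch: the helper name headerWords is hypothetical, and the sample string is chosen only to show that len counts bytes rather than characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// headerWords (hypothetical helper) reports how many machine words
// a string value occupies: one for the pointer, one for the length.
func headerWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := "héllo" // 'é' is encoded as two bytes in UTF-8

	fmt.Println(headerWords())             // 2
	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 5 (characters)
}
```

The size of the string value stays the same no matter how long the data it points to is.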
Let's compare the performance gap between string and string pointer through sample code. We define two functions, using string and *string as the parameters of the function.
package main

import "testing"

var strs string = `Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.`

// str receives the string by value.
func str(str string) {
	_ = str + "golang"
}

// ptr receives a pointer to the string.
func ptr(str *string) {
	_ = *str + "golang"
}

func BenchmarkString(b *testing.B) {
	for i := 0; i < b.N; i++ {
		str(strs)
	}
}

func BenchmarkStringPtr(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ptr(&strs)
	}
}
output:
go test -bench . -benchmem string_test.go
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkString-16       21987604    46.05 ns/op    128 B/op    1 allocs/op
BenchmarkStringPtr-16    24459241    46.23 ns/op    128 B/op    1 allocs/op
PASS
ok  command-line-arguments  2.590s
From the benchmark results above, we can see that passing a string and passing a string pointer perform basically the same.
Although a string value does not contain the character data itself, only a pointer to the memory where the data is stored and the length of the string, string is still a value type. Passing a string copies just these two fields, not the data, which is why the two benchmarks are so close.
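To illustrate that passing a string by value does not copy its data, the sketch below compares the data address seen by the caller with the one seen by the callee. It assumes Go 1.20 or later for unsafe.StringData; the function names dataAddr and sameBacking are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// dataAddr returns the address of the first data byte of s,
// as seen from inside the callee.
func dataAddr(s string) *byte {
	return unsafe.StringData(s)
}

// sameBacking reports whether the callee's copy of the string
// still points at the same bytes as the caller's value.
func sameBacking(s string) bool {
	return dataAddr(s) == unsafe.StringData(s)
}

func main() {
	s := "Go is an open source programming language"
	fmt.Println(sameBacking(s)) // true
}
```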
03 The string is read-only and cannot be modified
In Golang language, strings are read-only and cannot be modified.
package main

import "fmt"

func main() {
	str := "golang"
	fmt.Println(str) // golang

	byteSlice := []byte(str) // the conversion copies the data
	byteSlice[0] = 'a'
	fmt.Println(string(byteSlice)) // aolang
	fmt.Println(str)               // golang
}
In the code above, we convert the string variable str to a byte slice, assign it to the variable byteSlice, and modify the first element of byteSlice by index. The printed value of str has not changed.
This is because, when a string is converted to a byte slice, Golang allocates new memory for the byte slice and copies the data into it, instead of sharing the memory of the string variable.
Readers may think of using pointers to modify the data stored in memory of variables of string type.
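package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var str string = "golang"
	fmt.Println(str)

	// The first word of the string header is the data pointer.
	ptr := (*uintptr)(unsafe.Pointer(&str))
	var arr *[6]byte = (*[6]byte)(unsafe.Pointer(*ptr))
	// The second word of the string header is the length.
	var len *int = (*int)(unsafe.Pointer(uintptr(unsafe.Pointer(&str)) + unsafe.Sizeof((*uintptr)(nil))))

	for i := 0; i < (*len); i++ {
		fmt.Printf("%p => %c\n", &((*arr)[i]), (*arr)[i])
		ptr2 := &((*arr)[i])
		val := (*ptr2)
		(*ptr2) = val + 1 // attempt to write into the string's data
	}
	fmt.Println(str)
}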
output:
go run
golang
0x10c96d2 => g
unexpected fault address 0x10c96d2
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x10c96d2 pc=0x10a4c56]
In the code above, we try to modify, through pointers, the data that the string variable str stores in memory. The first write triggers a SIGBUS fault at runtime, because the data of the string literal is placed in read-only memory. This shows that the data of a string is read-only.
We already know that the structure of a string in runtime contains two fields, a pointer to the memory address where the data is stored and the length of the string. Because the string is read-only, after the string is assigned, its data and length will not be modified. Therefore, reading the length of the string is actually reading the value of the field len, and the complexity is O(1).
When comparing strings, because the string is read-only and cannot be modified, as long as the length of the two compared strings is different, you can judge that the two strings are different, and there is no need to compare the specific data stored in the two strings.
If the value of len is the same, then judge whether the pointers of the two strings point to the same memory. If the value of len is the same and the pointer points to the same memory, you can judge that the two strings are the same. But if the value of len is the same and the pointer does not point to the same piece of memory, then you still need to continue to compare whether the string data pointed to by the pointer of the two strings is the same.
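The comparison order described above (length first, then pointer, then data) can be sketched as follows. This is only an illustration of the reasoning, not the runtime's actual implementation; the function name equalSketch is hypothetical, and unsafe.StringData assumes Go 1.20 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// equalSketch mirrors the three steps of string comparison.
func equalSketch(a, b string) bool {
	// Step 1: different lengths mean different strings.
	if len(a) != len(b) {
		return false
	}
	// Step 2: same length and same data pointer mean equal strings.
	if unsafe.StringData(a) == unsafe.StringData(b) {
		return true
	}
	// Step 3: otherwise the bytes themselves must be compared.
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	s := "golang"
	fmt.Println(equalSketch(s, s))               // true
	fmt.Println(equalSketch("go", "golang"))     // false
	fmt.Println(equalSketch("golang", "golanG")) // false
}
```

In everyday code the == operator should simply be used; the sketch only makes the steps visible.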
04 String splicing
In the Golang language, there are many ways to splice strings, namely:
- Use operator +/+=
- use
- use
- use
- use
Among them, the operators are the easiest to use, but they are not the most efficient. They generally suit cases where the strings to be spliced, and their lengths, are known in advance.
Formatted splicing (for example with fmt.Sprintf) has the worst performance, but it can format its arguments, so it is generally used when the spliced string needs formatting.
The remaining methods fall in between, and the highest-performance way to splice strings is to use strings.Builder.
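As an illustration, here are several common standard-library ways of splicing strings side by side. Which of these the list above refers to is an assumption, since the list does not name them in this copy; the wrapper function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// withOperator splices with the + operator.
func withOperator(a, b string) string { return a + b }

// withSprintf splices through a format string.
func withSprintf(a, b string) string { return fmt.Sprintf("%s%s", a, b) }

// withJoin splices a slice of strings with an empty separator.
func withJoin(parts []string) string { return strings.Join(parts, "") }

// withBuffer splices by writing into a bytes.Buffer.
func withBuffer(parts []string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String()
}

// withBuilder splices by writing into a strings.Builder.
func withBuilder(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := []string{"go", "lang"}
	fmt.Println(withOperator("go", "lang")) // golang
	fmt.Println(withSprintf("go", "lang"))  // golang
	fmt.Println(withJoin(parts))            // golang
	fmt.Println(withBuffer(parts))          // golang
	fmt.Println(withBuilder(parts))         // golang
}
```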
The rest of this section looks at strings.Builder in more detail.
The Builder type in the standard library package strings is used to splice strings efficiently through its Write methods, which reduce data copying and memory allocation.
type Builder struct {
	addr *Builder // of receiver, to detect copies by value
	buf  []byte
}
The Builder structure contains two fields, addr and buf. The field addr is a pointer to the Builder itself and is used to detect copies by value. The field buf is a byte slice; bytes already written to it are never modified in place, and the slice can only be appended to or reset.
Builder provides a series of Write* methods that append new data to the end of the existing data and automatically grow the byte slice if its capacity is not enough. Note that whenever the capacity grows, memory allocation and data copying are involved. The automatic growth rules are the same as the growth rules for slices.
Besides growing automatically, a Builder can also be grown manually. The Grow method takes an int parameter and guarantees room for at least that many more bytes. Because growing involves memory allocation and data copying, Grow avoids it where possible: if the unused capacity of the byte slice is already greater than or equal to the parameter, Grow does nothing. Otherwise, the new capacity is twice the original capacity plus the value of the parameter.
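The behaviour of Grow can be observed through the Cap method of strings.Builder (available since Go 1.12). A minimal sketch; the helper name growDemo is hypothetical, and the exact capacities are deliberately not asserted because they may vary between Go versions.

```go
package main

import (
	"fmt"
	"strings"
)

// growDemo returns the capacity after the first Grow call and the
// capacity after a second Grow call that fits in the unused space.
func growDemo() (first, second int) {
	var sb strings.Builder

	sb.Grow(16) // room for at least 16 bytes
	first = sb.Cap()

	sb.WriteString("golang") // 6 bytes used, at least 10 still free
	sb.Grow(4)               // fits in the unused capacity: no reallocation
	second = sb.Cap()
	return first, second
}

func main() {
	first, second := growDemo()
	fmt.Println(first >= 16)     // true
	fmt.Println(first == second) // true
}
```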
The Builder type also provides the Reset method, which resets a Builder to its zero value. After the reset, the original byte slice can be garbage collected once nothing else references it.
With this introduction, readers should have a basic understanding of Builder. Let's look at the difference between pre-allocating bytes and not pre-allocating them through code:
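package main

import (
	"strings"
	"testing"
)

var lan []string = []string{
	"golang",
	"php",
	"javascript",
}

// stringBuilder splices without pre-allocating.
func stringBuilder(lan []string) string {
	var str strings.Builder
	for _, val := range lan {
		str.WriteString(val)
	}
	return str.String()
}

// stringBuilderGrow pre-allocates 16 bytes before splicing.
func stringBuilderGrow(lan []string) string {
	var str strings.Builder
	str.Grow(16)
	for _, val := range lan {
		str.WriteString(val)
	}
	return str.String()
}

func BenchmarkBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		stringBuilder(lan)
	}
}

func BenchmarkBuilderGrow(b *testing.B) {
	for i := 0; i < b.N; i++ {
		stringBuilderGrow(lan)
	}
}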
output:
go test -bench . -benchmem builder_test.go
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkBuilder-16        13761441    81.85 ns/op    56 B/op    3 allocs/op
BenchmarkBuilderGrow-16    20487056    56.20 ns/op    48 B/op    2 allocs/op
PASS
ok  command-line-arguments  2.888s
From the results above, we can see that pre-allocating with the Grow method is faster and allocates less memory than splicing without pre-allocation. Whenever we can estimate the number of bytes, we should call Grow to pre-allocate them.
Note: First, a Builder must not be copied after it has been used; calling a method on such a copy raises a panic. Second, because a Builder is mutable, it is not safe for concurrent use, and users need to take care of concurrency safety themselves.
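The copy restriction in the note can be demonstrated with a small sketch that recovers from the panic. The function name copyPanics is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// copyPanics reports whether writing to a copy of a used Builder panics.
func copyPanics() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()

	var a strings.Builder
	a.WriteString("go") // first use records the Builder's own address

	b := a               // copy by value: the recorded address still refers to a
	b.WriteString("lang") // the copy check fails and the method panics
	return false
}

func main() {
	fmt.Println(copyPanics()) // true
}
```

Passing a *strings.Builder instead of a strings.Builder value avoids the problem.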
05 Converting between strings and byte slices
Slices can only be compared with nil; two slices cannot be compared with each other using ==. If we need to compare byte slices, the usual practice is to convert them to the string type first. However, because the string type is read-only and cannot be modified, the conversion involves memory allocation and data copying.
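A short sketch of the comparison by conversion described above, next to bytes.Equal from the standard library, which compares two byte slices directly. bytes.Equal is not mentioned in the original text and is shown here only for comparison; the wrapper function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// equalViaString compares two byte slices by converting them to strings.
func equalViaString(a, b []byte) bool {
	return string(a) == string(b)
}

// equalViaBytes compares two byte slices with the standard library helper.
func equalViaBytes(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	a := []byte("golang")
	b := []byte("golang")
	c := []byte("gopher")

	// a == b does not compile: slices can only be compared with nil.
	fmt.Println(equalViaString(a, b), equalViaBytes(a, b)) // true true
	fmt.Println(equalViaString(a, c), equalViaBytes(a, c)) // false false
}
```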
To improve the performance of the conversion, the only way is to reduce or avoid the overhead of memory allocation. In the Golang language, the runtime also optimizes the mutual conversion of the two. Interested readers can read the relevant source code in runtime:
/usr/local/go/src/runtime/
However, we can continue to optimize and implement zero-copy conversion operations, thereby avoiding the overhead of memory allocation and improving conversion efficiency.
First read the data structures of StringHeader and SliceHeader in reflect:
// /usr/local/go/src/reflect/
type StringHeader struct {
	Data uintptr // points to the byte array that stores the data
	Len  int     // length
}

type SliceHeader struct {
	Data uintptr // points to the byte array that stores the data
	Len  int     // length
	Cap  int     // capacity
}
From the code above, we can see that StringHeader differs from SliceHeader only in lacking the field Cap, which represents capacity; both have a pointer to the byte array that stores the data and a length. We only need to use the unsafe package to obtain the memory address and reinterpret the data in the original memory space, avoiding the overhead of memory allocation and data copying.
Because StringHeader has no Cap field, viewing a *SliceHeader as a *StringHeader is unproblematic, but the reverse direction is not, since a string has no capacity. To build a byte slice from a string, we need to supply the Cap field ourselves and use the value of the field Len as the value of the field Cap.
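package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	str := "golang"
	fmt.Printf("str val:%s type:%T\n", str, str)

	// View the string variable through its header to read Data and Len.
	strPtr := (*reflect.StringHeader)(unsafe.Pointer(&str))
	fmt.Println(strPtr.Data)

	// Fill in the header of a real slice variable, so that Cap has
	// storage of its own (a string variable has no room for it).
	var str2 []byte
	slicePtr := (*reflect.SliceHeader)(unsafe.Pointer(&str2))
	slicePtr.Data = strPtr.Data
	slicePtr.Len = strPtr.Len
	slicePtr.Cap = strPtr.Len // use Len as the value of Cap
	// str2[0] = 'a' // would crash: the underlying memory is read-only

	fmt.Printf("str2 val:%s type:%T\n", str2, str2)
	fmt.Println((*reflect.StringHeader)(unsafe.Pointer(&str2)).Data)
}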
output:
go run golang
str val:golang type:string
17602449
str2 val:golang type:[]uint8
17602449
From the output above, we can see that the string is converted into a byte slice with zero copy: str and str2 share the same piece of memory, and no new memory is allocated. However, note that the converted byte slice must not be modified. Strings are read-only in Golang, so writing to the slice by index crashes the program with a fault like the one shown in section 03.
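Newer Go releases (1.20 and later) provide unsafe.String, unsafe.StringData, unsafe.Slice and unsafe.SliceData, which express the same zero-copy idea without the reflect header types. The following is a minimal sketch under that version assumption; the function names bytesToString, stringToBytes and sharesMemory are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// The caller must not modify b afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes reinterprets s as a byte slice without copying.
// The returned slice must never be written to.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// sharesMemory reports whether s and b start at the same address.
func sharesMemory(s string, b []byte) bool {
	return unsafe.StringData(s) == unsafe.SliceData(b)
}

func main() {
	str := "golang"
	b := stringToBytes(str)
	fmt.Println(string(b), sharesMemory(str, b)) // golang true

	buf := []byte("gopher")
	s := bytesToString(buf)
	fmt.Println(s, sharesMemory(s, buf)) // gopher true
}
```

The same restriction applies as before: the shared bytes are treated as read-only in both directions.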
06 Summary
In this article, we introduced how to use strings efficiently in Golang. First, we introduced the data structure of strings in the runtime and showed through sample code that strings are read-only. Then we introduced several ways of splicing strings, and finally how to convert between strings and byte slices. For more information about string manipulation, readers can read the standard library packages strings and strconv.