A brief analysis of the use of the rune type in Golang

1. Overview

I often see the rune type in open source libraries. In the Go source code it is defined as an alias for int32 (range -2^31 ~ 2^31-1). Compared with byte, which is an alias for uint8 (range 0 ~ 255), it can represent far more characters.

2. Use

Because a rune covers a much larger range, it can hold any Unicode code point, including Chinese characters. Whenever you need to count or index Chinese characters, rune is the type to use.

The official definition in the builtin package reads:

// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32
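
As a minimal sketch of what "alias" means in practice (the variable names are only for illustration), a rune can be assigned to an int32 without any conversion, and the two print as the same type:

```go
package main

import "fmt"

func main() {
    var r rune = '好' // a rune literal is simply a Unicode code point
    var i int32 = r  // no conversion needed: rune and int32 are the same type

    fmt.Printf("%c %d %U\n", r, i, r) // 好 22909 U+597D
    fmt.Printf("%T\n", r)             // int32
}
```

The name rune carries no extra behavior; it only signals to the reader that the value is meant to be a character.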

This may still leave the role and significance of rune unclear, so let's look at what it does through two simple examples.

Example 1:

package main

import "fmt"

func main() {
    var str = "hello 你好"
    // len returns the number of bytes, not the number of characters
    fmt.Println("len(str):", len(str))
}

Output:

// Note: each Chinese character here takes 3 bytes in UTF-8
len(str): 12

A Go string is backed by a read-only sequence of bytes. Common Chinese characters occupy 2 bytes in UTF-16 and 3 bytes in UTF-8, and Go source code and string literals are UTF-8 encoded, which is why len reports 3 bytes for each Chinese character.
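
To make the byte layout concrete, here is a small sketch that dumps the UTF-8 bytes of a two-character Chinese string (the sample string is just an illustration) and asks the unicode/utf8 package how many bytes a given rune needs:

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    s := "你好"

    // "% x" prints every byte of the string in hex, separated by spaces.
    fmt.Printf("% x\n", s) // e4 bd a0 e5 a5 bd

    // utf8.RuneLen reports how many bytes a rune occupies in UTF-8.
    fmt.Println(utf8.RuneLen('a'), utf8.RuneLen('你')) // 1 3
}
```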

So what if we want the number of characters in a string rather than the byte length of the underlying data?

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    var str = "hello 你好"

    // A Go string is a sequence of bytes, so len counts bytes:
    // each Chinese character here takes 3 bytes and adds 3 to the length.
    fmt.Println("len(str):", len(str))

    // The next two calls both return the number of characters in str.
    // utf8.RuneCountInString counts the UTF-8 encoded runes directly.
    fmt.Println("RuneCountInString:", utf8.RuneCountInString(str))

    // Converting to []rune decodes the string into Unicode code points.
    fmt.Println("rune:", len([]rune(str)))
}

Output:

len(str): 12
RuneCountInString: 8
rune: 8

Example 2:

package main
 
import "fmt"
 
func main() {
    s := "Hello abc"
    r := "Hello 123"
    ("len(s)=", len([]byte(s)), "len(r)=", len([]rune(r))) //len(s)= 9 len(r)= 5
 
    for k, v := range r {
        ("k=", k, "v=", v)
    }
 
    for k, v := range []rune(r) {
        ("k2=", k, "v2=", v)
    }
}

When ranging over a string that contains Chinese characters, k in the first loop takes the values 0, 1, 2, 3, 6, while k in the second loop takes 0, 1, 2, 3, 4. Ranging over a string decodes it one UTF-8 character at a time, and k is the byte offset where each character starts, so after a 3-byte Chinese character the index jumps by three. Ranging over a []rune advances the index by one per character. In both loops v is a rune, so it prints as the numeric code point.

3. Summary

A Go string is backed by a read-only sequence of bytes. Common Chinese characters occupy 2 bytes in UTF-16 and 3 bytes in UTF-8, and Go strings are UTF-8 encoded by convention. To get the length of a string in characters (one Chinese character counting as one), convert the string to []rune and take its length, or call utf8.RuneCountInString.

The byte type in Go is similar to rune in that both are integer types used to represent characters. Their differences are:

byte is an alias for uint8 and is typically used for raw bytes and ASCII characters
rune is an alias for int32 and is typically used for Unicode code points, such as those decoded from UTF-8 text
