In Go, rune and byte are both types used to represent character data, but they have some key differences.
byte type
byte is an alias of uint8, that is, an 8-bit unsigned integer. It represents one byte, with a range of 0 to 255.
- byte is used to represent the individual bytes of UTF-8 encoded text, and is suitable for handling byte streams and ASCII characters.
The number of bytes occupied by characters:
- ASCII characters (0-127) take up 1 byte.
- Basic Latin letters and common punctuation marks are ASCII characters, so they take up 1 byte.
- Non-ASCII characters take 2 to 4 bytes; most Chinese characters take up 3 bytes.
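The byte counts listed above can be checked with the standard library. The following is a minimal sketch; the helper name utf8Len is hypothetical and simply wraps utf8.RuneLen, and the sample characters are chosen only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Len reports how many bytes the rune r occupies when encoded as UTF-8.
// (Hypothetical helper name; it only wraps utf8.RuneLen.)
func utf8Len(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// An ASCII letter, ASCII punctuation, an accented Latin letter,
	// a Chinese character, and an emoji.
	for _, r := range []rune{'A', ',', 'é', '你', '😀'} {
		fmt.Printf("%c (U+%04X): %d byte(s)\n", r, r, utf8Len(r))
	}
}
```

The ASCII characters report 1 byte, 'é' reports 2, '你' reports 3, and the emoji reports 4.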
byte representation: for the string "你", the UTF-8 encoding in Go is 0xE4, 0xBD, 0xA0 (hexadecimal).

    s := "你"
    for i := 0; i < len(s); i++ {
        fmt.Printf("byte at index %d: %d\n", i, s[i])
    }

Output:
byte at index 0: 228
byte at index 1: 189
byte at index 2: 160
rune type
rune is an alias of int32, that is, a 32-bit signed integer, used to represent a Unicode code point. Every character in Go, ASCII or not, can be represented as a rune; valid code points range from 0 to 0x10FFFF.
- rune is used to represent Unicode characters. It holds the character's code point and is suitable for character-level operations, especially those involving non-ASCII text (such as Chinese, emojis, etc.).
rune representation:

    s := "你"
    for _, c := range s {
        fmt.Printf("rune: %c, rune value: %d\n", c, c)
    }

Output:
rune: 你, rune value: 20320
This means that the Unicode code point of "你" (20320, i.e. 0x4F60) is stored as a rune.
UTF-8 and Unicode relationship
- Unicode is a character set, and UTF-8 is one of the ways to encode it. Unicode assigns a code point to every character, but a code point by itself says nothing about how the character is stored or transmitted. UTF-8 fills that gap for cross-platform and cross-language compatibility: it defines how to convert Unicode code points into byte sequences. Besides UTF-8, there are also UTF-16 and UTF-32.
- Relationship:
  - Unicode assigns a code point (a number) to each character.
  - UTF-8 encodes these code points as byte sequences of varying length (1 to 4 bytes), so that they can be stored in files, transmitted over the network, displayed on screen, etc.
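To make the code point to byte sequence step concrete, here is a small sketch using utf8.EncodeRune and utf8.DecodeRune from the standard library. The helper name encode is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode returns the UTF-8 byte sequence for the code point r.
// (Hypothetical helper; it wraps utf8.EncodeRune.)
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest UTF-8 sequence
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	b := encode('你') // code point U+4F60
	fmt.Printf("U+%04X -> % X\n", '你', b)

	// Decoding the bytes gives back the same code point and its width in bytes.
	r, size := utf8.DecodeRune(b)
	fmt.Printf("% X -> U+%04X (%d bytes)\n", b, r, size)
}
```

For '你' this yields the three bytes E4 BD A0, which decode back to U+4F60.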
The main differences between byte and rune
Characteristic | byte | rune |
---|---|---|
Type | uint8 (8-bit unsigned int) | int32 (32-bit signed int) |
Use | Processing ASCII or raw byte data | Handling Unicode characters |
Value range | 0 to 255 | 0 to 0x10FFFF (valid code points) |
Common applications | Byte streams, ASCII characters | Unicode characters (including multibyte characters) |
Storage size | 1 byte | 4 bytes |
Character set support | A single byte can hold only an ASCII character | Can hold any Unicode character |
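The type and storage-size rows of the table can be verified directly. This is only a sketch: the helper name elemSizes is hypothetical, and unsafe.Sizeof is used purely to inspect the sizes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemSizes returns the storage size in bytes of a byte and of a rune.
// (Hypothetical helper for this illustration.)
func elemSizes() (byteSize, runeSize uintptr) {
	var b byte
	var r rune
	return unsafe.Sizeof(b), unsafe.Sizeof(r)
}

func main() {
	var b byte = 255
	var u uint8 = b // no conversion needed: byte is an alias of uint8
	var r rune = '你'
	var i int32 = r // no conversion needed: rune is an alias of int32
	fmt.Println(u, i)

	bs, rs := elemSizes()
	fmt.Println(bs, rs)
}
```

Because both names are aliases, the assignments compile without an explicit conversion, and the sizes come out as 1 and 4 bytes.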
Go's default encoding method
Go strings are UTF-8 encoded by default, so a string is represented as a sequence of bytes (byte) rather than one value per character.
Specifically, a string in Go (the string type) is made up of a UTF-8 encoded byte sequence. Therefore:
- A Go string consists of multiple bytes (byte), and each byte is part of a UTF-8 encoded character.
- Because these bytes follow UTF-8, a Go string can contain both ASCII characters (1 byte each in UTF-8) and multi-byte Unicode characters (such as Chinese characters, which usually occupy 3 bytes in UTF-8).

    s := "a"
    fmt.Print("Number of bytes occupied: ", len(s))
    fmt.Printf("; Type: %T\n", s[0])
    s1 := "你"
    fmt.Print("Number of bytes occupied: ", len(s1))
    fmt.Printf("; Type: %T\n", s1[0])

Output:
Number of bytes occupied: 1; Type: uint8
Number of bytes occupied: 3; Type: uint8
Traversal methods
Traversing by byte
You can convert the string directly to a byte slice with bytes := []byte(s). Of course, you can also traverse the string itself:
- With for i := 0; i < len(s); i++, each iteration accesses one byte of the string.
- len(s) returns the byte length of the string, i.e. the total number of bytes it contains, not the number of characters. For a string containing multibyte characters (such as Chinese characters), len(s) is therefore larger than the character count.

    package main

    import "fmt"

    func main() {
        s := "你" // Contains a Chinese character
        // Traverse the string by byte
        fmt.Println("Traverse the string by byte:")
        for i := 0; i < len(s); i++ {
            // Print the value and type of each byte
            fmt.Printf("s[%d] = %v (type: %T)\n", i, s[i], s[i])
        }
    }

Output:
Traverse the string by byte:
s[0] = 228 (type: uint8)
s[1] = 189 (type: uint8)
s[2] = 160 (type: uint8)
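Since len(s) counts bytes, the character count has to come from somewhere else. One common option is utf8.RuneCountInString from the standard library. The helper name counts and the sample string below are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the character (rune) count of s.
// (Hypothetical helper for this illustration.)
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "你好, Go" // two 3-byte characters followed by four ASCII characters
	nBytes, nRunes := counts(s)
	fmt.Println("bytes:", nBytes)
	fmt.Println("runes:", nRunes)
}
```

For this string the byte length is 10 while the character count is 6.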
Traversing by rune
You can convert the string directly to a rune slice with runes := []rune(s). Of course, you can also traverse the string itself:
With for _, c := range s, Go automatically decodes each character of the string s into a rune, so multibyte characters are handled correctly. When ranging over a string, iteration proceeds character by character (rune by rune). Each iteration yields one Unicode code point (rune) and the byte index at which that character starts in the string. For a multibyte character, range consumes all of its bytes in a single iteration.

    package main

    import "fmt"

    func main() {
        s := "你"
        // len(s) returns the number of bytes
        fmt.Println("len(s) =", len(s)) // Output: 3, because "你" is represented by 3 bytes
        // Use range to traverse the string by character (rune)
        fmt.Println("Use range to traverse the string by character (rune):")
        for i, r := range s {
            fmt.Printf("i = %d, r = %v (type: %T)\n", i, r, r)
        }
    }

Output:
len(s) = 3
Use range to traverse the string by character (rune):
i = 0, r = 20320 (type: int32)
Supplement
With for i := range s, s[i] is actually a byte, which causes problems when dealing with Chinese.
for i := range s may work without any problem on English strings, because English characters (ASCII characters) are a single byte in UTF-8, so each character corresponds to exactly one byte. But if the string contains non-English characters (such as Chinese, emojis, etc.), those characters usually take up multiple bytes, and for i := range s runs into trouble. range iterates by character (rune), so the number of iterations equals the number of characters (only 1 in the example below), not the number of bytes (one Chinese character corresponds to 3 bytes). As a result, s[i] yields only the first byte of each character.

    package main

    import "fmt"

    func main() {
        s := "你" // The string contains a Chinese character
        // Use range to traverse the string
        fmt.Println("Use range to traverse the string:")
        for i := range s {
            // Prints only the first byte of each character
            fmt.Printf("s[%d] = %v (type: %T)\n", i, s[i], s[i])
        }
    }

Output:
Use range to traverse the string:
s[0] = 228 (type: uint8)
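One way around this pitfall, sketched here under the assumption that you still want to iterate with only the index, is to decode the full character at that index with utf8.DecodeRuneInString. In most code, simply taking the second value from range does the same job. The helper names firstBytes and fullRunes are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstBytes mirrors the pitfall: indexing s[i] inside `for i := range s`
// yields only the first byte of each character.
func firstBytes(s string) []byte {
	var out []byte
	for i := range s {
		out = append(out, s[i])
	}
	return out
}

// fullRunes decodes the whole character that starts at each index instead.
func fullRunes(s string) []rune {
	var out []rune
	for i := range s {
		r, _ := utf8.DecodeRuneInString(s[i:])
		out = append(out, r)
	}
	return out
}

func main() {
	s := "你好"
	fmt.Println(firstBytes(s)) // leading byte of each character only
	fmt.Println(fullRunes(s))  // complete code points
}
```

For "你好" the first function returns the leading bytes 228 and 229, while the second returns the code points 20320 and 22909.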
Restoring strings
To restore the original string from a byte sequence or a rune sequence, you can do the following:
- From a byte sequence: use string(byteSlice) directly.
- From a rune sequence: use string(runeSlice) directly.
Restore string from byte sequence

    package main

    import "fmt"

    func main() {
        s := "你好" // The string "你好"
        // Convert the string to a byte slice
        bytes := []byte(s)
        fmt.Println("bytes:", bytes)
        // Convert the byte slice back to a string
        s1 := string(bytes)
        fmt.Println("Restored string:", s1)
    }

Output:
bytes: [228 189 160 229 165 189]
Restored string: 你好
Restore string from rune sequence

    package main

    import "fmt"

    func main() {
        s := "你好" // The string "你好"
        // Convert the string to a rune slice
        runes := []rune(s)
        fmt.Println("runes encoding:", runes)
        // Convert the rune slice back to a string
        s1 := string(runes)
        fmt.Println("Restored string:", s1)
    }

Output:
runes encoding: [20320 22909]
Restored string: 你好