SoFunction
Updated on 2025-03-05

Summary of methods for looping through strings containing Chinese characters in Go

Looping through strings that contain Chinese characters in Go

First, traverse a string containing Chinese characters with an ordinary index-based for loop:

str := "hello,你好"
for i := 0; i < len(str); i++ {
	fmt.Printf("%c", str[i]) // str[i] is a single byte
}

Output:

hello,ä½ å¥½

As you can see, the Chinese characters come out garbled when the string is traversed with an ordinary for loop.
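To see what the ordinary loop is actually iterating over, here is a minimal sketch that prints the string's length in bytes and its raw bytes in hex. The helper name describeBytes is hypothetical and is not part of the original article.

```go
package main

import "fmt"

// describeBytes returns the length of s in bytes and its raw bytes
// as space-separated hex (the "% x" verb of the fmt package).
func describeBytes(s string) (int, string) {
	return len(s), fmt.Sprintf("% x", s)
}

func main() {
	n, hex := describeBytes("hello,你好")
	fmt.Println(n)   // 12
	fmt.Println(hex) // 68 65 6c 6c 6f 2c e4 bd a0 e5 a5 bd
}
```

The first six bytes are h, e, l, l, o and the comma. The bytes e4 bd a0 encode 你 and e5 a5 bd encode 好. As a result, the ordinary loop runs 12 times even though the string has only 8 visible characters.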

Next, traverse the same string with Go's for range loop:

str := "hello,你好"
for _, v := range str {
	fmt.Printf("%c", v) // v is a rune (a decoded Unicode code point)
}

Output:

hello,你好

As you can see, for range handles strings containing Chinese characters correctly. The Chinese characters are printed normally, with no garbling.
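A for range loop over a string also reports where each character starts. The sketch below shows that the index variable is a byte offset, so it jumps from 6 to 9 across the three-byte character 你. The helper name runeOffsets is hypothetical.

```go
package main

import "fmt"

// runeOffsets records the byte offset at which each rune starts,
// as reported by the index variable of a for range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	str := "hello,你好"
	for i, v := range str {
		// i is a byte offset into str; v is the decoded rune
		fmt.Printf("%d %c\n", i, v)
	}
	fmt.Println(runeOffsets(str)) // [0 1 2 3 4 5 6 9]
}
```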

So why are the Chinese characters garbled when an ordinary for loop is used?

First, let's look at how Go represents characters.

Go has two character types:

  • byte, an alias for uint8. A byte holds a single 8-bit value, which is enough for one ASCII character.
  • rune, an alias for int32. A rune holds one Unicode code point, so it can represent any character, including Chinese and Japanese characters. Use rune whenever you need to process Chinese, Japanese, or other non-ASCII characters.
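The following short sketch uses the standard unicode/utf8 package to compare how many bytes a byte value and a rune value need once encoded. It also counts the bytes and runes in the sample string. The helper name counts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the number of bytes and the number of runes
// (Unicode code points) in s.
func counts(s string) (byteCount, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	var b byte = 'a' // ASCII character: one byte in UTF-8
	var r rune = '你' // U+4F60: three bytes in UTF-8
	fmt.Println(utf8.RuneLen(rune(b)), utf8.RuneLen(r)) // 1 3

	n, c := counts("hello,你好")
	fmt.Println(n, c) // 12 8
}
```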

"%T" in use can output the actual type of the variable. Use this method to view the original types of byte and rune. The code is as follows:

    var a byte = 'a'
	("%d %T\n", a, a)
	var b rune = 'you'
	("%d %T\n", b, b)

Output result:

97 uint8
20320 int32

So which character type does each kind of loop actually produce?

Let's test it.

Ordinary for loop:

str := "hello,你好"
for i := 0; i < len(str); i++ {
	fmt.Printf("%c,%T\n", str[i], str[i])
}

Output:

h,uint8
e,uint8
l,uint8
l,uint8
o,uint8
,,uint8
ä,uint8
½,uint8
,uint8
å,uint8
¥,uint8
½,uint8

for range loop:

str := "hello,你好"
for _, v := range str {
	fmt.Printf("%c,%T\n", v, v)
}

Output:

h,int32
e,int32
l,int32
l,int32
o,int32
,,int32
你,int32
好,int32

As you can see, the ordinary for loop yields values of type uint8, that is, byte.

The for range loop yields values of type int32, that is, rune.

The cause is UTF-8 encoding. A Go string is a sequence of bytes, and Go source text is UTF-8. Each ASCII character in "hello,你好" therefore takes one byte, while each Chinese character takes three. Indexing with str[i] returns one byte at a time, so an ordinary for loop splits each Chinese character into three separate bytes. Printing a lone byte with %c treats its value as a Unicode code point. That is why the bytes of 你 (0xE4 0xBD 0xA0) come out as ä, ½ and a non-breaking space instead of the original character.
A for range loop over a string instead decodes one UTF-8 sequence per iteration and yields the complete Unicode code point as a rune, so 你 and 好 are printed correctly.
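The decoding that for range performs can be sketched by hand with utf8.DecodeRuneInString from the standard library. This is an illustrative equivalent, not how the compiler implements for range. The helper name decodeRunes is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes walks s one UTF-8 sequence at a time, yielding the same
// runes that a for range loop over s would produce.
func decodeRunes(s string) []rune {
	var out []rune
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		out = append(out, r)
		i += size // advance by the encoded width of r (1 to 4 bytes)
	}
	return out
}

func main() {
	str := "hello,你好"
	// str[6] is 0xE4, only the first byte of '你'; %c prints it as U+00E4.
	fmt.Printf("%c\n", str[6])            // ä
	fmt.Println(string(decodeRunes(str))) // hello,你好
	fmt.Println(len(decodeRunes(str)))    // 8
}
```

For invalid input, DecodeRuneInString returns utf8.RuneError with a size of 1, so the loop always makes progress and cannot spin forever.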

Summary:

  • An ordinary index-based for loop yields individual bytes of type uint8 (byte). Only ASCII characters fit in one byte, so multi-byte characters such as Chinese characters are split apart.
  • A for range loop decodes the UTF-8 string and yields characters of type int32 (rune), that is, Unicode code points.
  • In Go, byte and rune are aliases for uint8 and int32 respectively.
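If you need index-based access to characters rather than bytes, one common approach is to convert the string to a []rune first. Below is a minimal sketch. The helper name nthChar is hypothetical. The conversion copies the whole string, so it costs O(n) time and memory.

```go
package main

import "fmt"

// nthChar returns the i-th character (rune) of s as a string,
// counting characters rather than bytes.
func nthChar(s string, i int) string {
	return string([]rune(s)[i])
}

func main() {
	str := "hello,你好"
	runes := []rune(str) // decode once into a slice of code points
	for i := 0; i < len(runes); i++ {
		fmt.Printf("%c", runes[i])
	}
	fmt.Println()
	fmt.Println(nthChar(str, 6)) // 你
}
```

When you only need to visit each character once, for range avoids the extra copy. The []rune conversion pays off when you need random access or the character count up front.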
