Type conversion in Go and a related pitfall

Today we will talk about an operation that everyone performs every day but rarely thinks deeply about: type conversion.

A strange line of code

It started at the beginning of the year, when I was making some changes to the standard library's sync package.

The changes use atomic.Pointer, which was added to the standard library in Go 1.19. Out of caution, I read through its code before making the change, and one line caught my attention:

// A Pointer is an atomic pointer of type *T. The zero value is a nil *T.
type Pointer[T any] struct {
	// Mention *T in a field to disallow conversion between Pointer types.
	// See /issue/56603 for more details.
	// Use *T, not T, to avoid spurious recursive type definition errors.
	_ [0]*T

	_ noCopy
	v unsafe.Pointer
}

It wasn't noCopy; I've explained that one in detail in "golang: Implementing an uncopyable type".

What caught my attention was _ [0]*T. It is a blank (anonymous) field, and a zero-length array occupies no memory. It doesn't affect the code I was going to modify, but its purpose made me curious.

Fortunately, the field's own comment gives the answer: it exists to prevent an incorrect type conversion. What kind of conversion needs a field to block it? With that question in mind, I followed the issue link and found the following example:

package main

import (
	"math"
	"sync/atomic"
)

type small struct {
	small [64]byte
}

type big struct {
	big [math.MaxUint16 * 10]byte
}

func main() {
	a := atomic.Pointer[small]{}
	a.Store(&small{})

	b := atomic.Pointer[big](a) // type conversion
	big := b.Load()

	// Writes far past the 64 bytes that were actually allocated.
	for i := range big.big {
		big.big[i] = 1
	}
}

This example program corrupts memory and is very likely to segfault on Linux. Why? Because the index range of big far exceeds that of small, and what is actually stored in the Pointer is a small object, so the final loop writes out of bounds, and Go does not detect it.

Of course, Go has no obligation to detect this kind of out-of-bounds access: once unsafe is involved (atomic.Pointer is a wrapper around unsafe.Pointer), type safety and memory safety become the user's responsibility.

The fundamental problem is that atomic.Pointer[small] and atomic.Pointer[big] have nothing to do with each other. They should be completely different types, and conversion between them should not be allowed. (If you doubt this, look up "type constructors": types that a generic type constructor produces from different arguments should normally be unrelated.) This matters all the more because Go is a strongly typed language. Comparable code is reported as an error in C++ and in Python.

But the fact is that before the field at the top of the struct was added, this conversion was legal, and it arises easily with generic types.

You may still be a little confused at this point, but that's fine. After the next section, the fog will lift.

Go type conversion

Go has no implicit type conversion, so to convert a value of one type to another you must use an expression of the form Type(value). The expression copies value and converts the copy to Type.

The rules for untyped constants are somewhat more flexible: they are converted automatically to the appropriate type based on context. For details, see my other article on untyped constants in golang.

Setting aside constants and cgo, Go's type conversions fall into several categories. Let's look at the more common ones first.
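A minimal sketch of both rules follows. The names scale, half, and scaled are invented for illustration and do not come from any library.

```go
package main

import "fmt"

// scale is an untyped constant: it has no fixed type until it is used.
const scale = 10

// half must convert explicitly, because Go never converts a variable implicitly.
func half(n int) float64 {
	// float64(n) copies n and converts the copy; n itself is unchanged.
	return float64(n) / 2
}

// scaled multiplies by the untyped constant, which adapts to float64 here.
func scaled(f float64) float64 {
	return f * scale
}

func main() {
	n := 3
	// var f float64 = n // does not compile: int is not implicitly converted
	fmt.Println(half(n))     // 1.5
	fmt.Println(scaled(1.5)) // 15
}
```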

Convert numeric types to each other

This is a very common conversion, and there isn't much to say about it; everyone writes code like this every day:

c := int(a+b)
d := float64(c)

Numeric types can be converted to each other, and conversions between integers and floating-point numbers follow the corresponding rules. Values wrap around or are truncated when necessary.

This conversion is relatively safe; the only thing to watch out for is overflow.
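To make "wrap around or truncate" concrete, here is a small sketch. The helper names are made up for this example. Variables are used rather than constants, because an out-of-range constant conversion is rejected at compile time.

```go
package main

import "fmt"

// narrow keeps only the low 8 bits of v.
func narrow(v int32) uint8 { return uint8(v) }

// truncate drops the fractional part, rounding toward zero.
func truncate(f float64) int { return int(f) }

// reinterpret maps a negative value onto a large unsigned one.
func reinterpret(v int32) uint32 { return uint32(v) }

func main() {
	fmt.Println(narrow(300))     // 44, because 300 mod 256 = 44
	fmt.Println(truncate(3.99))  // 3
	fmt.Println(truncate(-3.99)) // -3
	fmt.Println(reinterpret(-1)) // 4294967295
}
```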

unsafe related conversions

unsafe.Pointer and all pointer types can be converted to each other, but converting back from unsafe.Pointer does not guarantee type safety.

unsafe.Pointer and uintptr can also be converted to each other, which is mainly needed by some system-level APIs.

These conversions appear often in the Go runtime and in code that relies heavily on systems programming. They are dangerous, and you should avoid them unless they are truly necessary.

Conversion of string to byte and rune slices

This conversion is probably second only to numeric conversion in how often it is used:

fmt.Println([]byte("hello"))
fmt.Println(string([]byte{104, 101, 108, 108, 111}))

This conversion has been heavily optimized, so its behavior sometimes differs a little from an ordinary type conversion; for example, the data copy may be optimized away.

I won't give a rune example; the code looks much the same.
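For completeness, a short sketch of the byte and rune views of the same string, and of the fact that the conversion works on a copy. The helper names counts and capitalize are made up for this example.

```go
package main

import "fmt"

// counts returns the length of s in bytes and in runes.
func counts(s string) (nBytes, nRunes int) {
	return len([]byte(s)), len([]rune(s))
}

// capitalize edits a converted copy of s; the original string is untouched.
func capitalize(s string) string {
	b := []byte(s)
	b[0] = 'H'
	return string(b)
}

func main() {
	nb, nr := counts("h\u00e9llo") // "héllo": é takes two bytes in UTF-8
	fmt.Println(nb, nr)            // 6 5

	s := "hello"
	t := capitalize(s)
	fmt.Println(s, t) // hello Hello
}
```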

Convert slice into array

Starting with Go 1.20, a slice can be converted to an array; the slice elements that fall within the array's length are copied:

s := []int{1,2,3,4,5}
a := [3]int(s)
a[2] = 100
fmt.Println(s) // [1 2 3 4 5]
fmt.Println(a) // [1 2 100]

If the array's length exceeds the slice's length (note: the length, not the capacity), the conversion panics. Converting a slice to an array pointer is also allowed, with the same length rule; in that case the pointer refers to the slice's backing array rather than to a copy.
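The sketch below exercises both forms, assuming Go 1.20 or later. firstThree and setFirst are hypothetical helpers written for this example; recovering from the panic is only done here to make the failure visible.

```go
package main

import "fmt"

// firstThree copies the first three elements of s into an array.
// It recovers from the panic raised when len(s) < 3.
func firstThree(s []int) (a [3]int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("conversion failed: %v", r)
		}
	}()
	return [3]int(s), nil
}

// setFirst converts s to an array pointer, which aliases the slice's
// backing array instead of copying it.
func setFirst(s []int, v int) {
	p := (*[3]int)(s)
	p[0] = v
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	a, err := firstThree(s)
	fmt.Println(a, err) // [1 2 3] <nil>

	_, err = firstThree(make([]int, 2, 10)) // length 2, capacity 10
	fmt.Println(err != nil)                 // true: the length is what counts

	setFirst(s, 100)
	fmt.Println(s[0]) // 100
}
```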

Conversion when the underlying type is the same

Although the conversions discussed above are very common, they are really special cases: each is limited to specific types, and the compiler recognizes it and generates dedicated code.

But Go also allows a much broader class of conversions that needs no such special treatment: types with the same underlying type can be converted to each other.

For example:

type A struct {
	a int
	b *string
	c bool
}

type B struct {
	a int
	b *string
	c bool
}

type B1 struct {
	a1 int
	b  *string
	c  bool
}

type A1 B

type C int
type D int

A and B are completely different types, but both have the underlying type struct{a int; b *string; c bool}. C and D are also completely different types, but both have the underlying type int. A1 is derived from B, so A1 and B share an underlying type, and therefore A1 and A do as well. B1 has a field whose name differs from the others, so no other type shares its underlying type.

Put simply, an underlying type is one of the built-in types (int, string, slices, maps, ...) or a struct{...} literal (field names, and whether they are exported, are taken into account). The underlying type of a built-in type or of struct{...} is itself.

As long as the underlying types are the same, the types can be converted to each other:

func main() {
	text := "hello"
	a := A{1, &text, false}
	a1 := A1(a)
	fmt.Printf("%#v\n", a1) // main.A1{a:1, b:(*string)(0xc000014070), c:false}
}

A1 and B can be considered somewhat related, but A1 really has nothing to do with A. Even so, the program compiles and runs fine. This is the consequence of the rule that types with the same underlying type can be converted to each other.

In addition, struct tags are ignored during conversion: as long as the field names and types match, it does not matter whether the tags do.
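A quick sketch of the tag rule. The types apiUser and dbUser and the helper toDB are invented for illustration.

```go
package main

import "fmt"

// apiUser and dbUser differ only in their struct tags.
type apiUser struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

type dbUser struct {
	Name string `db:"user_name"`
	Age  int    `db:"user_age"`
}

// toDB compiles because tags are ignored when comparing underlying types.
func toDB(u apiUser) dbUser { return dbUser(u) }

func main() {
	fmt.Printf("%+v\n", toDB(apiUser{Name: "gopher", Age: 13})) // {Name:gopher Age:13}
}
```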

This rule allows two-way conversion between some unrelated types. At first glance the rule looks reckless, but it is not without its uses:

type IP []byte

Consider this type. An IP address can be represented as a sequence of bytes, as the RFCs state clearly, so defining it this way is reasonable (and in practice everyone does). Because it is a sequence of bytes, we naturally want to apply existing methods and functions that process byte slices to an IP, to reuse code and simplify development.

The problem is that such code is written in terms of []byte, not IP. We know an IP is really a []byte, but Go has no general implicit conversion. (Strictly speaking, assignability does let an IP be passed where the unnamed type []byte is expected, but that shortcut disappears as soon as both sides are named types; a function taking int will not accept a value of type C, for example.) If types with the same underlying type could not be converted, how would we reuse those functions? We would have to take some unsafe route. It is better to allow the conversions []byte(ip) and IP(bytes).

Why not allow conversion only between pairs like IP and []byte? Because that would complicate type checking and slow down compilation. What Go values most is a simple compiler and fast compilation, so it is unwilling to do extra checks here. It is simpler and faster to relax the rule and allow conversion between any types that share an underlying type.
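A minimal sketch of the reuse described above, with explicit conversions in both directions. The Equal method and the fromBytes helper are written for this example and are not the net package's implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// IP mirrors the definition in the text.
type IP []byte

// Equal reuses bytes.Equal by viewing both values as []byte.
func (ip IP) Equal(other IP) bool {
	return bytes.Equal([]byte(ip), []byte(other))
}

// fromBytes turns a plain byte slice back into an IP (hypothetical helper).
func fromBytes(b []byte) IP {
	return IP(b)
}

func main() {
	a := IP{127, 0, 0, 1}
	b := fromBytes([]byte{127, 0, 0, 1})
	fmt.Println(a.Equal(b)) // true
}
```

For slice types the conversion only changes the static type: the converted value shares its backing array with the original.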

But this rule is dangerous: it is exactly what causes the atomic.Pointer problem described earlier.

Let's look at the first version of the atomic.Pointer code:

type Pointer[T any] struct {
	_ noCopy
	v unsafe.Pointer
}

The type parameter is used only in Store and Load, to convert between unsafe.Pointer and ordinary pointers. This leads to a fatal flaw: every instantiation of atomic.Pointer has the same underlying type, struct{_ noCopy; v unsafe.Pointer}.

So whether it is atomic.Pointer[A] and atomic.Pointer[B] or atomic.Pointer[small] and atomic.Pointer[big], they all share one underlying type and can be converted to each other freely.

Now things are a complete mess. Users are responsible for what they do with unsafe, but here an obvious error that should never have compiled can slip into the code without the user suspecting anything. Ordinary developers do not spend time studying how the standard library is implemented, so they have no idea that atomic.Pointer has anything to do with unsafe.

The Go developers eventually added _ [0]*T, so that for every instantiated atomic.Pointer a different T gives a different underlying type, and the erroneous conversion above can no longer happen. Choosing *T rather than T also keeps self-referential uses, where a type contains an atomic.Pointer to itself, from failing with a recursive type definition error.

Now you can also see why I said generic types are where this problem shows up most easily: if your generic type is a struct or another composite type, but no field or component of it uses the type parameters, then every type instantiated from it may share the same underlying type, which permits the completely wrong conversion described in the issue.

How other languages handle this

In structurally typed languages, converting between types with the same underlying structure is a basic operation, and different languages relax or restrict it as they see fit. Put simply, such languages recognize only structure: whatever else differs, things with the same structure are considered the same type. So the problem described in the issue is "not even wrong" in these languages, and avoiding it requires changing the design.

In languages with a nominal type system, types with different names are different types even when their structure is identical. C++, Go, and Rust all belong to this group. Although Go's underlying types behave structurally in type conversion and type constraints, its overall behavior still leans nominal. The Go team has not officially stated which kind of type system Go has, so this is only my opinion.

Purely structurally typed languages are not very common, so let's take the common nominally typed language C++ and the duck-typed Python as examples.

In Python, we can customize a type's constructor and implement the conversion logic there. Without a custom constructor or some other class method that returns the new type, conversion between two types is not possible by default. So Python does not have Go's problem.

C++ is similar to Python: unless the user defines one, there is no conversion path by default. The difference is that, besides constructors, C++ also has conversion operators and supports implicit conversion under certain rules. Users define their own converting constructors or conversion operators to convert between two different types within the limits of the syntax rules. As in Python, whether the conversion is one-way or two-way is up to the user. So C++ does not have Go's problem either.

There are also Rust, Java, and others; I won't list them one by one.

In short, this is another side of Go's pursuit of simplicity: it creates problems that rarely arise in other languages and then fixes them with concise means.

Summary

We reviewed type conversion in Go and stepped into a related pitfall along the way.

Here are some suggestions:

If you want to use generics without falling into this trap: try to use the type parameters in struct fields or composite types. A field like _ [0]*T not only makes the code harder to understand but also makes initializing the type more awkward, so I don't recommend it unless there is really no alternative.
If you don't use generics but worry that other types share an underlying type with yours: don't worry, just use conversion syntax on custom types sparingly. If you really need to convert between related custom types, define methods such as toTypeA, so that the conversion is under your control rather than left to the default rule.
If you convert between built-in types and custom types based on them: there is nothing to worry about, since the two are essentially the same thing. If it really makes you uneasy, replace type T []int with type T struct { data []int }. The cost is more verbose code, and functions that accept slices and range loops can no longer be used on T directly.
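The second suggestion can be sketched as follows. Celsius, Fahrenheit, and toCelsius are hypothetical names chosen for the example; the point is only that the method encodes the intended conversion, while the built-in syntax still compiles and silently does something else.

```go
package main

import "fmt"

// Celsius and Fahrenheit share the underlying type struct{ deg float64 }.
type Celsius struct{ deg float64 }
type Fahrenheit struct{ deg float64 }

// toCelsius is the conversion the author of these types controls.
func (f Fahrenheit) toCelsius() Celsius {
	return Celsius{deg: (f.deg - 32) * 5 / 9}
}

func main() {
	boiling := Fahrenheit{deg: 212}
	fmt.Println(boiling.toCelsius().deg) // 100
	fmt.Println(Celsius(boiling).deg)    // 212: compiles, but the value is meaningless
}
```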

It is quite interesting that a language like Go hides such danger inside simple grammar rules. If you only chase quick results, you may step on a landmine at any time.
