SoFunction
Updated on 2025-03-06

Detailed explanation of how to speed up reflection in Go

I recently read an article about Go reflection. The author used reflection to fill in a struct's field values, making full use of Go's internal mechanisms and step by step exploring ways to make the code run faster.

The article (original address: /post/aintnecessarilyslow/) is well worth learning from, so I translated and organized it here.

Don't use reflection unless you really need it. But when you avoid it, don't do so because you think reflection is slow; it can be fast too.

Reflection allows you to get information about Go types at runtime. If you've ever (perhaps foolishly) tried writing your own version of something like encoding/json, you'll recognize the problem this article explores: using reflection to fill in struct values.

A starting case

We'll use a simple case as our starting point: a struct SimpleStruct with two int fields, A and B.

type SimpleStruct struct {
    A int
    B int
}

If we receive the JSON data {"B": 42}, we want to parse it and set field B to 42.

Below, we will write some functions to achieve this, and they will all set B to 42.

If our code only works for SimpleStruct it is essentially worthless, but it gives us a baseline.

func populateStruct(in *SimpleStruct) {
	in.B = 42
}

Basic reflection version

However, if we are writing a JSON parser, we cannot know the struct type in advance. Our parser code needs to accept data of any type.

In Go, this usually means taking an interface{} (empty interface) parameter. We can then use the reflect package to examine the value passed in through the empty interface, check that it is a pointer to a struct, find field B, and fill it in with our value.

The code will look like this.

func populateStructReflect(in interface{}) error {
	val := reflect.ValueOf(in)
	if val.Type().Kind() != reflect.Ptr {
		return fmt.Errorf("you must pass in a pointer")
	}
	elmv := val.Elem()
	if elmv.Type().Kind() != reflect.Struct {
		return fmt.Errorf("you must pass in a pointer to a struct")
	}

	fval := elmv.FieldByName("B")
	fval.SetInt(42)

	return nil
}

Let's run a benchmark and see how fast it is.

func BenchmarkPopulateReflect(b *testing.B) {
	b.ReportAllocs()
	var m SimpleStruct
	for i := 0; i < b.N; i++ {
		if err := populateStructReflect(&m); err != nil {
			b.Fatal(err)
		}
		if m.B != 42 {
			b.Fatalf("unexpected value %d for B", m.B)
		}
	}
}

The results are as follows.

BenchmarkPopulateReflect-16   15941916    68.3 ns/op  8 B/op     1 allocs/op

Is this good or bad? Well, memory allocation is never a good thing. You may be wondering why setting a struct field to 42 needs a heap allocation at all (see this issue: /golang/go/issues/2320). But overall, 68 ns is not long: you could fit a lot of 68 ns operations into the time it takes to make any kind of network request.

Optimization 1: Add a cache strategy

Can we do better? Well, the programs we run usually don't do just one thing and then stop. They do very similar things over and over again. So can we set something up so that the repeated work is faster?

If we look closely at the reflection checks we are performing, we can see that they depend only on the type of the value passed in. If we cache the results per type, we only have to perform the checks once for each type.

Let's consider the memory allocation again. Previously we called FieldByName on the reflect.Value, which calls FieldByName on the type, which ends up in structType.FieldByName, and that is where the allocation happens. Can we instead call FieldByName on the type once, and cache something that lets us get at field B cheaply? It turns out that if we cache the field's Index, we can use FieldByIndex on the value to get the field without any allocation.

The new code version is as follows

var cache = make(map[reflect.Type][]int)

func populateStructReflectCache(in interface{}) error {
	typ := reflect.TypeOf(in)

	index, ok := cache[typ]
	if !ok {
		if typ.Kind() != reflect.Ptr {
			return fmt.Errorf("you must pass in a pointer")
		}
		if typ.Elem().Kind() != reflect.Struct {
			return fmt.Errorf("you must pass in a pointer to a struct")
		}
		f, ok := typ.Elem().FieldByName("B")
		if !ok {
			return fmt.Errorf("struct does not have field B")
		}
		index = f.Index
		cache[typ] = index
	}

	val := reflect.ValueOf(in)
	elmv := val.Elem()

	fval := elmv.FieldByIndex(index)
	fval.SetInt(42)

	return nil
}

With the allocation gone, the new benchmark is faster.

BenchmarkPopulateReflectCache-16  35881779    30.9 ns/op   0 B/op   0 allocs/op

Optimization 2: Utilizing field offsets

Can we do better? Well, if we know the offset of field B within the struct, and we know that B is an int, we can write to it directly in memory. We can recover the pointer to the struct from the interface value, because an empty interface is actually syntactic sugar for a struct holding two pointers: the first points to information about the type, and the second points to the value. In the runtime it looks like this:

type eface struct {
	_type *_type
	data  unsafe.Pointer
}

We can then use the field's offset within the struct to address field B of the value directly.

The new code is as follows.

var unsafeCache = make(map[reflect.Type]uintptr)

type intface struct {
	typ   unsafe.Pointer
	value unsafe.Pointer
}

func populateStructUnsafe(in interface{}) error {
	typ := reflect.TypeOf(in)

	offset, ok := unsafeCache[typ]
	if !ok {
		if typ.Kind() != reflect.Ptr {
			return fmt.Errorf("you must pass in a pointer")
		}
		if typ.Elem().Kind() != reflect.Struct {
			return fmt.Errorf("you must pass in a pointer to a struct")
		}
		f, ok := typ.Elem().FieldByName("B")
		if !ok {
			return fmt.Errorf("struct does not have field B")
		}
		if f.Type.Kind() != reflect.Int {
			return fmt.Errorf("field B should be an int")
		}
		offset = f.Offset
		unsafeCache[typ] = offset
	}

	structPtr := (*intface)(unsafe.Pointer(&in)).value
	*(*int)(unsafe.Pointer(uintptr(structPtr) + offset)) = 42

	return nil
}

New benchmarks show that this will be faster.

BenchmarkPopulateUnsafe-16  62726018    19.5 ns/op     0 B/op     0 allocs/op

Optimization 3: Change the cache key type

Can we make it go faster still? If we profile the CPU, we'll see that much of the time is spent in the map access, showing up as calls to runtime.interhash and runtime.interequal, the functions that hash interface values and compare them for equality. Perhaps a simpler map key would speed things up? Instead of the reflect.Type itself, we can use the address of the type information word taken from the interface value.

var unsafeCache2 = make(map[uintptr]uintptr)

func populateStructUnsafe2(in interface{}) error {
	inf := (*intface)(unsafe.Pointer(&in))

	offset, ok := unsafeCache2[uintptr(inf.typ)]
	if !ok {
		typ := reflect.TypeOf(in)
		if typ.Kind() != reflect.Ptr {
			return fmt.Errorf("you must pass in a pointer")
		}
		if typ.Elem().Kind() != reflect.Struct {
			return fmt.Errorf("you must pass in a pointer to a struct")
		}
		f, ok := typ.Elem().FieldByName("B")
		if !ok {
			return fmt.Errorf("struct does not have field B")
		}
		if f.Type.Kind() != reflect.Int {
			return fmt.Errorf("field B should be an int")
		}
		offset = f.Offset
		unsafeCache2[uintptr(inf.typ)] = offset
	}

	*(*int)(unsafe.Pointer(uintptr(inf.value) + offset)) = 42

	return nil
}

Here are the results for the new version of the benchmark; it is much faster again.

BenchmarkPopulateUnsafe2-16  230836136    5.16 ns/op    0 B/op     0 allocs/op

Optimization 4: Introducing descriptors

Can it be faster? Usually when we unmarshal data into a struct, it is the same struct over and over again. So we can split our function in two: one function checks that the struct meets our requirements and returns a descriptor, and the other uses that descriptor in subsequent populate calls.

Here is our new version of the code. The caller should call the describeType function at initialization time to obtain a typeDescriptor, which is then passed to populateStructUnsafe3. In this very simple example, the typeDescriptor is just the offset of field B within the struct.

type typeDescriptor uintptr

func describeType(in interface{}) (typeDescriptor, error) {
	typ := reflect.TypeOf(in)
	if typ.Kind() != reflect.Ptr {
		return 0, fmt.Errorf("you must pass in a pointer")
	}
	if typ.Elem().Kind() != reflect.Struct {
		return 0, fmt.Errorf("you must pass in a pointer to a struct")
	}
	f, ok := typ.Elem().FieldByName("B")
	if !ok {
		return 0, fmt.Errorf("struct does not have field B")
	}
	if f.Type.Kind() != reflect.Int {
		return 0, fmt.Errorf("field B should be an int")
	}
	return typeDescriptor(f.Offset), nil
}

func populateStructUnsafe3(in interface{}, ti typeDescriptor) error {
	structPtr := (*intface)(unsafe.Pointer(&in)).value
	*(*int)(unsafe.Pointer(uintptr(structPtr) + uintptr(ti))) = 42
	return nil
}

Here is the new benchmark, which shows how the describeType call is used.

func BenchmarkPopulateUnsafe3(b *testing.B) {
	b.ReportAllocs()
	var m SimpleStruct

	descriptor, err := describeType((*SimpleStruct)(nil))
	if err != nil {
		b.Fatal(err)
	}

	for i := 0; i < b.N; i++ {
		if err := populateStructUnsafe3(&m, descriptor); err != nil {
			b.Fatal(err)
		}
		if m.B != 42 {
			b.Fatalf("unexpected value %d for B", m.B)
		}
	}
}

Now the benchmark results are getting quite fast.

BenchmarkPopulateUnsafe3-16  1000000000     0.359 ns/op    0 B/op   0 allocs/op

How good is this? If we write a benchmark for the original populateStruct function from the start of the article, we can see how fast filling this struct is without any reflection at all.

BenchmarkPopulate-16        1000000000      0.234 ns/op    0 B/op   0 allocs/op

Unsurprisingly, the direct version is a little faster than our best reflection-based version, but not by much.

Summary

Reflection is not necessarily slow, but to really speed it up you have to apply knowledge of Go's internals and sprinkle a fair amount of unsafe through your code.
