SoFunction
Updated on 2025-03-01

Understanding inline optimization in Go

Inlining is a common compiler optimization. Put simply, the compiler expands a function's body at the place where it is called, which removes the overhead of the function call itself (setting up the stack frame, copying parameters, and so on).

What does it actually look like when a function or method is inlined?

Observing inlining

Consider the following code:

package user

import "errors"

// ValidateName verifies that the given username is legal.
//
//go:noinline
func ValidateName(name string) bool { // AX: string pointer, BX: string length
    if len(name) < 1 {
        return false
    } else if len(name) > 12 {
        return false
    }
    return true
}

//go:noinline
func (s *Server) CreateUser(name string, password string) error {
    if !ValidateName(name) {
        return errors.New("invalid name")
    }
    // ...
    return nil
}

type Server struct{}

To make this easier to follow, I added a //go:noinline comment to both the function and the method. The Go compiler does not inline a function or method that carries this comment. Let's first look at the assembly generated for this code when inlining is prohibited:

// ...
// ValidateName function
// At this point:
// AX register: pointer to the name string's data
// BX register: length of the name string
TEXT /bootun/example/(SB) /bootun/example/user/
    :9   0x4602c0  MOVQ AX, 0x8(SP) // Save the pointer of the name string to the stack (not used later)
    :10  0x4602c5  TESTQ BX, BX     // BX & BX, tests whether BX is 0; equivalent to CMPQ $0, BX
    :10  0x4602c8  JE 0x4602d9      // If it is 0, jump to 0x4602d9
    :12  0x4602ca  CMPQ $0xc, BX    // Compare the length of name with the constant 12
    :12  0x4602ce  JLE 0x4602d3     // If it is less than or equal to 12, jump to 0x4602d3
    :13  0x4602d0  XORL AX, AX      // return false
    :13  0x4602d2  RET
    :15  0x4602d3  MOVL $0x1, AX    // return true
    :15  0x4602d8  RET
    :11  0x4602d9  XORL AX, AX      // return false
    :11  0x4602db  RET
// CreateUser method
TEXT /bootun/example/user.(*Server).CreateUser(SB) //bootun/example/user/
    // Some preparation before the function call is omitted (register assignment and other operations)
    :20    0x460300  CALL (SB)
    :20    0x460305  TESTL AL, AL
    :20    0x460307  JE 0x460317
    :24    0x460309  XORL AX, AX
    :24    0x46030b  XORL BX, BX
    :24    0x46030d  MOVQ 0x10(SP), BP
    :24    0x460312  ADDQ $0x18, SP
    :24    0x460316  RET
    :62  0x460317  LEAQ 0x9302(IP), AX
    :62  0x46031e  NOPW
    :62  0x460320  CALL (SB)
    // ...

The assembly above shows only the most important parts: the ValidateName function and the CreateUser method.

It doesn't matter if you can't read assembly. Just notice the line :20 CALL in the CreateUser method. It shows that CreateUser calls the ValidateName function, exactly as our source code does.

Now let's remove the //go:noinline comment from the ValidateName function in the source, compile again, and look at the generated assembly:

If you want to follow along with the code in this article, do not delete the //go:noinline on the CreateUser method. CreateUser in this example is so short that the compiler would inline it as well, which would make the experiment harder to observe.

// CreateUser method
// At this point:
// AX register: the method receiver, that is, the *Server
// BX register: pointer to the name string
// CX register: length of the name string
TEXT /bootun/example/user.(*Server).CreateUser(SB) //bootun/example/user/
    // ...
    :18    0x4602d4  MOVQ BX, 0x28(SP)    // Save the pointer of the name string to the stack
    :19    0x4602d9  TESTQ CX, CX         // Check whether the length of name is 0
    :9     0x4602dc  JE 0x4602e6          // If it is 0, jump to 0x4602e6
    :9     0x4602de  NOPW
    :11    0x4602e0  CMPQ $0xc, CX        // Compare the length of the string with the constant 12
    :11    0x4602e4  JLE 0x460318         // If it is less than or equal to 12, jump to 0x460318 and continue (name is legal)
    :62  0x4602e6  LEAQ 0x9333(IP), AX  // Construct the error to return
    :62  0x4602ed  CALL (SB)
    :62  0x4602f2  MOVQ $0xc, 0x8(AX)
    // ...
    :23    0x460318  XORL AX, AX      // AX = 0
    :23    0x46031a  XORL BX, BX      // BX = 0
    :23    0x46031c  MOVQ 0x10(SP), BP  // Restore the BP register
    :23    0x460321  ADDQ $0x18, SP     // Increase the stack pointer to release stack space
    :23    0x460325  RET                // return
    // ...

Looking at the assembly this time, we find that the logic of the ValidateName function has been expanded directly inside the CreateUser method. We also cannot find any ValidateName-related symbols in the generated assembly. The code is now equivalent to:

func (s *Server) CreateUser(name string, password string) error {
    if len(name) < 1 {
        return errors.New("invalid name")
    } else if len(name) > 12 {
        return errors.New("invalid name")
    }
    return nil
}

What kind of function will be inlined?

The inlining code lives in cmd/compile/internal/inline/ and is part of the compiler. A comment at the top of the file summarizes the controls and rules of inlining well:

// The  flag controls the aggressiveness. Note that main() swaps level 0 and 1,
// making 1 the default and -l disable. Additional levels (beyond -l) may be buggy and
// are not supported.
//      0: disabled
//      1: 80-nodes leaf functions, oneliners, panic, lazy typechecking (default)
//      2: (unassigned)
//      3: (unassigned)
//      4: allow non-leaf functions
//
// At some point this may get another default and become switch-offable with -N.
//
// The -d typcheckinl flag enables early typechecking of all imported bodies,
// which is useful to flush out bugs.
//
// The  flag enables diagnostic output.  a single -m is useful for verifying

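Let's summarize the key points of the comment above:

  • At the default level, leaf functions of up to 80 nodes, one-liners, and panic are inlined, with lazy type checking
  • Use -N -l to stop the compiler from inlining (-l disables inlining, -N disables other optimizations)
  • Use -m to enable diagnostic output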
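
As a small sketch of the -m diagnostic flag, the program below can be built with go build -gcflags=-m. The lowercase names validateName and createUser are hypothetical stand-ins for the functions in this article. On typical toolchains the compiler is expected to print lines along the lines of "can inline validateName" and "inlining call to validateName", although the exact wording and the inlining decisions may differ between Go versions.

```go
package main

import (
	"errors"
	"fmt"
)

// validateName is small, so the compiler will likely treat it as inlinable.
func validateName(name string) bool {
	return len(name) >= 1 && len(name) <= 12
}

// createUser is marked noinline so that the call site of validateName
// stays easy to find in the diagnostics and in the assembly.
//
//go:noinline
func createUser(name string) error {
	if !validateName(name) {
		return errors.New("invalid name")
	}
	return nil
}

func main() {
	// Build with: go build -gcflags=-m
	// Passing -m more than once asks for more detailed diagnostics.
	fmt.Println(createUser("bootun"))
	fmt.Println(createUser(""))
}
```

Building the same file with -gcflags='-m -l' should make the "can inline" lines disappear, because -l turns inlining off.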

In other words, as long as our functions and methods are small enough, they may be inlined. For this reason, many people compose several small functions instead of writing one large block of code in order to improve performance. The mutex we use all the time (Mutex in the standard library's sync package) takes advantage of this. The Lock method we normally call has only a few lines:

func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }
    // Slow path (outlined so that the fast path can be inlined)
    m.lockSlow()
}

Note the comment on the third-to-last line: "outlined so that the fast path can be inlined". Because the slow path is moved into a separate method, the fast path in Lock stays small enough to be inlined into our program without an additional function call, which improves performance.
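
The same fast-path/slow-path split can be applied to our own code. The following is a minimal sketch under stated assumptions, not code from the standard library: lazyConfig, Get, and getSlow are hypothetical names, and whether Get is actually inlined depends on the compiler version (go build -gcflags=-m can confirm it).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyConfig is a hypothetical type that loads its value on first use.
type lazyConfig struct {
	done  uint32 // set to 1 once value has been loaded
	mu    sync.Mutex
	value string
}

// Get is the fast path: one atomic load and a return. Keeping it this
// small makes it a candidate for inlining at every call site.
func (c *lazyConfig) Get() string {
	if atomic.LoadUint32(&c.done) == 1 {
		return c.value
	}
	// Slow path, outlined so that the fast path stays small.
	return c.getSlow()
}

// getSlow does the expensive, rarely executed work under a lock.
func (c *lazyConfig) getSlow() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.done == 0 {
		c.value = "loaded" // stands in for an expensive load
		atomic.StoreUint32(&c.done, 1)
	}
	return c.value
}

func main() {
	c := &lazyConfig{}
	fmt.Println(c.Get()) // first call takes the slow path
	fmt.Println(c.Get()) // later calls take the fast path
}
```

The design choice is the same as in Lock: everything that is large or rarely executed (locking, defer, the actual loading) lives in getSlow, so the commonly executed method stays small.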

The inlining logic has a single entry-point function in the compiler source. If you want a deeper understanding, it is worth reading.

How much can inlining improve my program's performance?

We have covered a lot about inlining so far, and even the standard library deliberately relies on it to improve the performance of Go programs. So how much of a performance improvement can inlining actually bring us?

Let's expand the example mentioned at the beginning of the article:

package user

import (
      "errors"
)

func ValidateName(name string) bool {
      if len(name) < 1 {
            return false
      } else if len(name) > 12 {
            return false
      }
      return true
}

//go:noinline
func ValidateNameNoInline(name string) bool {
      if len(name) < 1 {
            return false
      } else if len(name) > 12 {
            return false
      }
      return true
}

func (s *Server) CreateUser(name string, password string) error {
      if !ValidateName(name) {
            return errors.New("invalid name")
      }
      return nil
}

// CreateUserNoInline uses the version of ValidateName that prohibits inlining.
func (s *Server) CreateUserNoInline(name string, password string) error {
      if !ValidateNameNoInline(name) {
            return errors.New("invalid name")
      }
      return nil
}

type Server struct{}

We copied the ValidateName function, marked the copy with //go:noinline to stop the compiler from inlining it, and renamed it ValidateNameNoInline. We also copied the CreateUser method as CreateUserNoInline; the new method uses ValidateNameNoInline internally to validate the name parameter. Apart from that, it is identical to the original method.

Let's write two Benchmark tests:

package user

import "testing"

// BenchmarkCreateUser tests the performance of the inlined function.
func BenchmarkCreateUser(b *testing.B) {
      srv := Server{}
      for i := 0; i < b.N; i++ {
            if err := srv.CreateUser("bootun", "123456"); err != nil {
                  b.Fatalf("err: %v", err)
            }
      }
}

// BenchmarkValidateNameNoInline tests the performance when inlining is prohibited.
func BenchmarkValidateNameNoInline(b *testing.B) {
      srv := Server{}
      for i := 0; i < b.N; i++ {
            if err := srv.CreateUserNoInline("bootun", "123456"); err != nil {
                  b.Fatalf("err: %v", err)
            }
      }
}

The test results are as follows:

# Inlined version benchmark results (BenchmarkCreateUser)
goos: windows
goarch: amd64
pkg: /bootun/example/user
cpu: AMD Ryzen 7 6800H with Radeon Graphics
BenchmarkCreateUser
BenchmarkCreateUser-16          1000000000               0.2279 ns/op
PASS


# Prohibit inline version benchmark results (BenchmarkValidateNameNoInline)
goos: windows
goarch: amd64
pkg: /bootun/example/user
cpu: AMD Ryzen 7 6800H with Radeon Graphics
BenchmarkValidateNameNoInline
BenchmarkValidateNameNoInline-16        733243102                1.635 ns/op
PASS

The results show that each operation takes about 1.6 nanoseconds when inlining is prohibited, but only about 0.22 nanoseconds with inlining (the numbers vary from machine to machine). That is roughly a sevenfold difference in this benchmark, so the benefit of inline optimization is considerable.
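
Numbers this small depend heavily on what else the compiler can simplify once a call has been inlined. As a cross-check, sketched here as an assumption and not as the method used for the results above, the program below stores every result in a package-level sink variable and varies the input, so the inlined call still has to do its work. It uses testing.Benchmark to run outside of go test; validateName, validateNameNoInline, and sink are hypothetical names.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that the compiler has to keep each result.
var sink bool

// validateName may be inlined by the compiler.
func validateName(name string) bool {
	return len(name) >= 1 && len(name) <= 12
}

// validateNameNoInline has the same body but is never inlined.
//
//go:noinline
func validateNameNoInline(name string) bool {
	return len(name) >= 1 && len(name) <= 12
}

func main() {
	// Varying inputs keep the result from being a compile-time constant.
	names := []string{"bootun", "", "a-name-that-is-too-long"}

	inlinable := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = validateName(names[i%len(names)])
		}
	})
	noInline := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = validateNameNoInline(names[i%len(names)])
		}
	})

	// BenchmarkResult prints as "iterations  ns/op".
	fmt.Println("inlinable:", inlinable)
	fmt.Println("noinline: ", noInline)
}
```

Both loops pay the same cost for indexing and storing the result, so the difference between the two lines is a closer estimate of the call overhead itself.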

Do I need to do anything to enable inline optimization?

No, nothing is required. In the Go compiler, inline optimization is enabled by default. If your function fits the inlining policy described in this article (for example, it is very small) and inlining is not explicitly disabled, the compiler may inline it.

In some scenarios we may not want functions to be inlined (for example, when debugging with dlv, or when inspecting the assembly generated for a program). In that case, you can use go build -gcflags='-N -l' to disable inline optimization.

Code that the compiler has optimized by default can be hard to read and understand, which makes it inconvenient for debugging and learning.

-gcflags passes command-line flags to the Go compiler, gc. go build does a lot of work behind the scenes and runs more than just the gc program. Use go build -x to see the detailed steps of the build process.

That concludes this article on inline optimization in Go. For more on Go inline optimization, see my other related articles.