Analysis of Golang WaitGroup implementation principle

Principle analysis

type WaitGroup struct {
   noCopy noCopy
   // 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
   // 64-bit atomic operations require 64-bit alignment, but 32-bit
   // compilers only guarantee that 64-bit fields are 32-bit aligned.
   // For this reason on 32 bit architectures we need to check in state()
   // if state1 is aligned or not, and dynamically "swap" the field order if
   // needed.
   state1 uint64
   state2 uint32
}

innoCopyIt is a technology in golang source code to detect prohibited copying. If there is WaitGroup assignment behavior in the program, usego vetWhen checking the program, you will find an error. But it should be noted that noCopy will not affect the normal compilation and operation of the program.

state1 field

The higher 32 bits are counter, representing the number of coroutines that have not been completed yet.
The lower 32 bits are waiter, which means that it has been calledWaitThe number of goroutines, becausewaitCan be called by multiple coroutines.

state2 is a semaphore.

The entire calling process of WaitGroup can be simply described as follows:

When called(n)When counter will increase:counter + n
When called()When thewaiter++. Called simultaneouslyruntime_Semacquire(semap), Increase the semaphore and hang the current goroutine.
When called()When, it willcounter--. If the counter after decreasing is equal to 0, it means that the WaitGroup wait process has ended, and it needs to be calledruntime_SemreleaseRelease the semaphore and wake upgoroutine.

About memory

func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
	if (wg.state1) == 8 || uintptr((&wg.state1))%8 == 0 {
		// state1 is 64-bit aligned: nothing to do.
		return &wg.state1, &wg.state2
	} else {
		// state1 is 32-bit aligned but not 64-bit aligned: this means that
		// (&state1)+4 is 64-bit aligned.
		state := (*[3]uint32)((&wg.state1))
		return (*uint64)((&state[1])), &state[0]
	}
}

If the variable is 64-bit aligned (8 byte), the starting address of the variable is a multiple of 8. If the variable is 32-bit aligned (4 byte), the start address of the variable is a multiple of 4.

whenstate1When it's 32, thenstate1Being regarded as an array[3]uint32, the first bit of the array issemap, the second and third bits are storedcounter, waiterIt's exactly 64.

Why is there such a strange setting? There are two prerequisites involved:

Prerequisite 1: In the real logic of WaitGroup, counter and waiter are combined and used externally as a 64-bit integer. When changing the values of counter and waiter, the 64-bit integer is also used to atomically operate the 64-bit integer.

Prerequisite 2: In a 32-bit system, if atomic is used to perform atomic operations on 64-bit variables, the caller needs to ensure the 64-bit alignment of the variables by himself, otherwise an exception will occur. golang's official documentationsync/atomic/#pkg-note-BUGThe original text says this:

On ARM, x86-32, and 32-bit MIPS, it is the caller’s responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.

Therefore, in the case of premise 1, WaitGroup needs to perform atomic operations on 64 bits. According to premise 2, WaitGroup needs to ensure it itselfcount+waiter64-bit alignment of .

This method is very clever, but it's just a changesemapThe order of position can be guaranteedcounter+waiterIt will definitely be 64-bit aligned, which can also ensure efficient memory utilization.

Note: Some articles will talk about the difference between WaitGroup's two different memory layout methods: 32-bit system and 64-bit system, which is actually not very rigorous. To be precise, the difference between 32-bit alignment and 64-bit alignment. Because in a 32-bit system,state1It is also possible that the variables happen to fit 64-bit alignment.

existThere is no memory operation on it in the source code, although it also has a large number of atomic operations, that is becausestate int32。

existThere is also a variable address in four states. In fact, the purpose of this is to realize atomic operations, because there is no way to modify multiple variables at the same time and ensure atomicity.

WaitGroup directlycounterandwaiterIt is regarded as a unified 64-bit variable. incounteris the 32-bit higher of this variable.waiteris the lower 32 bits of this variable. If you need to changecounterWhen , shift the accumulated value to 32 bits left.

The atomic operation here does not use locks like Mutex or RWMutex, mainly because the lock will bring considerable performance losses and there is context switching. The best way to operate atomic at a single memory address is atomic, because this is supported by the underlying hardware (CPU instructions), with smaller granularity and higher performance.

Source code part

func (wg *WaitGroup) Add(delta int) {
    // () returns the address	statep, semap := ()
    // Atomic operation, modify the value of statep with a higher 32 bits, that is, the value of counter	state := atomic.AddUint64(statep, uint64(delta)&lt;&lt;32)
    // Move right 32 bits to turn the high 32 bits into the low 32 bits, and get the counter value	v := int32(state &gt;&gt; 32)
    // Directly take the lower 32 bits to get the waiter value	w := uint32(state)
	// Improper operation	if v &lt; 0 {
		panic("sync: negative WaitGroup counter")
	}
    // Improper operation	if w != 0 &amp;&amp; delta &gt; 0 &amp;&amp; v == int32(delta) {
		panic("sync: WaitGroup misuse: Add called concurrently with Wait")
	}
    // This is normal	if v &gt; 0 || w == 0 {
		return
	}
    // What's left is the case where counter == 0 and waiter != 0    // In this case, the value of *statep is the value of waiter, otherwise there will be problems    // In this case, all tasks have been completed, and *statep can be set to 0    // Release semaphores to all Waiters at the same time	// This goroutine has set counter to 0 when waiters &gt; 0.
	// Now there can't be concurrent mutations of state:
	// - Adds must not happen concurrently with Wait,
	// - Wait does not increment waiters if it sees counter == 0.
	// Still do a cheap sanity check to detect WaitGroup misuse.
	if *statep != state {
		panic("sync: WaitGroup misuse: Add called concurrently with Wait")
	}
	// Reset waiters count to 0.
	*statep = 0
	for ; w != 0; w-- {
		runtime_Semrelease(semap, false, 0)
	}
}

func (wg *WaitGroup) Done() {
	(-1)
}

func (wg *WaitGroup) Wait() {
    // () returns the address	statep, semap := ()
    // The for loop is in conjunction with CAS operation	for {
		state := atomic.LoadUint64(statep)
		v := int32(state &gt;&gt; 32) // counter
		w := uint32(state) // waiter
        // If counter is 0, it means that all tasks have been completed when calling Wait. Exit directly        // This requires that Add() must be called in synchronization, otherwise Wait may exit first		if v == 0 {
			return
		}
		// waiter++, atomic operation		if atomic.CompareAndSwapUint64(statep, state, state+1) {
            // If the self-increase is successful, the semaphore is obtained, and the semaphore plays a synchronous role here.			runtime_Semacquire(semap)
			return
		}
	}
}

To sum up, the principle of WaitGroup is only five points: memory alignment, atomic operation, counter, waiter, and semaphore.

Memory alignment is used for atomic operations.
The counter increases and decreases use atomic operations. The counter's function is to release all semaphores once it is 0.
The self-increment of the waiter uses atomic operations. The function of the waiter is to indicate how much semaphore to be released.

This is the end of this article about the principle of Golang WaitGroup implementation. For more related Go WaitGroup content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!