Detailed explanation of the example of the underlying implementation scenario of Golang

introduction

Ordinary maps in Go are non-thread-safe. If you want to access a map thread-safely, there are two ways: map+mutex and the other is native. This article will introduce in detail how the underlying layer is implemented and some commonly used scenarios.

How to ensure thread safety?

The data structure is as follows:

type Map struct {
   mu Mutex // Lock to ensure thread safety of writing operations and dirty promotion to read   read  // readOnly read-only map   dirty map[any]*entry // Dirty map, when there is data inside, it must contain the data in the read   misses int // Read misses, when a certain number of times is reached, the protective gear in dirty will be promoted to read}

If we only look at this structure, we may have the following questions:

I also used mutex. Isn’t the implementation method of map+mutex the same?
What are misses used for?
The type of read is aAnd dirty ismap[any]*entry, Why is it different?

I also used mutex. Isn’t the implementation method of map+mutex the same?

In essence, it is implemented through map+mutex implementation
By increasing the read map, the lock probability is reduced when performing a read operation and the read performance is increased.

What are misses used for?

Misses is used to mark the number of misses in read
When misses reaches a certain value, the promotion of dirty will be triggered (promoted to read)

The specific source code is as follows:

// When the key is not hit in the read while performing the Load operation, a miss record is performedfunc (m *Map) missLocked() {
   // Add 1 to count   ++
   // 2. Determine whether dirty meets the promotion conditions   if  &lt; len() {
      // 2.1 returns directly if it is not satisfied      return
   }
   // 3. Transfer the data in dirty to the m of the read, and the data in the old read is discarded   (readOnly{m: })
   // 4. Clear dirty    = nil
   // 5. Reset the miss count    = 0
}

From the code you can see:

When the misses value is greater than or equal to the number of data in dirty, the promotion of dirty will be triggered
When the dirty promotion is promoted, the read is reset to a newly generated readOnly, where m is the new dirty, and amended is the default value false, ensuring that the amended is automatically set to false every time the promotion is triggered.
When the dirty promotion, no copy of the data was triggered

The type of read is aAnd dirty is`map[any]*entry`, Why is it different?

type readOnly struct {
   m       map[any]*entry // Data in read map   amended bool // Mark whether there is a key in the dirty map that is not in the read. If so, this value is true}
type entry struct {
   p  // *interface{} A pointer to specific data}

The underlying layer of read type is the stored readOnly type, and the readOnly type is justmap[any]*entryAdded aamendedmark
If amended is false, it means that there is no data in the dirty. At this time, a dirty operation can be avoided (locking will be added), thereby reducing meaningless locking.
read is declared asType is to satisfy the data consistency when multiple Goroutines read read simultaneously in the case of lock-free

Which scenarios are suitable for?

It is more suitable for scenarios where more reads, writes less, and when maps need to be frequently written, the map+mutex solution can achieve better performance by controlling the lock force.

Traversal operations are not supported because the design of read and write separation causes some unfinished modification operations during the traversal process, resulting in uncertain traversal results.

Why is it suitable for scenarios where more reads, less writes?

The reading method isLoadMethod, the specific source code implementation is as follows:

func (m *Map) Load(key any) (value any, ok bool) {
   // 1. Force the data in read to readOnly   read, _ := ().(readOnly)
   // 2. Query the key from the read to check whether the data exists   e, ok := [key]
   // 3. If the read does not exist and the amended mark shows that there is a key that is not in the read in dirty, go to dirty to query   if !ok &amp;&amp;  {
      // 3.1 starts operating dirty, lock is required to ensure thread safety      ()
      // 3.2 Check again from read to avoid the data in dirty before Lock execution triggering the promotion to read operation      read, _ = ().(readOnly)
      e, ok = [key]
      // 3.3 Same as 3      if !ok &amp;&amp;  {
         // 3.4 Query from dirty         e, ok = [key]
         // 3.5 Whether or not the data is queried from dirty, it is equivalent to missing from read. The miss count needs to be updated (updating the count may trigger the promotion of dirty data)         ()
      }
      // 3.5 Operation is completed and unlocked      ()
   }
   // 4. The detection result, ok is false means that no data was found   // ok is true in two situations: 1. Query the data from read, and the read hits; 2. Query the data from dirty   // ok is false is divided into two situations:   // No hit, but false   // No hit is true, but it does not exist in dirty either   if !ok {
      return nil, false
   }
   // Return the query data (this value does not necessarily exist, you need to determine whether it is a normal value or a nil based on p)   return ()
}
// When you get a pointer to the value of a key from read or dirty, you need to load the value of the corresponding pointerfunc (e *entry) load() (value any, ok bool) {    
   // 1. Get the value of the corresponding address in the yard operation   p := (&amp;)
   // 2. If the value already does not exist or is marked as deleted, return nil, false   if p == nil || p == expunged {
      return nil, false
   }
   // 3. Return the specific value, true   return *(*any)(p), true
}

Each time data is read, read from the read first, and the reading of data in the read does not require locking; when the read missed in the read and the amended mark shows that there is data not in the read, a dirty query is performed and locking is added. When there are more reads and fewer writes, most of the time the data is in the read, so locking can be avoided to improve the performance of concurrent reads.

The writing operation method isStoremethod

func (m *Map) Store(key, value any) {
   // 1. Query from read whether the data already exists, and try to modify it if it exists   read, _ := ().(readOnly)
   if e, ok := [key]; ok &amp;&amp; (&amp;value) {
      // 1.1 The data exists in read and the update is completed. Return directly      return
   }
   //Not in it, prepare to operate dirty to add locks to ensure thread safety   ()
   read, _ = ().(readOnly)
   if e, ok := [key]; ok {
   // 3. If the data is queried in the read, check whether it has been marked as deleted.      if () {
      // 3.1 If marked as deleted, you need to clear the mark and add it to dirty         [key] = e
      }
      // 3.2 Update the entry's value      (&amp;value)
   } else if e, ok := [key]; ok {
      // 4 If it exists in dirty, the value of entry is directly updated      (&amp;value)
   } else {
   // 5 If the previous key does not exist, add the new key-value to dirty      if ! {
      // 5.1 If the tag shows that there is no key in read before dirty, reset dirty and mark amended to true         // 5.1.1 Re-add all entry in the read that is not marked as deleted to dirty         ()
         // 5.1.2 Update tags         (readOnly{m: , amended: true})
      }
      // 5.2 Add new key-value to dirty      [key] = newEntry(value)
   }
   ()
}
// Determine whether entry is marked as deleted, if it is modified to nilfunc (e *entry) unexpungeLocked() (wasExpunged bool) {
   return (&amp;, expunged, nil)
}

Write operations are divided into the following situations:

Data in read
Data is in read but has been marked as deleted
Data in dirty
A brand new data

When the data is in read, the Store will try to modify the data through atomic operations. If the atomic operation is successful, it is equivalent to the completion of data update;

The specific code is as follows:

func (e *entry) tryStore(i *any) bool {
   for {
       // 1. Get the specific value of entry      p := (&amp;)
      // 2. If the data has been deleted by mark, return false      if p == expunged {
         return false
      }
      // 3. Update the current value and return true      if (&amp;, p, (i)) {
         return true
      }
   }
}

When the data has been marked as deleted in the read, the entry needs to be added to the dirty and updated the value (here, the new entry is essentially added, just taking the address space of the previous entry)

When the data is in dirty, the pointer of the entry is directly updated through atomic operations;

When the data is a new data, a new entry will be created and added to the dirty, and if it is the first data in the dirty, it will be executeddirtyLockedMethod, add the current data (unmarked deleted) in the read to dirty.

The specific implementation of dirtyLocked is as follows:

func (m *Map) dirtyLocked() {
    //1 Reset operation can only be performed if dirty is nil before   if  != nil {
      return
   }
   // 2 Get the data in the read   read, _ := ().(readOnly)
   // 3 Initialize dirty    = make(map[any]*entry, len())
   // 4 traversal read   for k, e := range  {
      // 4.1 Add objects that are not nil and are not marked as deleted to dirty      if !() {
         [k] = e
      }
   }
}
// Determine whether the entry is marked as deleted. If the entry value is nil, mark it as deletedfunc (e *entry) tryExpungeLocked() (isExpunged bool) {
    // 1 Get the value of entry   p := (&amp;)
   for p == nil {
       // 2 If the value of the current entry is empty, try to mark this key as delete      if (&amp;, nil, expunged) {
         return true
      }
      p = (&amp;)
   }
   // 3 Determine whether p is marked as deleted   return p == expunged
}

Through the above analysis, we can find that when writing operations are frequent, there are the following problems

There is a large amount of data in dirty, and the read query will be likely to be unable to hit, which will cause the query to query the two maps of read and dirty and there are additional redundant operations, so the read performance is greatly reduced.
Frequent unhooks lead to the promotion of dirty data. Although the promotion is only pointer switching and dirty clearing, the first write after each promotion will cause dirty to copy the read, greatly reducing performance.
Each write operation requires additional checks and operations in order to be uncertain whether the data is on read or dirty or new data.
In some cases, data duplication exists in dirty and read, and the memory usage will be higher.

In summary, when writing operations are frequent, the performance in all aspects is greatly reduced; and for some data that only rarely write operations (such as table data that is loaded once at the server startup), the performance of concurrent operations can be improved.

How to delete data

In the above dirtyLoacked method we see that after initializing dirty, the data in the read will be traversed and objects that are not nil and are not marked as deleted are added to the dirty. From this we can see that the data in the read will not be deleted immediately when deleted, just mark the object as nil or expunged.

The specific code is as follows: (The Delete method is essentially the execution of LoadAndDelete)

func (m *Map) LoadAndDelete(key any) (value any, loaded bool) {
   read, _ := ().(readOnly)
   // 1 Query data from read   e, ok := [key]
   // 2 If the read does not exist, and the amended mark shows that there is a key that is not in the read in dirty, go to dirty to query   if !ok &amp;&amp;  {
      // 3.1 starts operating dirty, lock is required to ensure thread safety      ()
      read, _ = ().(readOnly)
      // 3.2 Check again from read to avoid the data in dirty before Lock execution triggering the promotion to read operation      e, ok = [key]
      if !ok &amp;&amp;  {
         // 3.3 Query from dirty         e, ok = [key]
         // 3.4 Remove from dirty         delete(, key)
         // 3.5 Whether or not the data is queried from dirty, it is equivalent to missing from read. The miss count needs to be updated (the dirty promotion will be triggered after the condition is met)         ()
      }
      ()
   }
   // 4 Like load, here ok is true, which may be to read data in read or dirty.   // Although it has been deleted in dirty, you need to clear the pointer p in the entry   if ok {
      // 5 Tag Delete      return ()
   }
   return nil, false
}

As shown in the above code5As shown in the section, no matter where the entry exists, it is ultimately necessary to mark the entry as deleted. If there is a read, it will not be added to the dirty when the dirty is initialized, and the data in the read will be discarded when the dirty is promoted again. If there is a dirty, the data is directly cleared and the entry is marked to be deleted.

Range method

It does not support traversal, but provides a Range method, which is not the samerangeThe same keyword traversal of map.

The specific functions of the Range method:

Iterate through all elements in read and execute function f on each element in it
If any element executes function f as an argument, the traversal will be interrupted immediately

Although Range will promote dirty data once in the initial stage of execution, it still cannot guarantee that there will be no new data during the execution process, so Range only traverses the data in the latest read, not all the data.

//Travelfunc (m *Map) Range(f func(key, value any) bool) {
   read, _ := ().(readOnly)
   // 1 If there is unpromoted data, then perform a dirty data promotion first.   if  {
      ()
      read, _ = ().(readOnly)
      if  {
         read = readOnly{m: }
         (read)
          = nil
          = 0
      }
      ()
   }
   // 2 traverse all entry in read and execute f functions separately   for k, e := range  {
      v, ok := ()
      if !ok {
         continue
      }
      // 3 When an entry executes the f function and returns false, the traversal will be interrupted      if !f(k, v) {
         break
      }
   }
}

other

Q: When to clear the value that is markedly deleted

Answer: When data is stored in dirty for the first time, dirty will be triggered to copy the contents in the read. When copying again, only non-nil and unmarked deleted entry is copied. When dirty is promoted again, the data in the read is overwritten, realizing the deletion of the tagged deleted entry.

The above is the detailed explanation of the example of Golang's underlying implementation scenario. For more information about Golang's underlying implementation, please pay attention to my other related articles!