SoFunction
Updated on 2025-03-05

An in-depth look at the Go cache library freecache

Caching in Go is generally implemented with a map or with a cache framework; for thread safety, either a concurrency-safe map or a cache framework is used.

If the cache holds more than about a million entries, you need to consider the impact of the data types on GC (note that a string is internally a pointer plus a length, so it counts as a pointer type). If both the keys and the values are non-pointer types, there is not much to worry about. In practice, however, it is very common for keys and values to be, or to contain, pointer-typed data, so when choosing a cache framework you need to pay special attention to its GC impact. From this perspective, cache frameworks fall roughly into two categories:

  • Zero GC overhead: for example freecache or bigcache, whose storage is based on a ring buffer, minimizing the number of pointers;
  • GC overhead: cache frameworks implemented directly on top of a map.
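The GC effect is easy to observe directly. The sketch below (sizes and helper names are illustrative) compares a forced GC pause over a map with pointer-backed string keys and values against one over a pointer-free map[int]int:

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"time"
)

// buildStrMap fills a map whose keys and values are strings (pointer-backed),
// so the GC has to scan every bucket entry.
func buildStrMap(n int) map[string]string {
	m := make(map[string]string, n)
	for i := 0; i < n; i++ {
		s := strconv.Itoa(i)
		m[s] = s
	}
	return m
}

// buildIntMap fills a map whose keys and values are plain ints; the buckets
// contain no pointers, so the GC can skip scanning them.
func buildIntMap(n int) map[int]int {
	m := make(map[int]int, n)
	for i := 0; i < n; i++ {
		m[i] = i
	}
	return m
}

// gcPause forces a collection and reports how long it took.
func gcPause() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	const n = 1_000_000
	strMap := buildStrMap(n)
	fmt.Println("map[string]string GC pause:", gcPause())
	intMap := buildIntMap(n)
	fmt.Println("map[int]int GC pause:", gcPause())
	runtime.KeepAlive(strMap)
	runtime.KeepAlive(intMap)
}
```

On a typical machine the first pause is noticeably longer, because the collector must visit every string header in the first map.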

For a map, GC scans every key/value pair; only if both are pointer-free basic types can the scan be skipped. The following analyzes the implementation principle, taking freecache as the example. The benchmark code is as follows:

package main

import (
   "fmt"
   "runtime"
   "strconv"
   "time"

   "github.com/coocood/freecache"
)

const N = 1000000

func main() {
   cacheSize := 100 * 1024 * 1024
   cache := freecache.NewCache(cacheSize)

   for i := 0; i < N; i++ {
      str := strconv.Itoa(i)
      _ = cache.Set([]byte(str), []byte(str), 1)
   }

   now := time.Now()
   runtime.GC()
   fmt.Printf("freecache, GC took: %s\n", time.Since(now))
   _, _ = cache.Get([]byte("aa"))

   now = time.Now()
   for i := 0; i < N; i++ {
      str := strconv.Itoa(i)
      _, _ = cache.Get([]byte(str))
   }
   fmt.Printf("freecache, Get took: %s\n\n", time.Since(now))
}

1 Initialization

Initialization creates the local cache; size is the total storage space. freecache initializes 256 segments, each an independent storage unit, and freecache's locking is also per segment. Each segment has a ring buffer whose initial size is size/256. freecache's claimed zero-GC property comes from its pointers being fixed in number: only 512 in total, 2 per segment, namely rb and slotsData (note that a slice is a pointer type).

type segment struct {
   rb            RingBuf // ring buffer that stores data
   segId         int
   _             uint32     // placeholder
   missCount     int64
   hitCount      int64
   entryCount    int64
   totalCount    int64      // number of entries in ring buffer, including deleted entries.
   totalTime     int64      // used to calculate least recent used entry.
   timer         Timer      // timer giving current time
   totalEvacuate int64      // used for debug
   totalExpired  int64      // used for debug
   overwrites    int64      // used for debug
   touched       int64      // used for debug
   vacuumLen     int64      // up to vacuumLen, new data can be written without overwriting old data.
   slotLens      [256]int32 // the actual length for every slot.
   slotCap       int32      // max number of entry pointers a slot can hold.
   slotsData     []entryPtr // index pointers
}

func NewCacheCustomTimer(size int, timer Timer) (cache *Cache) {
    cache = new(Cache)
    for i := 0; i < segmentCount; i++ {
        cache.segments[i] = newSegment(size/segmentCount, i, timer)
    }
    return
}

func newSegment(bufSize int, segId int, timer Timer) (seg segment) {
    seg.rb = NewRingBuf(bufSize, 0)
    seg.segId = segId
    seg.timer = timer
    seg.vacuumLen = int64(bufSize)
    seg.slotCap = 1
    seg.slotsData = make([]entryPtr, 256*seg.slotCap) // 256 slots, each initially holding 1 entry pointer
    return
}
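As a quick sanity check on the numbers above (a sketch, not library code), the per-segment buffer size and the fixed pointer count follow directly from the 256-way split:

```go
package main

import "fmt"

func main() {
	const segmentCount = 256
	cacheSize := 100 * 1024 * 1024
	// Each segment gets an equal share of the total cache size.
	fmt.Println("per-segment ring buffer:", cacheSize/segmentCount, "bytes") // 409600
	// Each segment keeps exactly two pointer-backed fields (the ring
	// buffer's []byte and the slotsData []entryPtr), hence 256*2 = 512.
	fmt.Println("fixed pointer count:", segmentCount*2)
}
```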

2 Reading and writing process

freecache's keys and values are both []byte slices, so they must be serialized and deserialized on use. If you cache complex objects, the cost of that serialization and deserialization cannot be ignored. First, let's look at the Set process:

_ = cache.Set([]byte(str), []byte(str), 1)
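If the cached value is a struct rather than a string, it has to be serialized into []byte first; a minimal sketch using encoding/json (the User type and both helpers are illustrative, not part of freecache):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical cached object; freecache only stores []byte,
// so complex values must be encoded on Set and decoded on Get.
type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// encodeUser turns the struct into bytes suitable for cache.Set.
func encodeUser(u User) ([]byte, error) {
	return json.Marshal(u)
}

// decodeUser reverses encodeUser after cache.Get.
func decodeUser(b []byte) (User, error) {
	var u User
	err := json.Unmarshal(b, &u)
	return u, err
}

func main() {
	b, _ := encodeUser(User{ID: 1, Name: "alice"})
	u, _ := decodeUser(b)
	fmt.Println(u.ID, u.Name) // 1 alice
}
```

This round trip runs on every Set/Get, which is exactly why the serialization cost matters for complex objects.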

The Set process first hashes the key into a uint64 hashVal. Its low 8 bits (segID) index the segment array; bits 8-15 (slotId) index into slotsData; bits 16-31 (hash16) identify the entry within the slot, which requires a search. Note that each slot's []entryPtr region has size slotCap (initially 1), and slotCap doubles on each expansion.
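The bit layout can be sketched as a standalone helper (splitHash is illustrative; segmentAndOpVal mirrors freecache's constant):

```go
package main

import "fmt"

const segmentCount = 256
const segmentAndOpVal = segmentCount - 1 // 0xFF

// splitHash mirrors how freecache derives its three indexes from one hash:
// the low 8 bits pick the segment, bits 8-15 pick the slot inside the
// segment, and bits 16-31 (hash16) identify the entry within the slot.
func splitHash(hashVal uint64) (segID uint64, slotId uint8, hash16 uint16) {
	segID = hashVal & segmentAndOpVal
	slotId = uint8(hashVal >> 8)
	hash16 = uint16(hashVal >> 16)
	return
}

func main() {
	seg, slot, h16 := splitHash(0xABCD1234)
	fmt.Printf("segID=%#x slotId=%#x hash16=%#x\n", seg, slot, h16)
	// segID=0x34 slotId=0x12 hash16=0xabcd
}
```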

Each segment has its own lock, so freecache supports high concurrency, unlike designs with a single global lock.

func (cache *Cache) Set(key, value []byte, expireSeconds int) (err error) {
   hashVal := hashFunc(key)
   segID := hashVal & segmentAndOpVal // lower 8 bits
   cache.locks[segID].Lock()          // per-segment lock
   err = cache.segments[segID].set(key, value, hashVal, expireSeconds)
   cache.locks[segID].Unlock()
   return
}

func (seg *segment) set(key, value []byte, hashVal uint64, expireSeconds int) (err error) {
   // entries larger than 1/4 of a segment are rejected (key-length check omitted)
   maxKeyValLen := len(seg.rb.data)/4 - ENTRY_HDR_SIZE
   if len(key)+len(value) > maxKeyValLen {
      return ErrLargeEntry
   }
   now := seg.timer.Now()
   expireAt := uint32(0)
   if expireSeconds > 0 {
      expireAt = now + uint32(expireSeconds)
   }

   slotId := uint8(hashVal >> 8)
   hash16 := uint16(hashVal >> 16)
   slot := seg.getSlot(slotId)
   idx, match := seg.lookup(slot, hash16, key)

   var hdrBuf [ENTRY_HDR_SIZE]byte
   hdr := (*entryHdr)(unsafe.Pointer(&hdrBuf[0]))
   if match { // existing entry: this is an update operation
      matchedPtr := &slot[idx]
      seg.rb.ReadAt(hdrBuf[:], matchedPtr.offset)
      hdr.slotId = slotId
      hdr.hash16 = hash16
      hdr.keyLen = uint16(len(key))
      originAccessTime := hdr.accessTime
      hdr.accessTime = now
      hdr.expireAt = expireAt
      hdr.valLen = uint32(len(value))
      if hdr.valCap >= hdr.valLen {
         // the existing value space can hold the new value: overwrite in place
         atomic.AddInt64(&seg.totalTime, int64(hdr.accessTime)-int64(originAccessTime))
         seg.rb.WriteAt(hdrBuf[:], matchedPtr.offset)
         seg.rb.WriteAt(value, matchedPtr.offset+ENTRY_HDR_SIZE+int64(hdr.keyLen))
         atomic.AddInt64(&seg.overwrites, 1)
         return
      }
      // delete the corresponding entryPtr (a slotsData memory copy);
      // the ring buffer entry is only marked as deleted
      seg.delEntryPtr(slotId, slot, idx)
      match = false
      // increase capacity and limit entry len.
      for hdr.valCap < hdr.valLen {
         hdr.valCap *= 2
      }
      if hdr.valCap > uint32(maxKeyValLen-len(key)) {
         hdr.valCap = uint32(maxKeyValLen - len(key))
      }
   } else { // no existing entry
      hdr.slotId = slotId
      hdr.hash16 = hash16
      hdr.keyLen = uint16(len(key))
      hdr.accessTime = now
      hdr.expireAt = expireAt
      hdr.valLen = uint32(len(value))
      hdr.valCap = uint32(len(value))
      if hdr.valCap == 0 { // avoid infinite loop when increasing capacity.
         hdr.valCap = 1
      }
   }

   // actual entry length = ENTRY_HDR_SIZE (24) + key length + value capacity
   entryLen := ENTRY_HDR_SIZE + int64(len(key)) + int64(hdr.valCap)
   slotModified := seg.evacuate(entryLen, slotId, now)
   if slotModified {
      // the slot has been modified during evacuation, we need to look up 'idx' again.
      // otherwise there would be an index out of bound error.
      slot = seg.getSlot(slotId)
      idx, match = seg.lookup(slot, hash16, key)
      // assert(match == false)
   }
   newOff := seg.rb.End()
   seg.insertEntryPtr(slotId, hash16, newOff, idx, hdr.keyLen)
   seg.rb.Write(hdrBuf[:])
   seg.rb.Write(key)
   seg.rb.Write(value)
   seg.rb.Skip(int64(hdr.valCap - hdr.valLen))
   atomic.AddInt64(&seg.totalTime, int64(now))
   atomic.AddInt64(&seg.totalCount, 1)
   seg.vacuumLen -= entryLen
   return
}

evacuate checks whether the ring buffer has enough free space for the key/value. If not, it scans forward from the position just past the tail of the free space, i.e. the start of the oldest data (oldOff := seg.rb.End() + seg.vacuumLen - seg.rb.Size()). If the entry there has been logically deleted or has expired, its memory block is reclaimed directly. If the reclaim conditions are not met, the entry is moved from the ring head to the ring tail and its index is updated; after more than 5 consecutive moves without reclaiming anything, the next entry is reclaimed regardless, so the loop always frees enough memory in the end.

After evacuate returns, the required space is guaranteed, and then the index and data are written. insertEntryPtr writes the index; when the number of entry pointers in a slot reaches slotCap (initially 1), an expansion is required, see the corresponding method in the source, which will not be repeated here.

The evacuation logic is as follows:

func (seg *segment) evacuate(entryLen int64, slotId uint8, now uint32) (slotModified bool) {
   var oldHdrBuf [ENTRY_HDR_SIZE]byte
   consecutiveEvacuate := 0
   for seg.vacuumLen < entryLen {
      oldOff := seg.rb.End() + seg.vacuumLen - seg.rb.Size()
      seg.rb.ReadAt(oldHdrBuf[:], oldOff)
      oldHdr := (*entryHdr)(unsafe.Pointer(&oldHdrBuf[0]))
      oldEntryLen := ENTRY_HDR_SIZE + int64(oldHdr.keyLen) + int64(oldHdr.valCap)
      if oldHdr.deleted { // already logically deleted: reclaim directly
         consecutiveEvacuate = 0
         atomic.AddInt64(&seg.totalTime, -int64(oldHdr.accessTime))
         atomic.AddInt64(&seg.totalCount, -1)
         seg.vacuumLen += oldEntryLen
         continue
      }
      expired := oldHdr.expireAt != 0 && oldHdr.expireAt < now
      leastRecentUsed := int64(oldHdr.accessTime)*atomic.LoadInt64(&seg.totalCount) <= atomic.LoadInt64(&seg.totalTime)
      if expired || leastRecentUsed || consecutiveEvacuate > 5 {
         // can be reclaimed
         seg.delEntryPtrByOffset(oldHdr.slotId, oldHdr.hash16, oldOff)
         if oldHdr.slotId == slotId {
            slotModified = true
         }
         consecutiveEvacuate = 0
         atomic.AddInt64(&seg.totalTime, -int64(oldHdr.accessTime))
         atomic.AddInt64(&seg.totalCount, -1)
         seg.vacuumLen += oldEntryLen
         if expired {
            atomic.AddInt64(&seg.totalExpired, 1)
         } else {
            atomic.AddInt64(&seg.totalEvacuate, 1)
         }
      } else {
         // evacuate an old entry that has been accessed recently for better cache hit rate.
         newOff := seg.rb.Evacuate(oldOff, int(oldEntryLen))
         seg.updateEntryPtr(oldHdr.slotId, oldHdr.hash16, oldOff, newOff)
         consecutiveEvacuate++
         atomic.AddInt64(&seg.totalEvacuate, 1)
      }
   }
   return
}
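The leastRecentUsed condition is an approximate LRU test: an entry is evictable when its access time is no later than the segment's mean access time (totalTime/totalCount). A standalone check of the arithmetic (the helper is illustrative):

```go
package main

import "fmt"

// isLeastRecentUsed mirrors freecache's approximate-LRU test: an entry is
// considered least-recently-used when its access time is no later than the
// mean access time of all live entries, i.e. accessTime <= totalTime/totalCount,
// rewritten without division as accessTime*totalCount <= totalTime.
func isLeastRecentUsed(accessTime, totalCount, totalTime int64) bool {
	return accessTime*totalCount <= totalTime
}

func main() {
	// 3 entries accessed at times 100, 200, 300 -> totalTime = 600, mean = 200.
	fmt.Println(isLeastRecentUsed(100, 3, 600)) // true: at or below the mean
	fmt.Println(isLeastRecentUsed(300, 3, 600)) // false: above the mean
}
```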

The Get process of freecache is comparatively simple: find the segment via the hash, find the index slot via slotId, then locate the data by binary search plus traversal. If it cannot be found, ErrNotFound is returned directly; otherwise some time metrics are updated. The Get process also updates cache hit-rate metrics.

func (cache *Cache) Get(key []byte) (value []byte, err error) {
   hashVal := hashFunc(key)
   segID := hashVal & segmentAndOpVal
   cache.locks[segID].Lock()
   value, _, err = cache.segments[segID].get(key, nil, hashVal, false)
   cache.locks[segID].Unlock()
   return
}
func (seg *segment) get(key, buf []byte, hashVal uint64, peek bool) (value []byte, expireAt uint32, err error) {
   hdr, ptr, err := seg.locate(key, hashVal, peek) // hash + slot lookup
   if err != nil {
      return
   }
   expireAt = hdr.expireAt
   if cap(buf) >= int(hdr.valLen) {
      value = buf[:hdr.valLen]
   } else {
      value = make([]byte, hdr.valLen)
   }

   seg.rb.ReadAt(value, ptr.offset+ENTRY_HDR_SIZE+int64(hdr.keyLen))
   return
}

After locating the data, it is simply read from the ring buffer. Note that, in general, the returned value is newly allocated memory, so every Get involves a copy of the []byte data.
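The buffer-handling branch of get can be reproduced in isolation (getInto is an illustrative stand-in, not freecache code): when the caller supplies a buffer large enough, the value is copied into it and the allocation is skipped, which is the idea behind passing a reusable buf into get.

```go
package main

import "fmt"

// getInto mimics segment.get's buffer handling: if the caller-supplied buf
// has enough capacity, the value is copied into it (no allocation);
// otherwise a fresh slice is allocated.
func getInto(stored, buf []byte) []byte {
	var value []byte
	if cap(buf) >= len(stored) {
		value = buf[:len(stored)]
	} else {
		value = make([]byte, len(stored))
	}
	copy(value, stored)
	return value
}

func main() {
	stored := []byte("hello")
	buf := make([]byte, 0, 16)
	v := getInto(stored, buf)
	fmt.Println(string(v), cap(v) == cap(buf)) // hello true (caller's buffer reused)
}
```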

3 Summary

Judging from benchmarks of several common cache frameworks, Set performance varies considerably, though not by an order of magnitude, while Get performance differs little. So for most scenarios you do not need to pay much attention to Set/Get performance; the focus should be on whether the features meet business needs and on the GC impact. For the benchmark details, see: /posts/ch5/04-performance/

There is one special cache scenario: all data is held in memory, and updates replace the data set wholesale. If freecache is used in this scenario and size is not set properly, some data may be evicted, which is not what you expect, so you must watch out for this. To avoid the problem, size needs to be "large enough", but then you also need to keep an eye on the memory it occupies.
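For the full-replacement scenario, a rough lower bound for size can be estimated from the entry layout (24-byte header + key + value), with headroom because space is split evenly across 256 segments and hashing never balances perfectly. The helper and figures below are illustrative:

```go
package main

import "fmt"

const (
	entryHdrSize = 24 // ENTRY_HDR_SIZE in freecache
	segmentCount = 256
)

// minCacheSize estimates the total bytes needed to hold n entries with the
// given average key/value lengths, then multiplies by a headroom factor to
// absorb skew: each segment only gets size/segmentCount bytes, so a hot
// segment can start evicting long before the cache as a whole is full.
func minCacheSize(n, avgKeyLen, avgValLen int, headroom float64) int {
	raw := n * (entryHdrSize + avgKeyLen + avgValLen)
	return int(float64(raw) * headroom)
}

func main() {
	// e.g. 1M entries, 16-byte keys, 64-byte values, 50% headroom
	fmt.Println(minCacheSize(1_000_000, 16, 64, 1.5)) // 156000000
}
```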

This concludes this article on understanding the Go cache library freecache. For more Go freecache content, please search my previous articles or continue browsing the related articles below. I hope you will support me in the future!