SoFunction
Updated on 2025-03-08

A summary of key garbage-collector concepts

The garbage collector is a classic double-edged sword. On the plus side, it can greatly simplify a program's memory-management code, because memory no longer has to be managed by the programmer, and it reduces (but does not eradicate) memory leaks in long-running programs. For some programmers it can even improve the performance of their code.

On the other hand, choosing a garbage collector means the program gives up full control over memory, and that is a sticking point in mobile development. In JavaScript there is no way to manage memory from within the program: the ECMAScript standard does not expose any garbage-collector interface. A web application has no means to manage its memory, nor any way to give hints to the garbage collector.

Strictly speaking, languages that use a garbage collector are not necessarily better or worse in performance than languages that do not. In C, allocating and freeing memory can be very expensive, and heap management becomes complicated because allocated memory must be individually freeable later. In languages with managed memory, allocation often amounts to just bumping a pointer, but then we pay the large cost of the garbage collector stepping in to reclaim memory when it runs low. A naive garbage collector causes long, unpredictable pauses at runtime, which directly hurts the experience of interactive systems (especially ones with animation). Reference-counting systems are often touted as a replacement for garbage collection, but they too suffer unpredictable pauses when the last reference to a large subgraph of objects is dropped, and the frequent reads, updates, and writes of reference counts impose a considerable performance burden of their own.

For better or worse, JavaScript requires a garbage collector. V8's garbage collector implementation is now mature, with excellent performance, short pauses, and a very manageable overhead.

Basic concepts

The most basic problem a garbage collector must solve is identifying the memory that can be reclaimed. Once identified, these memory regions can be reused for future allocations or returned to the operating system. An object is dead when it is not live (obviously). An object is live if and only if it is pointed to by a root object or by another live object. Root objects are defined to be live; they are the objects referenced by the browser or by V8. For example, objects pointed to by local variables are reachable from roots because the stack is scanned for roots; global objects are roots because they are always accessible; and browser objects, such as DOM elements, are also roots, although in some cases they are only weakly referenced.
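As an illustration only (a toy model, not V8's actual object representation), the liveness rule above amounts to computing reachability from the roots:

```python
# Toy model: liveness as reachability from roots (hypothetical, not V8's layout).
def live_objects(roots, edges):
    """Return the set of objects reachable from the roots."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(edges.get(obj, []))
    return live

# 'a' is a root; 'b' is kept alive through 'a'; 'c' is unreachable garbage,
# even though it points at the live object 'b'.
edges = {"a": ["b"], "b": [], "c": ["b"]}
print(sorted(live_objects({"a"}, edges)))  # ['a', 'b']
```

Note that pointing *at* a live object does not keep `c` alive; only being pointed *to* from a root (directly or transitively) does.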

Viewed another way, the definition above is quite loose. In practice, we can say an object is live if the program could still reach and use it. For example:

function f() {
  var obj = {x: 12};
  g(); // may contain an infinite loop
  return ;
}
Here is the pseudocode of the Scavenge algorithm:

def scavenge():
  swap(fromSpace, toSpace)
  allocationPtr = toSpaceStart   # both pointers start at the bottom of to-space
  scanPtr = toSpaceStart

  for i = 0..len(roots):
    root = roots[i]
    if inFromSpace(root):
      rootCopy = copyObject(&allocationPtr, root)
      setForwardingAddress(root, rootCopy)
      roots[i] = rootCopy

  while scanPtr < allocationPtr:
    obj = object at scanPtr
    scanPtr += size(obj)
    n = sizeInWords(obj)
    for i = 0..n:
      if isPointer(obj[i]) and not inOldSpace(obj[i]):
        fromNeighbor = obj[i]
        if hasForwardingAddress(fromNeighbor):
          toNeighbor = getForwardingAddress(fromNeighbor)
        else:
          toNeighbor = copyObject(&allocationPtr, fromNeighbor)
          setForwardingAddress(fromNeighbor, toNeighbor)
        obj[i] = toNeighbor

def copyObject(*allocationPtr, object):
  copy = *allocationPtr
  *allocationPtr += size(object)
  memcpy(copy, object, size(object))
  return copy

During the execution of this algorithm, we maintain two pointers into to-space: allocationPtr points to where the next object will be allocated, and scanPtr points to the next object to be scanned. Objects before scanPtr have been fully processed: they and their neighbors are all in to-space, and their internal pointers are up to date. Objects between scanPtr and allocationPtr have been copied into to-space, but their internal pointers may still refer to objects in from-space that have not yet been copied. Logically, you can think of the objects between scanPtr and allocationPtr as the queue of a breadth-first search.

Translator's note: in breadth-first search, a node is taken from the head of the queue and expanded, and its children are appended to the tail of the queue, over and over. This is exactly analogous to how the objects between the two pointers are processed.

At the start of the algorithm, we copy every object reachable from the roots into to-space, then enter a large loop. In each round, we remove an object from the queue, i.e. increment scanPtr, and follow the pointers inside the object. If a pointer does not point into from-space, we ignore it, since it must point into the old generation, which is not our concern. If a pointer points to an object in from-space that we have not yet copied (no forwarding address set), we copy the object into to-space, i.e. append it to the end of our queue by incrementing allocationPtr. At that point we also store a forwarding address into the first word of the from-space object, replacing its Map pointer. The forwarding address is simply the address of the object's new copy. The garbage collector can easily distinguish a forwarding address from a Map pointer because Map pointers are tagged and forwarding addresses are not. If we find a pointer whose target has already been copied (forwarding address set), we simply update the pointer to the forwarding address and tag it.

The algorithm terminates when all objects have been processed (i.e. when scanPtr and allocationPtr meet). At that point the contents of from-space can be regarded as garbage, to be released or reused later.
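The whole algorithm can be exercised on a toy semi-space heap. In this Python sketch (object layout and names are invented for illustration), objects are dictionaries, to-space doubles as the breadth-first queue, and forwarding addresses live in a side table rather than in the object's first word:

```python
# Toy semi-space copy (Cheney-style scan), mirroring the pseudocode above.
def scavenge(roots, from_space):
    to_space = []                 # doubles as the BFS queue
    forwarding = {}               # side table instead of an in-object slot

    def copy_object(obj):
        if id(obj) in forwarding:           # already copied: reuse the copy
            return forwarding[id(obj)]
        clone = {"name": obj["name"], "fields": list(obj["fields"])}
        forwarding[id(obj)] = clone
        to_space.append(clone)              # allocationPtr advances
        return clone

    roots = [copy_object(r) for r in roots]
    scan = 0                                # scanPtr
    while scan < len(to_space):             # scanPtr < allocationPtr
        obj = to_space[scan]
        scan += 1
        obj["fields"] = [copy_object(f) for f in obj["fields"]]
    return roots, to_space

a = {"name": "a", "fields": []}
b = {"name": "b", "fields": [a]}
garbage = {"name": "g", "fields": []}       # unreachable: never copied
roots, survivors = scavenge([b], [a, b, garbage])
print([o["name"] for o in survivors])       # ['b', 'a']
```

The unreachable object never gets copied, so its space in from-space is reclaimed implicitly: nothing is "freed", the survivors are simply evacuated.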

Secret weapon: the write barrier

One detail was glossed over above: if the only pointer to an object in the young generation lives inside an object in the old generation, how do we know that the young-generation object is live? We obviously do not want to traverse the whole old generation again; it contains many objects, and doing so would take far too long.

To solve this problem, V8 keeps a list, the store buffer, which records every place where an old-generation object points into the young generation. When an object is newly allocated there are no pointers to it; whenever a pointer from an old-generation object to a young-generation object is created, we record this cross-generation pointer. Since this recording happens on write operations, it is called a write barrier, because every write has to pass through this extra step.

You may wonder: if every write operation has to go through a write barrier, doesn't that add a lot of extra code? Yes, this is one of the costs of our garbage-collection mechanism. But the situation is not as bad as it sounds; after all, writes are rarer than reads. Some garbage-collection algorithms (not V8's) use read barriers instead, which require hardware assistance to keep the overhead low. V8 also has several optimizations that reduce the cost of the write barrier:

Most script execution time is spent in code generated by Crankshaft, and Crankshaft can often statically determine that an object lives in the young generation. Writes to such objects need no write barrier at all.

A newer Crankshaft optimization allocates an object on the stack when no non-local reference to it escapes. Writes to a stack-allocated object obviously need no write barrier. (Translator's note: the young and old generations both live on the heap.)

The "old → new" case is relatively rare, so optimizing the check for the two common cases, "new → new" and "old → old", improves performance most of the time. Every page is aligned to 1MB, so given an object's memory address, the page containing it can be found quickly by masking off the low 20 bits; the page header carries a flag indicating whether the page belongs to the young or the old generation. So by checking which generation each of the two objects belongs to, we can quickly decide whether a write is "old → new".
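The page lookup can be illustrated with plain bit arithmetic (the address below is made up):

```python
# 1 MB pages: the page start is the address with the low 20 bits masked off.
PAGE_SIZE = 1 << 20          # 1 MB
PAGE_MASK = ~(PAGE_SIZE - 1)

def page_of(addr):
    return addr & PAGE_MASK

addr = 0x3FA1C8              # a made-up object address
print(hex(page_of(addr)))                  # 0x300000
print(page_of(addr) == page_of(addr + 8))  # True: same 1 MB page
```

In V8 the word at the page start (the page header) would then be consulted for the generation flag; here we only show the masking step.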

Once an "old → new" pointer is found, it is recorded at the end of the store buffer. After a while (when the store buffer fills up), we sort it, merge duplicate entries, and remove pointers that no longer satisfy "old → new". (Translator's note: this shrinks the set of recorded pointers, which in turn shortens the time spent in the write barrier.)
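That sort-and-merge step might be sketched like this (a simplification: entries are modeled as (slot, target) pairs, which is not V8's actual record format):

```python
# Toy store buffer: record (old-gen slot, new-gen target) entries on writes,
# then sort, deduplicate, and drop entries that no longer point old -> new.
def compact_store_buffer(buffer, still_points_to_new):
    seen = set()
    result = []
    for entry in sorted(buffer):
        if entry not in seen and still_points_to_new(entry):
            seen.add(entry)
            result.append(entry)
    return result

buffer = [(0x10, 0x90), (0x10, 0x90), (0x20, 0x80)]
# Suppose slot 0x20 was overwritten and no longer points into new space:
alive = compact_store_buffer(buffer, lambda e: e[0] != 0x20)
print(alive)  # [(16, 144)]
```

Sorting first makes duplicates adjacent, which is why the real implementation can merge them cheaply in one pass.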

"Tag-Clear" algorithm and "Tag-Close" algorithm

The Scavenge algorithm is very effective at quickly collecting and compacting small amounts of memory, but its cost is too high for large heaps. Because Scavenge needs both a to-space and a from-space, it is acceptable for a few MB but becomes unrealistic beyond that. The old generation holds hundreds of MB of data; for an area that large we use two other, closely related algorithms: mark-sweep and mark-compact.

Both algorithms have two phases: a marking phase, followed by either a sweeping or a compacting phase.

During the marking phase, all live objects on the heap get marked. Each page contains a marking bitmap in which each bit corresponds to one word in the page (translator's note: a pointer is one word in size). This granularity is necessary because a pointer can appear at any word-aligned position. Obviously such bitmaps take up some space (3.1% on 32-bit systems, 1.6% on 64-bit systems), but all memory-management mechanisms carry overhead of this order, so it is not excessive. In addition, each object's mark uses 2 bits to encode its state; since objects are at least 2 words long, the bit pairs of different objects never overlap. There are three states: if an object is white, it has not yet been discovered by the garbage collector; if it is grey, it has been discovered but its neighbors have not all been processed; if it is black, it has been discovered and all of its neighbors have been processed as well.

If we regard the objects in the heap as a directed graph connected by pointers, the core of the marking algorithm is essentially a depth-first search. At the start of marking, the bitmap is empty and all objects are white. Objects reachable from the roots are colored grey and pushed onto the marking deque, a separately allocated double-ended queue. In each step, the GC pops an object off the deque, colors it black, colors its white neighbors grey, and pushes those neighbors onto the deque. The process ends when the deque is empty and every discovered object is black. Very large objects, such as long arrays, may be processed in pieces to reduce the chance of overflowing the deque. If the deque does overflow, objects are still colored grey but are not pushed (so their neighbors get no chance to be colored). Therefore, when the deque becomes empty, the GC must still scan the heap once more to ensure all grey objects become black: grey objects found by the scan are pushed back onto the deque and processing resumes.

Here is the pseudocode of the marking algorithm:

markingDeque = []
overflow = false

def markHeap():
  for root in roots:
    mark(root)

  do:
    if overflow:
      overflow = false
      refillMarkingDeque()

    while !markingDeque.isEmpty():
      obj = markingDeque.pop()
      setMarkBits(obj, BLACK)
      for neighbor in neighbors(obj):
        mark(neighbor)
  while overflow

def mark(obj):
  if markBits(obj) == WHITE:
    setMarkBits(obj, GREY)
    if markingDeque.isFull():
      overflow = true
    else:
      markingDeque.push(obj)

def refillMarkingDeque():
  for each obj on heap:
    if markBits(obj) == GREY:
      markingDeque.push(obj)
      if markingDeque.isFull():
        overflow = true
        return
When the marking algorithm terminates, all live objects are colored black and all dead objects remain white. This result is exactly what the sweeping and compacting phases expect.
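To see the overflow path concretely, here is a runnable Python version of the pseudocode above (a toy model: the deque capacity is artificially tiny, and the "heap rescan" is a loop over a dictionary):

```python
from collections import deque

WHITE, GREY, BLACK = 0, 1, 2

def mark_heap(roots, neighbors, capacity):
    color = {}                     # missing entries are implicitly WHITE
    marking = deque()
    overflow = [False]

    def mark(obj):
        if color.get(obj, WHITE) == WHITE:
            color[obj] = GREY
            if len(marking) >= capacity:
                overflow[0] = True     # deque full: stays grey, not queued
            else:
                marking.append(obj)

    for root in roots:
        mark(root)
    while True:
        while marking:
            obj = marking.pop()
            color[obj] = BLACK
            for n in neighbors.get(obj, []):
                mark(n)
        if not overflow[0]:
            break
        overflow[0] = False
        for obj in sorted(neighbors):  # rescan "heap" for grey leftovers
            if color.get(obj, WHITE) == GREY:
                if len(marking) >= capacity:
                    overflow[0] = True
                    break
                marking.append(obj)
    return color

g = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
colors = mark_heap(["a"], g, capacity=1)   # capacity 1 forces an overflow
print(sorted(o for o in g if colors.get(o, WHITE) == BLACK))  # ['a', 'b', 'c', 'd']
```

Even with the deque overflowing, every reachable object ends up black after the rescan, while the unreachable `e` stays white.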

After marking, we can choose either to sweep or to compact. Both reclaim memory, and both operate at page granularity (recall that V8's memory pages are contiguous 1MB chunks, distinct from virtual-memory pages).

The sweep algorithm scans for contiguous runs of dead objects, turns them into free space, and adds them to free lists. Each page maintains several free lists, which hold free regions of different sizes: small (<256 words), medium (<2048 words), large (<16384 words), and huge (anything bigger). The sweep algorithm is very simple: it just walks the page's mark bitmap looking for runs of white objects. The free lists are mainly used by the Scavenge algorithm when promoting surviving objects into the old generation, but the compact algorithm also uses them when moving objects. Some kinds of objects can only be allocated in the old generation, so the free lists serve them too.
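The size-class bucketing might look like this (the word thresholds come from the text; everything else is an invented simplification):

```python
# Free regions bucketed by size in words, matching the thresholds above.
def bucket(size_in_words):
    if size_in_words < 256:
        return "small"
    if size_in_words < 2048:
        return "medium"
    if size_in_words < 16384:
        return "large"
    return "huge"

free_lists = {"small": [], "medium": [], "large": [], "huge": []}
# Made-up (start address, size in words) free regions found by sweeping:
for addr, words in [(0x1000, 100), (0x5000, 300), (0x9000, 20000)]:
    free_lists[bucket(words)].append(addr)
print({k: v for k, v in free_lists.items() if v})
# {'small': [4096], 'medium': [20480], 'huge': [36864]}
```

Segregating regions by size lets the allocator find a fitting block without scanning one long mixed list.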

The compact algorithm tries to free memory by migrating objects away from fragmented pages (pages containing many small free regions). Objects are moved to other pages, which may require allocating new pages, and the evacuated fragmented pages can then be returned to the operating system. The migration and fix-up process is complicated, so I will only mention some details rather than cover it exhaustively. Roughly, it works like this: for each live object on the target fragmented page, space is allocated from the free list of another page, the object is copied into that space, and a forwarding address is written into the object on the fragmented page. During evacuation, the old locations of pointers into the object are recorded, so that once evacuation is finished V8 can visit each recorded location and update the pointer there to the new address. Since pointers between different pages are also recorded during marking, those pointers get updated at this point too. Note that if a page is very "busy", i.e. too many pointer locations would have to be recorded for it, it is skipped and left until the next round of garbage collection.
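The evacuate-then-fix-up sequence can be modeled in miniature (illustrative only; real compaction copies raw bytes and patches recorded slots):

```python
# Toy compaction: move live objects off a fragmented page, leave forwarding
# addresses, then rewrite every recorded pointer in a fix-up pass.
def compact(fragmented_page, pointer_slots):
    forwarding = {}
    target_page = []
    for obj in fragmented_page:            # evacuate live objects
        target_page.append(obj)
        forwarding[obj["addr"]] = ("target", len(target_page) - 1)
    for slot in pointer_slots:             # fix-up pass over recorded slots
        if slot["to"] in forwarding:
            slot["to"] = forwarding[slot["to"]]
    return target_page, pointer_slots

page = [{"addr": 0xA0}, {"addr": 0xB0}]
slots = [{"to": 0xA0}, {"to": 0xF0}]       # 0xF0 points somewhere else
_, fixed = compact(page, slots)
print(fixed)  # [{'to': ('target', 0)}, {'to': 240}]
```

Only recorded slots that pointed into the evacuated page are rewritten; everything else is left untouched, which is why recording the slots during marking is essential.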

Incremental marking and lazy sweeping

As you might expect, when the heap is large and holds many live objects, mark-sweep and mark-compact run very slowly. When I first started studying V8, garbage-collection pauses of 500-1000 milliseconds were not uncommon. That is obviously hard to accept, especially on mobile devices.

In mid-2012, Google introduced two improvements that greatly reduced garbage-collection pauses: incremental marking and lazy sweeping.

Incremental marking allows the heap to be marked across a series of small pauses of 5-10 milliseconds each (on mobile devices). Incremental marking starts when the heap reaches a certain threshold size; once active, every time a certain amount of memory has been allocated, script execution pauses and one increment of marking is performed. Like regular marking, incremental marking is essentially a depth-first search using the same white/grey/black classification of objects.

But unlike regular marking, during incremental marking the object graph can change! We must pay special attention to new pointers created from black objects to white objects. Recall that black means an object has been fully scanned by the garbage collector and will not be scanned again; so if a "black → white" pointer appears, we could miss the white object and treat it as dead. (Translator's note: once marking finishes, all objects still white are considered dead.) So the write barrier comes to the rescue again: it now records not only "old → new" pointers but also "black → white" pointers. Whenever such a pointer is created, the black object is re-colored grey and pushed back onto the deque. When the algorithm later pops it off again, its pointers are rescanned, so the live white object is not missed.
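The extended barrier might be sketched as follows (a toy model; names like `write_barrier` are invented, not V8's API):

```python
WHITE, GREY, BLACK = 0, 1, 2

def write_barrier(color, marking_deque, heap, src, field, dst):
    heap[src][field] = dst                  # the actual write
    # Incremental-marking barrier: a black -> white edge re-greys the
    # source object so that its fields get rescanned later.
    if color[src] == BLACK and color[dst] == WHITE:
        color[src] = GREY
        marking_deque.append(src)

heap = {"a": {}, "b": {}}
color = {"a": BLACK, "b": WHITE}            # 'a' already fully scanned
marking_deque = []
write_barrier(color, marking_deque, heap, "a", "x", "b")
print(color["a"] == GREY, marking_deque)    # True ['a']
```

Without this check, `b` would stay white through the end of marking and be swept as garbage even though `a.x` now points to it.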

Once incremental marking is complete, lazy sweeping begins. All objects have been processed, so each one is definitively either dead or alive. We are now in no hurry to free the dead space: delaying the sweep costs little. So rather than sweeping all pages at once, the garbage collector sweeps pages one at a time, as needed, until every page has been swept. At that point incremental marking is ready to start again.

Google has recently added support for parallel sweeping. Since the script's execution thread no longer touches dead objects, page sweeping can run on separate threads with very little synchronization. Similar work is underway for parallel marking, but it is still in early experimentation.

Summary

Garbage collection is genuinely complex. I have skipped plenty of details, and the article still ended up very long. A colleague of mine said he finds garbage collectors scarier to work on than register allocators, and I agree. That said, I would rather hand these tedious details to the runtime than push them onto every application developer. Although garbage collection has its performance issues and the occasional mystifying behavior, it frees us from a great deal of bookkeeping and lets us focus on more important things.