Detailed explanation of the example of using CAS to implement lock-free algorithm in C#

The basic concept of CAS

CAS (Compare-and-Swap) is a commonly used atomic operation in multi-threaded concurrent programming, used to implement synchronization and mutually exclusive access between multiple threads. It usually contains three parameters: a memory address (usually the address of a shared variable), the expected old value and the new value.

CompareAndSwap (memory address, expected old value, new value)

The CAS operation compares whether the value at the memory address is equal to the expected old value, and if it is equal, writes the new value to the memory address; if it is not equal, no operation is performed. This comparison and exchange operation is an atomic operation and will not be interrupted by other threads.

CAS is usually implemented through hardware-level CPU instructions, and its atomicity is guaranteed by the hardware. The specific implementation method will vary depending on the environment.

A CAS operation usually has a return value to indicate whether the operation is successful. The result may be true or false, or it may be the old value at the memory address.

Compared with the traditional locking mechanism, CAS has some advantages:

Atomicity: CAS operations are atomic and do not require additional locks to ensure data consistency in multi-threaded environments, avoiding the performance overhead and competition conditions brought by locks.
No blocking: CAS operations are non-blocking and will not cause thread blocking and context switching due to resource locking, improving the concurrency and scalability of the system.
Applicability: CAS operations can be applied to a wide range of data structures and algorithms, such as spin locks, counters, queues, etc., making it more flexible and applicable in practical applications.

How to use CAS in C#

In C#, we can use the Interlocked class to implement CAS operations.

The Interlocked class provides a set of overloaded methods of CompareExchange to implement CAS operations for different types of data.

public static int CompareExchange(ref int location1, int value, int comparand);
public static long CompareExchange(ref long location1, long value, long comparand);
// ...Other overloading methods are omittedpublic static object CompareExchange(ref object location1, object value, object comparand);
public static T CompareExchange&lt;T&gt;(ref T location1, T value, T comparand) where T : class;

The CompareExchange method compares the value at the location1 memory address with the comparison, and if it is equal, writes the value to the location1 memory address, otherwise no operation is performed.
This method returns the value at the location1 memory address.

By judging whether the return value of the method is equal to the comparison, we can know whether the CompareExchange method is successfully executed.

Algorithm Example

When using CAS to implement lock-free algorithms, we usually not only compare and update a data, but also need to perform the next operation after the update is successful. Combined with the while(true) loop, we can constantly try to update the data until the update is successful.

The pseudo-code is as follows:

while (true)
{
    // Read data    oldValue = ...;
    // Calculate new value    newValue = ...;
    // CAS updates data    result = CompareExchange(ref location, newValue, oldValue);
    // Determine whether CAS is successful    if (result == oldValue)
    {
        // CAS is successful, perform subsequent operations        break;
    }
}

In complex lock-free algorithms, because each step of operation is independent and continuous operations are not atomic, we should not only use CAS, but also determine whether other threads have modified the data before each step of operation.

Example 1: Counter

Below is a simple counter class that uses CAS to implement a thread-safe self-increment operation.

public class Counter
{
    private int _value;

    public int Increment()
    {
        while (true)
        {
            int oldValue = _value;
            int newValue = oldValue + 1;
            int result = (ref _value, newValue, oldValue);
            if (result == oldValue)
            {
                return newValue;
            }
        }
    }
}

In the underlying source code of CLR, we often see such code, such as ThreadPool's counter when increasing threads./dotnet/runtime/blob/release/6.0/src/libraries//src/System/Threading/#L446

internal void EnsureThreadRequested()
{
    //
    // If we have not yet requested #procs threads, then request a new thread.
    //
    // CoreCLR: Note that there is a separate count in the VM which has already been incremented
    // by the VM by the time we reach this point.
    //
    int count = _separated.numOutstandingThreadRequests;
    while (count < )
    {
        int prev = (ref _separated.numOutstandingThreadRequests, count + 1, count);
        if (prev == count)
        {
            ();
            break;
        }
        count = prev;
    }
}

Example 2: Queue

Below is a simple queue class that uses CAS to implement a thread-safe incoming and dequeuing operations. Compared with the counter above, the operation here is more complicated. We need to consider whether other threads have modified the data at each step.

This algorithm is a bit like Schrödinger's cat. You don't know whether it is dead or alive. Only when you try to observe it can it become dead or alive.

public class ConcurrentQueue&lt;T&gt;
{
    // _head and _tail are two pseudo-nodes, _head._next points to the first node of the queue, and _tail points to the last node of the queue.    // _head and _tail will be modified and accessed by multiple threads, so they need to be modified with volatile.    private volatile Node _head;
    private volatile Node _tail;
    
    public ConcurrentQueue()
    {
        _head = new Node(default);
        // When _tail points to _head, the queue is empty.        _tail = _head;
    }

    public void Enqueue(T item)
    {
        var node = new Node(item);
        while (true)
        {
            Node tail = _tail;
            Node next = tail._next;
            // Determine whether other threads have modified the time when the value is assigned to next            if (tail == _tail)
            {
                // If next is null, it means that no other thread has modified tail._next from the time of assigning tail to assignment to next.                if (next == null)
                {
                    // If tail._next is null, it means that since the tail is assigned here, no other thread has modified tail._next,                    // tail is still the last node in the queue, so we can directly assign node to tail._next.                    if ((ref tail._next, node, null) == null)
                    {
                        // If _tail == tail, it means that from the previous CAS operation, no other thread has modified _tail, that is, no other thread has performed the Enqueue operation.                        // Then the node of the current thread Enqueue is the last node of the queue, and we can directly assign node to _tail.                        (ref _tail, node, tail);
                        break;
                    }
                }
                // If next is not null, it means that other threads have modified tail._next from the time of assigning tail to assigning next.                else
                {
                    // If no other thread has modified _tail, then next is the last node of the queue, and we can directly assign next to _tail.                    (ref _tail, next, tail);
                }
            }
        }
    }

    public bool TryDequeue(out T item)
    {
        while (true)
        {
            Node head = _head;
            Node tail = _tail;
            Node next = head._next;
            // Determine whether _head has been modified            // If it has not been modified, it means that no other thread has performed the Dequeue operation from the time of assigning value to head to next.            if (head == _head)
            {
                // If head == tail, the queue is empty                if (head == tail)
                {
                    // Although the above has determined whether the queue is empty, we will judge it again here                    // This is to prevent other threads from performing Enqueue operations during the period when assigning tail to assigning next.                    if (next == null)
                    {
                        item = default;
                        return false;
                    }

                    // If next is not null, it means that other threads have modified tail._next from the time when assigning tail to assigning next, which means that other threads have performed Enqueue operations.                    // Then next may be the last node of the queue, and we try to assign next to _tail.                    (ref _tail, next, tail);
                }
                // If head != tail, it means that the queue is not empty                else
                {
                    item = next._item;
                    if ((ref _head, next, head) == head)
                    {
                        // If _head has not been modified                        // Instruction: Since assigning the head value to this point, no other thread has performed the Dequeue operation. The item above is the value of the first node of the queue.                        // We can return directly.                        break;
                    }
                    // If _head has been modified                    // Instruction: From assigning the head value to this point, other threads have performed the Dequeue operation, and the item above is not the value of the first node of the queue.                    // We need to re-execute the Dequeue operation.                }
            }
        }

        return true;
    }

    private class Node
    {
        public readonly T _item;
        public Node _next;

        public Node(T item)
        {
            _item = item;
        }
    }
}

We can test it with the following code

using ;

var queue = new ConcurrentQueue&lt;int&gt;();
var results = new ConcurrentBag&lt;int&gt;();
int dequeueRetryCount = 0;

var enqueueTask = (() =&gt;
{
    // Make sure that the dequeueTask has started running before Enqueue    (10);
    ("Enqueue start");
    (0, 100000, i =&gt; (i));
    ("Enqueue done");
});

var dequeueTask = (() =&gt;
{
    (10);
    ("Dequeue start");
    (0, 100000, i =&gt;
    {
        while (true)
        {
            if ((out int result))
            {
                (result);
                break;
            }

            (ref dequeueRetryCount);
        }
    });
    ("Dequeue done");
});

await (enqueueTask, dequeueTask);
(
    $"Enqueue and dequeue done, total data count: {}, dequeue retry count: {dequeueRetryCount}");

var hashSet = ();
for (int i = 0; i &lt; 100000; i++)
{
    if (!(i))
    {
        ("Error, missing " + i);
        break;
    }
}

("Done");

Output result:

Dequeue start
Enqueue start
Enqueue done
Dequeue done
Enqueue and dequeue done, total data count: 100000, dequeue retry count: 10586
Done

The above retry count is 797, which means that among the 100,000 Dequeue operations, 10,586 Dequeue operations need to be retryed. That is because in the Dequeue operation, there may be no data available for Dequeue, and you need to wait for other threads to perform the Enqueue operation.

Of course, this retry count is unstable, because in a multi-threaded environment, the results of each execution may be different.

Summarize

CAS operation is an optimistic lock. It assumes that no other thread has modified the data. If it has not been modified, then modify the data directly. If it has been modified, then re-acquire the data and try to modify it again.

When implementing more complex data structures with CAS, we not only need to rely on CAS operations, but also need to pay attention to whether the data of each operation has been modified by other threads, considering each possible branch, and how to process the data in different branches.

The above is a detailed explanation of the examples of using CAS to implement lock-free algorithms in C#. For more information about C# CAS lock-free algorithms, please pay attention to my other related articles!