
Detailed explanation of the storage structure of LinkedList in C#

This article follows the previous three articles in this series on C# data structure analysis, which is now drawing to a close. The series introduces and analyzes five collection types: Array, List, Dictionary, LinkedList, and SortedSet. Without further ado, let’s look at the last data structure of the series: the linked list.

When it comes to the linked list data structure, most readers are probably familiar with it, yet relatively few actually use LinkedList in .NET, because in most scenarios List and Dictionary are used directly. This article aims to build a comprehensive understanding of the .NET LinkedList collection.

This article briefly covers the basic characteristics of linked lists, the underlying implementation of LinkedList in C#, and, from several perspectives, why Queue can be implemented in different ways.

1. Basic characteristics of linked lists

Arrays require a contiguous block of memory, which places relatively high demands on memory. A linked list does not: it connects a group of scattered memory blocks through "pointers" (references). Linked list nodes are allocated individually, so the list can grow and shrink dynamically without being restricted to a fixed size. When frequent insertion and deletion are required, linked lists perform better than arrays. The most common linked list structures are the singly linked list, the doubly linked list, and the circular linked list.

1. The basic unit of a linked list is the node, and each node contains two parts (a minimal node sketch follows the classification list below):

(1) Data: the value stored in the node.

(2) Reference (Next): a reference to the next node; in a doubly linked list each node also holds a reference to the previous node.

2. The basic types of linked lists mainly include three types:

(1) Singly linked list: each node contains only one reference, to the next node.

(a), [Time Complexity] Head insertion/deletion: O(1); tail insertion: O(n); middle insertion/deletion: O(n).

(b), [Time Complexity] Search by value: O(n) (need to traverse the entire linked list); Search by index: O(n).

(c), [Space Complexity] O(n) for the list itself; insertion, deletion, and search need only O(1) extra space.

(2) Doubly linked list: each node contains two references, one to the next node and one to the previous node.

(a), [Time Complexity] Head insertion/deletion: O(1); tail insertion/deletion: O(1); middle insertion/deletion: O(n).

(b), [Time Complexity] Search by value: O(n); Search by index: O(n).

(c), [Space Complexity] O(n).

(3) Circular linked list: the reference of the tail node points back to the head node, forming a closed loop.

(a), [Time Complexity] Head insertion/deletion: O(1); tail insertion/deletion: O(1); middle insertion/deletion: O(n).

(b), [Time Complexity] Search by value: O(n); Search by index: O(n).

(c), [Space Complexity] O(n).
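To make the node concept concrete, here is a minimal illustrative sketch of a singly linked and a doubly linked node in C#. The type and member names are my own for illustration and are not part of the .NET library.

// Illustrative node types only; not the .NET implementation.
public class SinglyNode<T>
{
    public T? Data;               // the value stored in the node
    public SinglyNode<T>? Next;   // reference to the next node
}

public class DoublyNode<T>
{
    public T? Data;               // the value stored in the node
    public DoublyNode<T>? Next;   // reference to the next node
    public DoublyNode<T>? Prev;   // reference to the previous node
}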

The above briefly introduces the basic characteristics, classification, and corresponding time and space complexity of linked lists. Although doubly linked lists consume relatively more memory, they have a clear advantage over singly linked lists for insertion, deletion, and ordered-list queries. This fully reflects the algorithm design idea of "trading space for time".

2. LinkedList data storage

LinkedList is the doubly linked list implementation provided in C# for storing elements. Each node in the doubly linked list holds references to both the previous and the next node, which makes traversal and manipulation in both directions convenient.
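Before diving into the internals, a short usage sketch of the public API (AddFirst, AddLast, Remove):

using System;
using System.Collections.Generic;

var list = new LinkedList<int>();
list.AddLast(2);               // 2
list.AddLast(3);               // 2 -> 3
list.AddFirst(1);              // 1 -> 2 -> 3
list.Remove(2);                // 1 -> 3
Console.WriteLine(list.Count); // 2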

1. Node structure

public sealed class LinkedListNode<T>
{
    internal LinkedList<T>? list;
    internal LinkedListNode<T>? next;
    internal LinkedListNode<T>? prev;
    internal T item;
    ...
    public LinkedListNode(T value)
    {
        item = value;
    }
}

The above code shows the storage structure of a node in C#'s LinkedList: each node is an object containing the element value and two references. list is a reference to the LinkedList that contains the node, which lets the node reach list-level information such as the head and tail nodes. next is a reference to the next node, prev is a reference to the previous node, and item stores the node's value.

In fact, at this point some readers may wonder why the node is designed as a class rather than a struct, given that structs have an advantage in memory usage. Why a class here? There are several considerations.

1. Reference semantics: an instance of a reference type is assigned and passed by reference, so a modification made through one reference is visible through every other reference to the same object. A node must be reachable from the list's head reference and from its neighbours' next/prev references at the same time, which requires exactly this sharing behaviour.

2. Complexity and life cycle: if a type has a more complex life cycle or holds references to other resources (other objects, file handles, and so on), a class is usually chosen over a struct. Structs suit lightweight, simple value types, while classes suit more complex types with reference semantics.

3. Nullability: a class reference can be null to represent "no node", which a plain struct cannot.

4. Performance and copy overhead: structs are copied on assignment and when passed, while classes are passed by reference, so frequently shared data is cheaper as a class.

The node design itself is not complex, but whether to model a type as a struct or a class is something we also need to weigh carefully in our own system designs. Neither choice is perfect; each has advantages in the scenarios it targets.
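A small sketch of point 1 (reference vs value semantics), using two illustrative types that are not taken from the library:

using System;

var a = new NodeClass { Value = 1 };
var b = a;                      // b references the same object as a
b.Value = 99;
Console.WriteLine(a.Value);     // 99: the change is visible through both references

var x = new NodeStruct { Value = 1 };
var y = x;                      // y is an independent copy of x
y.Value = 99;
Console.WriteLine(x.Value);     // 1: the original copy is unchanged

class NodeClass { public int Value; }
struct NodeStruct { public int Value; }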

2. The head and tail of the linked list

public class LinkedList<T> : ICollection<T>, ...
{
    public LinkedListNode<T> First { get; }
    public LinkedListNode<T> Last { get; }
    ...
}

LinkedList itself maintains references to the head and the tail of the list, pointing to the first node (head) and the last node (tail) respectively. By keeping the linked list node (LinkedListNode) internals behind the LinkedList class, the implementation details of nodes are hidden and better encapsulation is achieved: external users only need the public interface of the list, without understanding the specific structure of a node. This also makes the list easier to extend and maintain, lets the class control access to nodes, ensures that operations on the list are visible through every reference to the same list, and allows an empty list to be represented (First and Last are simply null).
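For callers, First and Last are enough to walk the list in either direction; a short sketch:

using System;
using System.Collections.Generic;

var names = new LinkedList<string>(new[] { "a", "b", "c" });

// Forward from the head.
for (var n = names.First; n != null; n = n.Next)
    Console.Write(n.Value + " ");       // a b c

// Backward from the tail.
for (var n = names.Last; n != null; n = n.Previous)
    Console.Write(n.Value + " ");       // c b a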

3. Read and write LinkedList data

Above we analyzed the storage structures LinkedListNode and LinkedList. Next, let’s look at how basic operations such as insertion, lookup, and removal of LinkedList elements are implemented. First, insertion: the Add() method adds an element to the collection, and its core implementation is AddLast(). Let’s look at the internal implementation of this method in detail. [Parts of the source code have been omitted.]

public LinkedListNode<T> AddLast(T value)
{
    LinkedListNode<T> result = new LinkedListNode<T>(this, value);
    // Distinguish between an empty and a non-empty list
    if (head == null)
    {
        InternalInsertNodeToEmptyList(result);
    }
    else
    {
        InternalInsertNodeBefore(head, result);
    }
    return result;
}

The above code shows the implementation of AddLast(). This method appends a new node at the end of the doubly linked list, chooses a different insertion strategy depending on whether the list is empty so that the insertion is always valid, and returns a reference to the newly inserted node. The empty/non-empty distinction exists because the list is stored circularly: the node before the head is the tail, and the node after the tail is the head. When the list is empty, the new node becomes both head and tail and can be linked directly; when the list is not empty, appending at the tail is done by inserting the new node just before the head, which keeps head and tail connected. Next, let’s look at the InternalInsertNodeToEmptyList() and InternalInsertNodeBefore() methods respectively.

private void InternalInsertNodeToEmptyList(LinkedListNode<T> newNode)
{
    // Ensure the list really is empty when this method is called
    Debug.Assert(head == null && count == 0, "LinkedList must be empty when this method is called!");
    // Point next of the new node to itself
    newNode.next = newNode;
    // Point prev of the new node to itself
    newNode.prev = newNode;
    // Point the head of the list to the new node
    head = newNode;
    // Bump the version number of the list
    version++;
    // Increase the number of nodes in the list
    count++;
}

InternalInsertNodeToEmptyList() implements the insertion of a new node into an empty list. In an empty list the new node is the only node, so its next and prev both point to itself, and it is simultaneously the head and the tail.
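A quick way to see this from the outside, using only public members (a small sketch): after the first insertion the single node is both the head and the tail.

using System;
using System.Collections.Generic;

var list = new LinkedList<string>();
list.AddLast("only");
Console.WriteLine(ReferenceEquals(list.First, list.Last));   // True: one node serves as both head and tail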

private void InternalInsertNodeBefore(LinkedListNode<T> node, LinkedListNode<T> newNode)
{
    // The next reference of newNode points to the target node,
    // so newNode takes the original position of node in the list
    newNode.next = node;
    // The prev reference of newNode points to the node before the target node,
    // preserving the links of the list so that newNode's predecessor is correct
    newNode.prev = node.prev;
    // The next reference of the node before the target now points to newNode,
    // completing the insertion of newNode
    node.prev!.next = newNode;
    // The prev reference of the target node points to newNode,
    // so the node before the target in the list becomes the newly inserted newNode
    node.prev = newNode;
    // version tracks structural changes and is incremented on every modification,
    // so that modifications to the list can be detected during iteration
    version++;
    count++;
}

InternalInsertNodeBefore() inserts a new node before a specified node, keeping the links and the node count of the list correct and consistent; the code above is commented line by line. In node.prev!.next = newNode, the null-forgiving operator (!) tells the compiler that prev is not null at this point (in a non-empty circular list every node has a predecessor), suppressing the nullable-reference warning. The version increment provides a mechanism for detecting structural changes to the list during iteration: an enumerator compares its captured version with the current one and fails fast if the list was modified, which also helps surface unsafe concurrent modification.
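At the public API level, the node returned by AddLast (or located later) can be used as the target of relative insertions; a short usage sketch:

using System;
using System.Collections.Generic;

var list = new LinkedList<int>();
var two = list.AddLast(2);      // 2
list.AddLast(4);                // 2 -> 4
list.AddBefore(two, 1);         // 1 -> 2 -> 4
list.AddAfter(two, 3);          // 1 -> 2 -> 3 -> 4
Console.WriteLine(string.Join(" -> ", list));   // 1 -> 2 -> 3 -> 4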

Above we introduced the InternalInsertNodeToEmptyList() and InternalInsertNodeBefore() methods of LinkedList, which are used to insert elements into the list. Next, let’s look at how element lookup is implemented; the method in LinkedList is Find().

public LinkedListNode<T>? Find(T value)
{
    LinkedListNode<T>? node = head;
    EqualityComparer<T> c = EqualityComparer<T>.Default;
    if (node != null)
    {
        if (value != null)
        {
            // Search for a node with a non-null value
            do
            {
                if (c.Equals(node!.item, value))
                {
                    return node;
                }
                node = node.next;
            } while (node != head);
        }
        else
        {
            // Search for a node whose value is null
            do
            {
                if (node!.item == null)
                {
                    return node;
                }
                node = node.next;
            } while (node != head);
        }
    }
    // No matching node found
    return null;
}

Find() loops through the nodes of the list, compares each node's value with the target value, and returns the matching node. The list may contain nodes with null values as well as non-null values, and the two cases require different comparisons. LinkedListNode<T>? node = head; initializes a node reference pointing at the head of the list. The do-while loop runs its body at least once; this is needed because the list is circular and the traversal stops when it returns to head, so a pre-checked while (node != head) loop would never execute. The ! operator asserts to the compiler that node is not null at that point, suppressing the nullable-reference warning. The time complexity of Find() for querying values in the list is O(n).
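A brief usage sketch of Find and its companion FindLast:

using System;
using System.Collections.Generic;

var list = new LinkedList<string>(new[] { "red", "green", "red" });

var first = list.Find("red");       // node holding the first "red" (searches from the head)
var last  = list.FindLast("red");   // node holding the last "red" (searches from the tail)
var none  = list.Find("blue");      // null: no matching node

Console.WriteLine(!ReferenceEquals(first, last));   // True: two different nodes hold the same value
Console.WriteLine(none == null);                    // True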

The above introduces the query implementation logic of linked list elements. Next, let’s take a look at the removal operation of linked list elements, which is implemented in the InternalRemoveNode() method.

internal void InternalRemoveNode(LinkedListNode<T> node)
{
    if (node.next == node)
    {
        // The node is the only one in the list: set head to null, leaving the list empty
        head = null;
    }
    else
    {
        // Point prev of the node after the target to the node before the target
        node.next!.prev = node.prev;
        // Point next of the node before the target to the node after the target
        node.prev!.next = node.next;
        if (head == node)
        {
            // If the target node is the head of the list, move head to the next node
            head = node.next;
        }
    }
    node.Invalidate();
    count--;
    version++;
}

InternalRemoveNode() deletes the specified node from the doubly linked list. It first checks whether the node is the only one in the list (node.next == node); if so, removing it leaves the list empty and head is set to null. Otherwise the neighbouring nodes are relinked and, if the target is the head node, head is moved to the next node. Invalidate() then clears the node's list, prev, and next references so the node is fully detached from the list, count is decreased, and version++ bumps the version number so structural changes can be detected during iteration.
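At the public level, removal looks like this (a short usage sketch):

using System;
using System.Collections.Generic;

var list = new LinkedList<int>(new[] { 1, 2, 3, 4 });

list.Remove(2);                 // removes the first node whose value is 2 -> 1, 3, 4
list.RemoveFirst();             // removes the head node -> 3, 4
list.RemoveLast();              // removes the tail node -> 3

var node = list.Find(3)!;
list.Remove(node);              // removes a specific node by reference in O(1)
Console.WriteLine(list.Count);  // 0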

This section introduced element insertion, lookup, and removal on the linked list; the implementation differs by scenario. The linked list maintained in C# is relatively simple and not aggressively optimized internally, so when we use linked lists in real projects we need to analyze the actual requirements of the scenario and adjust or optimize accordingly.

4. Comparison of implementation of linked lists and arrays in Queue

In the .NET data structure landscape, arrays cover the majority of application scenarios and linked lists relatively few, yet linked lists have their own structure and fit specific scenarios. A queue can be built on either a linked list or a dynamic array, and in current .NET versions the underlying implementation of Queue<T> is based on an array (a circular buffer). Either choice of underlying structure has its advantages and disadvantages, so let’s use the queue as an example to compare linked lists and arrays.

1. Advantages and disadvantages of using linked lists

1. The benefits of using linked lists:

(1) Efficient insertion and deletion: inserting at the tail and deleting at the head of the queue are both O(1), which matches the typical operations of a queue (a sketch of a linked-list-based queue follows the disadvantages list below).

(2) No contiguous memory required: a linked list does not need its elements stored contiguously in memory, which lets the queue allocate and free memory more flexibly.

(3) Suitable for frequent enqueue and dequeue operations: a linked list grows and shrinks dynamically with little cost, which suits queues with frequent enqueue and dequeue operations.

2. Disadvantages of using linked lists:

(1) The memory overhead is large: each node needs additional memory space to store references to the next node, which may result in relatively large memory overhead.

(2) Poor random access performance: linked lists do not support random access directly through indexes.
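To make the comparison concrete, here is a minimal illustrative queue built on top of LinkedList<T>. This is a sketch only, not how .NET's Queue<T> is implemented (the current Queue<T> is array-based):

using System;
using System.Collections.Generic;

// Illustrative only: a minimal FIFO queue backed by a doubly linked list.
public class LinkedQueue<T>
{
    private readonly LinkedList<T> _items = new LinkedList<T>();

    public int Count => _items.Count;

    // Enqueue at the tail: O(1)
    public void Enqueue(T item) => _items.AddLast(item);

    // Dequeue from the head: O(1)
    public T Dequeue()
    {
        if (_items.First is null)
            throw new InvalidOperationException("Queue is empty.");

        T value = _items.First.Value;
        _items.RemoveFirst();
        return value;
    }
}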

2. Advantages and disadvantages of using arrays in Queue

1. Advantages of using arrays:

(1) Random access performance: an array provides O(1) random access by index, whereas a linked list must be traversed sequentially to reach a target position.

(2) Cache-friendly: Arrays are stored continuously in memory, and the storage of linked list nodes is scattered.

(3) Space efficiency: The array does not require additional references to the next node, and has smaller memory overhead.

(4) Applicable to specific access modes: For random access rather than insert/delete operations, it may be more appropriate to select an array as the underlying implementation.

2. Disadvantages of using arrays:

(1) Poor performance for inserting and deleting: Arrays have poor performance for inserting or deleting elements in the middle because elements need to be moved to maintain the order of the array.

(2) Overhead of dynamic expansion: If the size of the queue changes dynamically, the overhead of re-allocating memory and copying elements may affect performance when the array is dynamically expanded.

(3) Management of large queues: For large queues, if frequent dynamic expansion is required, they may face the challenge of memory management.

(4) Not applicable to specific insertion modes: If the main operation is frequent insertion and deletion rather than random access, selecting an array as the underlying implementation may not be the best choice.

5. Application scenarios

The article began with the basic characteristics of linked lists, then analyzed C#'s LinkedList structure on that basis, highlighting how LinkedList inserts, queries, removes, and stores elements. Linked lists are widely used in practice, especially in caching. Caching is a technique for improving data read performance and has a wide range of applications in hardware design and software development, such as the CPU cache, database cache, and browser cache. The size of a cache is limited: when it is full, which data should be evicted and which retained? That is decided by a cache eviction policy. Three common policies are First In First Out (FIFO), Least Frequently Used (LFU), and Least Recently Used (LRU).

Here is the implementation logic of an LRU cache in a simple linked-list-based form (a code sketch follows the list below).

1. If the data is already cached in the linked list, we traverse the list to find its node, remove it from its original position, and insert it at the head of the list.

2. If the data is not yet in the cache, there are two cases:

(1) If the cache is not full at this time, insert this node directly into the head of the linked list;

(2) If the cache is full at this time, the node at the end of the linked list will be deleted and the new data node will be inserted into the head of the linked list.
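Below is a minimal sketch of this logic, assuming a LinkedList<T> for recency order plus a Dictionary for O(1) key lookup; the type and member names are illustrative, not a library API.

using System.Collections.Generic;

// Illustrative LRU cache sketch: the most recently used entry is kept at the
// head of the linked list, and the tail is evicted when the cache is full.
public class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map = new();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order = new();

    public LruCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            _order.Remove(node);      // unlink from the current position: O(1)
            _order.AddFirst(node);    // move to the head as most recently used
            value = node.Value.Value;
            return true;
        }
        value = default!;
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            // Key already cached: drop the old node, it will be re-inserted at the head.
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Cache full: evict the least recently used entry at the tail.
            var lru = _order.Last;
            if (lru != null)
            {
                _map.Remove(lru.Value.Key);
                _order.RemoveLast();
            }
        }

        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}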

Other basic application scenarios for linked lists include reversing a singly linked list, detecting a cycle in a linked list, merging sorted linked lists, and so on; a small reversal sketch follows.
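For example, a minimal sketch of reversing a singly linked list (the Node type here is illustrative, not LinkedListNode<T>):

public static class ListAlgorithms
{
    public class Node<T>
    {
        public T Value;
        public Node<T>? Next;
        public Node(T value) => Value = value;
    }

    // Iteratively reverse a singly linked list: O(n) time, O(1) extra space.
    public static Node<T>? Reverse<T>(Node<T>? head)
    {
        Node<T>? prev = null;
        while (head != null)
        {
            Node<T>? next = head.Next;   // remember the rest of the list
            head.Next = prev;            // point the current node backwards
            prev = head;                 // advance prev
            head = next;                 // advance to the remembered rest
        }
        return prev;                     // prev is the new head
    }
}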

This concludes the article on the in-depth analysis of the storage structure of LinkedList<T> in C#. For more on the storage structure of LinkedList<T>, please search my previous articles or continue browsing the related articles below. I hope everyone will continue to support me!