
Ten ways to deduplicate C++ arrays

In C++, deduplicating an array is a common task. Here are ten commonly used approaches:

1. Use std::sort and std::unique (STL method)

Principle

  • First, std::sort is used to sort the array. It arranges the elements in a specific order (ascending by default). For example, the integer array int arr[] = {5, 3, 4, 3, 2, 5}; is rearranged into the ordered sequence {2, 3, 3, 4, 5, 5}.

  • Then, std::unique removes adjacent duplicate elements. It works by moving the non-duplicate elements to the front of the range. In the sorted array above, std::unique moves the unique elements 2, 3, 4, 5 to the front and returns an iterator (a pointer, in the raw-array case) pointing one past the last element of the deduplicated range.

Sample code:

#include <iostream>
#include <algorithm>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    std::sort(arr, arr + n); 
    int new_end = std::unique(arr, arr + n)-arr; 
    for (int i = 0; i < new_end; i++) {
        std::cout << arr[i] << " "; 
    }
    return 0; 
}

In this example, the size of the array is computed first, the array is sorted, std::unique is then used to deduplicate it, and finally the deduplicated portion of the array is traversed and printed.

Things to note

  • std::unique only removes adjacent duplicate elements, which is why the array is usually sorted first.

  • It does not change the size of the array; it moves the unique elements to the front and returns the position one past the last valid element, while the elements after that position are left in an unspecified state. The returned position can be used to extract just the unique values, as sketched below.
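
If a container holding only the deduplicated values is needed, the count returned by std::unique can be used to copy out the unique prefix. A minimal sketch (the names new_end and dedup are illustrative):

#include <algorithm>
#include <iostream>
#include <vector>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    std::sort(arr, arr + n);                            // group equal values together
    int new_end = std::unique(arr, arr + n) - arr;      // number of unique elements
    std::vector<int> dedup(arr, arr + new_end);         // copy only the unique prefix
    for (int x : dedup) {
        std::cout << x << " ";
    }
    return 0;
}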

2. Use the std::set container

Principle

std::set is an associative container in the C++ STL whose elements are unique. When an element is inserted into a std::set, the container automatically checks whether it already exists; if not, the element is inserted, otherwise it is ignored.

Sample code:

#include <iostream>
#include <set>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    std::set<int> s; 
    for (int i = 0; i < n; i++) {
        s.insert(arr[i]);  // duplicates are ignored automatically
    }
    for (auto it = s.begin(); it != s.end(); it++) {
        std::cout << *it << " "; 
    }
    return 0; 
}

First a std::set is created, then the array is traversed and each element is inserted into the std::set; finally the std::set is traversed and the deduplicated elements are printed.

Things to note

  • The elements in a std::set are stored in a specific order (ascending by default). If this ordering is not required, consider std::unordered_set instead, which may be more efficient for insertion and lookup.

  • Using std::set changes how the elements are stored; if the deduplicated result needs to be converted back into an array, additional steps are required, as shown below.
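
For example, the set contents can be copied back into a contiguous container. A minimal sketch (the names s and result are illustrative):

#include <iostream>
#include <set>
#include <vector>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5};
    std::set<int> s(arr, arr + sizeof(arr)/sizeof(arr[0]));  // insert all elements; duplicates are dropped
    std::vector<int> result(s.begin(), s.end());             // copy the unique, sorted values back out
    for (int x : result) {
        std::cout << x << " ";
    }
    return 0;
}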

3. Manual comparison method (double-loop method)

Principle

This method uses two nested loops. The outer loop traverses each element of the array, and the inner loop compares the current element with the elements that have already been kept. If an identical element is found, the current element is a duplicate and can be skipped or handled accordingly.

Sample code:

#include <iostream>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    int new_size = 0; 
    for (int i = 0; i < n; i++) {
        bool is_duplicate = false; 
        for (int j = 0; j < new_size; j++) {
            if (arr[i] == arr[j]) {
                is_duplicate = true; 
                break; 
            }
        }
        if (!is_duplicate) {
            arr[new_size] = arr[i]; 
            new_size++; 
        }
    }
    for (int i = 0; i < new_size; i++) {
        std::cout << arr[i] << " "; 
    }
    return 0; 
}

In this example, new_size records the size of the deduplicated portion. The outer loop takes one element at a time, and the inner loop checks whether that element has already appeared in the front part of the array; if not, it is appended to the deduplicated portion.

Things to note

  • This method has a high time complexity of O(n^2), where n is the size of the array. When the array is large, performance may be poor.

  • It does not require extra containers, but the code is relatively verbose; a slightly more compact variant using std::find is sketched below.
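
As a sketch of that variant, std::find from <algorithm> can replace the hand-written inner loop; the overall O(n^2) behavior is unchanged:

#include <algorithm>
#include <iostream>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    int new_size = 0;
    for (int i = 0; i < n; i++) {
        // keep arr[i] only if it is not already in the deduplicated prefix arr[0..new_size)
        if (std::find(arr, arr + new_size, arr[i]) == arr + new_size) {
            arr[new_size++] = arr[i];
        }
    }
    for (int i = 0; i < new_size; i++) {
        std::cout << arr[i] << " ";
    }
    return 0;
}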

4. Use std::map to record the number of elements

Principle

std::map is an associative container in the C++ STL that stores data as key-value pairs. Here, the array element can be used as the key and its number of occurrences as the value. While traversing the array, if an element appears for the first time it is inserted into the std::map with a value of 1; if it already exists, its value is incremented. Because the keys of a std::map are unique, traversing the keys at the end yields the deduplicated elements (and the values give their occurrence counts).

Sample code:

#include <iostream>
#include <map>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    std::map<int, int> m; 
    for (int i = 0; i < n; i++) {
        if (m.find(arr[i]) != m.end()) {
            m[arr[i]]++;        // element seen before: increment its count
        } else {
            m[arr[i]] = 1;      // first occurrence: record it with count 1
        }
    }
    for (auto it = m.begin(); it != m.end(); it++) {
        std::cout << it->first << " ";   // each key appears exactly once
    }
    return 0; 
}

First a std::map is created, then the array is traversed to update each element's occurrence count, and finally the std::map is traversed and its keys, the deduplicated elements, are printed.

Things to note

  • This method uses an extra std::map to store the occurrence counts of the elements, which takes additional space.

  • Compared with the direct comparison method, the average time complexity is better (std::map insertions and lookups are O(log n)), though the tree-based storage adds some constant overhead. One advantage is that the occurrence counts remain available afterwards, as sketched below.
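
A minimal sketch of using those counts, printing each distinct element together with how many times it appeared (the name counts is illustrative):

#include <iostream>
#include <map>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    std::map<int, int> counts;
    for (int i = 0; i < n; i++) {
        counts[arr[i]]++;   // operator[] default-initializes missing keys to 0
    }
    for (const auto& kv : counts) {
        std::cout << kv.first << " appears " << kv.second << " time(s)\n";
    }
    return 0;
}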

5. Use std::unordered_set (hash table deduplication)

Principle

std::unordered_set is an associative container implemented with a hash table. The average time complexity of its insertion and lookup operations is close to constant, O(1). When an element is inserted into a std::unordered_set, the container quickly determines from the element's hash value whether it already exists; if not, it is inserted, which achieves deduplication.

Sample code:

#include <iostream>
#include <unordered_set>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    std::unordered_set<int> s; 
    for (int i = 0; i < n; i++) {
        s.insert(arr[i]);  // duplicates are ignored automatically
    }
    for (auto it = s.begin(); it != s.end(); it++) {
        std::cout << *it << " "; 
    }
    return 0; 
}

This is similar to the std::set approach, but uses std::unordered_set, which may be more suitable when ordering is not needed and the number of elements is large.

Things to note

  • The hash function used by std::unordered_set can produce hash collisions, which may affect performance in extreme cases, but in most cases it performs well.

  • Likewise, converting the deduplication result back to an array may require additional operations.

6. Marking method

Principle

An additional Boolean array can be created to mark elements that have already been identified as duplicates. While traversing the original array, check for each element whether its position in the mark array has been marked. If not, the element appears here for the first time: append it to the deduplicated result and mark all of its later duplicates in the mark array; if it has been marked, it is a duplicate and is skipped.

Sample code:

#include <iostream>
#include <vector>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    std::vector<bool> marked(n, false);  // variable-length arrays are non-standard C++
    int new_size = 0; 
    for (int i = 0; i < n; i++) {
        if (!marked[i]) {
            arr[new_size] = arr[i]; 
            new_size++; 
            for (int j = i + 1; j < n; j++) {
                if (arr[i] == arr[j]) { 
                    marked[j] = true; 
                }
            }
        } 
    }
    for (int i = 0; i < new_size; i++) {
        std::cout << arr[i] << " "; 
    }
    return 0; 
}

Here the mark array is initialized first; then, while traversing the original array, whether an element is a duplicate is decided from the mark array, and both the mark array and the deduplicated portion of the array are updated.

Things to note

  • This method requires creating an additional Boolean array of the same size as the original array for marking, which will take up some extra space.

  • Its time complexity depends on the distribution of duplicates in the original array, but in the worst case it is O(n^2).

7. Single-pass comparison after sorting (an optimization of the double-loop method)

Principle

First sort the array so that equal elements become adjacent. Then a single pass over the array suffices: compare each element with the next one. If they differ, the current element is not a repeat of the next and can be kept; if they are the same, it is a duplicate and can be skipped.

Sample code:

#include <iostream>
#include <algorithm>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    std::sort(arr, arr + n); 
    int new_size = 0; 
    for (int i = 0; i < n - 1; i++) {
        if (arr[i]!= arr[i + 1]) {
            arr[new_size] = arr[i]; 
            new_size++; 
        }
    }
    arr[new_size] = arr[n - 1]; 
    new_size++; 
    for (int i = 0; i < new_size; i++) {
        std::cout << arr[i] << " "; 
    }
    return 0; 
}

First sort the array, then compare adjacent elements in a loop, and finally process the last element and output the deduplicated array.

Things to note

  • Compared with the plain double-loop method, this method takes advantage of equal elements being adjacent after sorting, reducing the number of comparisons. The time complexity is O(n log n + n), where O(n log n) comes from sorting and O(n) from traversing the array.

  • However, it requires sorting the array first, which may be an extra overhead if the sorted result is not required.

8. Utilize array pointers and dynamic memory allocation

Principle

You can create a new array with dynamically allocated memory and then iterate over the original array. For each element of the original array, check whether it already exists in the new array; if not, add it. This approach avoids STL containers, but requires manual memory management.

Sample code:

#include <iostream>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5}; 
    int n = sizeof(arr)/sizeof(arr[0]); 
    int* new_arr = new int[n]; 
    int new_size = 0; 
    for (int i = 0; i < n; i++) {
        bool is_duplicate = false; 
        for (int j = 0; j < new_size; j++) {
            if (arr[i] == new_arr[j]) { 
                is_duplicate = true; 
                break; 
            }
        }
        if (!is_duplicate) {
            new_arr[new_size] = arr[i]; 
            new_size++; 
        }
    }
    for (int i = 0; i < new_size; i++) {
        std::cout << new_arr[i] << " "; 
    }
    delete[] new_arr; 
    return 0; 
}

Here a new array is dynamically allocated, the non-duplicate elements are copied into it via a double-loop comparison, and finally the result is printed and the memory is freed.

Things to note

  • This method requires managing the dynamically allocated memory manually; forgetting to free it leads to memory leaks. A std::vector-based variant that avoids this is sketched after these notes.

  • The double-loop comparison gives a time complexity of O(n^2), so performance may be poor when the array is large.
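
As a sketch, replacing the raw new/delete buffer with a std::vector keeps the same logic while letting the container manage its own memory (an illustrative variant, not the method shown above):

#include <algorithm>
#include <iostream>
#include <vector>
int main() {
    int arr[] = {5, 3, 4, 3, 2, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    std::vector<int> new_arr;            // memory is released automatically
    for (int i = 0; i < n; i++) {
        if (std::find(new_arr.begin(), new_arr.end(), arr[i]) == new_arr.end()) {
            new_arr.push_back(arr[i]);   // keep only first occurrences
        }
    }
    for (int x : new_arr) {
        std::cout << x << " ";
    }
    return 0;
}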

9. Use std::vector with the erase-remove combination (for std::vector arrays)

Principle

In C++, std::vector is a dynamic array container. std::remove does not actually delete elements; it shifts the elements to be kept toward the front of the container and returns an iterator to the new logical end. The vector's erase member function then uses this iterator to actually remove the unwanted elements from the std::vector.

Sample code:

#include <iostream>
#include <vector>
#include <algorithm>
int main() {
    std::vector<int> v = {5, 3, 4, 3, 2, 5}; 
    v.erase(std::remove(v.begin(), v.end(), 3), v.end());  // remove every 3
    v.erase(std::remove(v.begin(), v.end(), 5), v.end());  // remove every 5
    for (auto it : v) {
        std::cout << it << " "; 
    }
    return 0; 
}

Here std::remove is used first to shift the specified values (3 and 5 in this example) to the end of the std::vector, and erase then deletes them. Note that this removes every copy of those values; to deduplicate the whole vector while keeping one copy of each value, you would need to call std::remove and erase repeatedly or adopt different logic, such as the sort/unique combination sketched below.
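
A minimal sketch of that combination: sorting the vector, then erasing the range returned by std::unique, leaves exactly one copy of each value:

#include <algorithm>
#include <iostream>
#include <vector>
int main() {
    std::vector<int> v = {5, 3, 4, 3, 2, 5};
    std::sort(v.begin(), v.end());                       // make duplicates adjacent
    v.erase(std::unique(v.begin(), v.end()), v.end());   // drop the duplicate tail
    for (int x : v) {
        std::cout << x << " ";
    }
    return 0;
}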

Things to note

  • This method is mainly aimed at std::vector; for an ordinary array, you first need to convert it to a std::vector.

  • Using std::remove can be more involved, especially when removing several values or performing a full deduplication.

10. Custom hash function deduplication (for custom type arrays)

Principle

When the array elements are of a custom type, a hash function can be defined to compute each element's hash value. Then, following the same principle as std::unordered_set, a container based on the custom hash function can be used, and duplicates are detected by comparing hash values together with an equality check.

Sample code (assuming the custom type is MyType):

#include <iostream>
#include <unordered_set>
struct MyType {
    int a;
    int b;
    // Hash functor: MyType serves as its own hasher, combining the hashes of a and b
    size_t operator()(const MyType& obj) const {
        return std::hash<int>()(obj.a) ^ std::hash<int>()(obj.b);
    }
    // Equality is also required by std::unordered_set to confirm duplicates
    bool operator==(const MyType& other) const {
        return a == other.a && b == other.b;
    }
};
int main() {
    MyType arr[] = { {1, 2}, {3, 4}, {1, 2}, {5, 6} };
    int n = sizeof(arr)/sizeof(arr[0]);
    std::unordered_set<MyType, MyType> s;  // MyType is used as the hash functor
    for (int i = 0; i < n; i++) {
        s.insert(arr[i]);  // {1, 2} is inserted only once
    }
    for (auto it = s.begin(); it != s.end(); it++) {
        std::cout << "(" << it->a << ", " << it->b << ") ";
    }
    return 0;
}

Here the MyType struct is defined together with a hash function for it (and an equality operator, which std::unordered_set also requires). std::unordered_set combined with the custom hash function is then used to deduplicate the MyType array. An alternative sketch that specializes std::hash is shown below.
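
For comparison, a minimal sketch of that alternative: specializing std::hash<MyType> so a plain std::unordered_set<MyType> can be used without naming a hasher (the member names follow the example above):

#include <iostream>
#include <unordered_set>
struct MyType {
    int a;
    int b;
    // Equality is required so the set can confirm two elements are duplicates
    bool operator==(const MyType& other) const {
        return a == other.a && b == other.b;
    }
};
// Specializing std::hash lets std::unordered_set<MyType> work without a second template argument
namespace std {
template <>
struct hash<MyType> {
    size_t operator()(const MyType& obj) const {
        return hash<int>()(obj.a) ^ hash<int>()(obj.b);
    }
};
}
int main() {
    MyType arr[] = { {1, 2}, {3, 4}, {1, 2}, {5, 6} };
    std::unordered_set<MyType> s(arr, arr + sizeof(arr)/sizeof(arr[0]));
    for (const auto& x : s) {
        std::cout << "(" << x.a << ", " << x.b << ") ";
    }
    return 0;
}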

This concludes this article on ten methods for deduplicating C++ arrays. For more on C++ array deduplication, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!