SoFunction
Updated on 2025-03-07

Deduplicating a List Collection with LINQ's Built-in Distinct Method: A Detailed Explanation

Preface

When it comes to deduplicating a collection, the first thing that comes to mind is LINQ's Distinct extension method. For collections of ordinary value types this is easy: just call list.Distinct(). However, if you want to deduplicate a collection of reference types, treating objects whose property values are all equal as duplicates, you will find that calling Distinct() directly does not work.

Let's first look at the definition of the generic list List<T>:

public class List<T> : IList<T>, ICollection<T>, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, IEnumerable

It implements IEnumerable<T>. As a result, the Distinct extension method, which System.Linq.Enumerable defines for any IEnumerable<T>, can be called on it.
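Below is a minimal sketch of the easy case from the preface, a list of value types. It also shows that Distinct is an ordinary static method on System.Linq.Enumerable, which C# lets us call with extension-method syntax. The class name ValueTypeDistinctDemo is hypothetical and was made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: deduplicating a list of ints with Distinct().
public static class ValueTypeDistinctDemo
{
    public static (List<int> ViaExtension, List<int> ViaStaticCall) Run()
    {
        var numbers = new List<int> { 3, 1, 3, 2, 1 };

        // Extension-method syntax, available because List<int> implements IEnumerable<int>.
        List<int> viaExtension = numbers.Distinct().ToList();

        // The same method, called explicitly as a static method of System.Linq.Enumerable.
        List<int> viaStaticCall = Enumerable.Distinct(numbers).ToList();

        return (viaExtension, viaStaticCall);
    }
}
```

Both lists contain the three distinct values 3, 1 and 2. For int, the default comparer compares the values themselves.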

Keep the following in mind when using this method:

(1) This method does not change the original list;

(2) This method returns an object (call it dis) through which the distinct elements of the original list can be enumerated, but those elements are not copied into the new object (not even as a shallow copy);

(3) Because of (2), enumerating dis always reads from the original list. If the original list is updated after dis has been obtained, enumerating dis reflects the list's latest state.
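The three points above follow from deferred execution: Distinct() returns a query, not a finished result. The small sketch below counts the same dis object before and after the source list changes. The class name DeferredDistinctDemo is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: Distinct() is lazily evaluated, so the query object
// reads the source list again every time it is enumerated.
public static class DeferredDistinctDemo
{
    public static (int Before, int After, int SourceCount) Run()
    {
        var numbers = new List<int> { 1, 1, 2 };

        // dis holds a query over numbers, not a copy of its elements.
        IEnumerable<int> dis = numbers.Distinct();

        int before = dis.Count();   // enumerates now: 1, 2

        numbers.Add(3);             // modify the source after obtaining dis
        int after = dis.Count();    // enumerates again and sees 1, 2, 3

        // Distinct never modified the source list itself.
        return (before, after, numbers.Count);
    }
}
```

If a fixed snapshot is needed, materialize the query right away with ToList() or ToArray().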

var list = new List<SampleVersionDto>(); // a collection that contains duplicate values

On its own, Distinct() cannot deduplicate reference types by their property values, so we need custom handling. Take the following code as an example:

public class User
{
 public int Id { get; set; }
 public string Name { get; set; }
}

var list = new List<User>()
{ 
 new User() { Id = 1, Name = "Zhang San" } ,
 new User() { Id = 1, Name = "Zhang San" } ,
 new User() { Id = 3, Name = "Li Si" } ,
};

var newList1 = list.Distinct().ToList();

Running the code above does not give the expected result: newList1 still has 3 elements. The reason is that Distinct() compares elements using the default equality comparer, EqualityComparer<T>.Default. For value types, the default comparer checks whether the values are equal. For reference types that do not override Equals and GetHashCode, it compares object references. The two Zhang San objects in the example have the same property values, but they are two different instances, so they are not deduplicated.
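To see the reference comparison in isolation, the sketch below contrasts two cases: two separate instances with equal properties, and one instance added twice. Only the second case is deduplicated by the parameterless Distinct(). DefaultEqualityDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical demo: User does not override Equals/GetHashCode, so
// EqualityComparer<User>.Default falls back to reference equality.
public static class DefaultEqualityDemo
{
    public static (int DistinctCopies, int DistinctSameRef) Run()
    {
        // Two separate instances with identical property values.
        var copies = new List<User>
        {
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 1, Name = "Zhang San" },
        };

        // The same instance added twice.
        var zhang = new User { Id = 1, Name = "Zhang San" };
        var sameRef = new List<User> { zhang, zhang };

        // copies keeps both instances; sameRef collapses to one.
        return (copies.Distinct().Count(), sameRef.Distinct().Count());
    }
}
```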

IEqualityComparer<TSource>

Fortunately, LINQ provides an overload of Distinct for exactly this need:

public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer);

This overload takes an extra parameter, IEqualityComparer<TSource> comparer, which is a generic interface. We only need to implement this interface to define what counts as a duplicate:

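using System.Collections.Generic;

public class UserComparer : IEqualityComparer<User>
{
    // Two users are duplicates when both Id and Name are equal.
    public bool Equals(User x, User y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name;
    }

    // Equal users must produce equal hash codes, so hash the same
    // property values that Equals compares, combined into one string.
    public int GetHashCode(User obj)
    {
        return (obj.Id + obj.Name).GetHashCode();
    }
}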

IEqualityComparer<TSource> defines two methods, Equals and GetHashCode. The reference materials I consulted describe how they are used. When two elements are compared, their hash codes from GetHashCode are compared first. If the hash codes differ, the elements are considered different, and Equals is called only when the hash codes are the same. So GetHashCode must return the same value for any two users that Equals treats as equal. That rules out the User object's own GetHashCode, which is based on the reference. Instead, the property values are combined into a string, and that string's hash code is used. With this overload, we can achieve our goal:

var newList2 = list.Distinct(new UserComparer()).ToList();
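The hash-first behavior described above can be observed with an instrumented comparer. The sketch below is a variant of UserComparer; CountingUserComparer and HashFirstDemo are hypothetical names. To keep the hash codes deterministic, its GetHashCode uses only Id. That is an assumption of this sketch, and it is still consistent with Equals because equal users always share an Id. Wang Wu shares Id 1 with Zhang San, so his hash code matches and Equals decides. Li Si's hash code differs, so Equals is never called for him.

```csharp
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer that counts how many times Distinct calls Equals.
public class CountingUserComparer : IEqualityComparer<User>
{
    public int EqualsCalls { get; private set; }

    public bool Equals(User x, User y)
    {
        EqualsCalls++;
        return x.Id == y.Id && x.Name == y.Name;
    }

    // Id-only hash: users that Equals treats as equal always share a hash code.
    public int GetHashCode(User obj) => obj.Id;
}

public static class HashFirstDemo
{
    public static (int Count, int EqualsCalls) Run()
    {
        var list = new List<User>
        {
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 1, Name = "Zhang San" }, // same hash, Equals -> true: dropped
            new User { Id = 3, Name = "Li Si" },     // different hash: Equals not called
            new User { Id = 1, Name = "Wang Wu" },   // same hash, Equals -> false: kept
        };

        var comparer = new CountingUserComparer();
        int count = list.Distinct(comparer).ToList().Count;
        return (count, comparer.EqualsCalls);
    }
}
```

With the hash-set-based Distinct implementation in .NET, this yields Count = 3 and EqualsCalls = 2.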

We can even treat elements as duplicates when just one particular property is equal. We only need to implement Equals according to the comparison we want, and keep GetHashCode consistent with it.
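Here is a sketch of that idea, assuming we want "same Id" to mean "duplicate" whatever the Name is. UserIdComparer and IdOnlyDemo are hypothetical names. GetHashCode hashes only Id, the same property Equals compares.

```csharp
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: users with the same Id are duplicates, regardless of Name.
public class UserIdComparer : IEqualityComparer<User>
{
    public bool Equals(User x, User y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    // Hash only what Equals compares, so "equal" users share a hash code.
    public int GetHashCode(User obj) => obj.Id.GetHashCode();
}

public static class IdOnlyDemo
{
    public static List<User> Run()
    {
        var list = new List<User>
        {
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 1, Name = "Zhang Sanfeng" }, // same Id, different Name
            new User { Id = 3, Name = "Li Si" },
        };
        return list.Distinct(new UserIdComparer()).ToList();
    }
}
```

The result contains one user with Id 1 and one with Id 3. Distinct is documented as returning an unordered sequence. In practice the current implementation keeps the first occurrence of each Id, but code should not rely on which duplicate survives.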

Going further

The Distinct overload covers most deduplication needs, but on reflection it still feels a bit awkward: every new kind of deduplication requires another class that implements the IEqualityComparer<TSource> interface, which is not flexible enough. In the spirit of encapsulation and reuse, I wondered whether this could be improved. I happened to be working on an Android project recently and learned that Java supports anonymous classes that implement an interface inline. If C# could also implement interfaces anonymously, there would be no need to add so many classes, which would be much more convenient. Unfortunately, C# has no such feature. Reading further, I found that Java's anonymous implementation is not truly anonymous: the compiler generates a real class that implements the interface at compile time. After more searching, I finally found a good solution:

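using System;
using System.Collections.Generic;

public class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _lambdaComparer;
    private readonly Func<T, int> _lambdaHash;

    // Equality only: the hash falls back to EqualityComparer<T>.Default.
    // For reference types that do not override GetHashCode (such as User),
    // that hash is reference-based, so prefer the two-argument constructor.
    public LambdaComparer(Func<T, T, bool> lambdaComparer)
        : this(lambdaComparer, EqualityComparer<T>.Default.GetHashCode)
    {
    }

    public LambdaComparer(Func<T, T, bool> lambdaComparer, Func<T, int> lambdaHash)
    {
        if (lambdaComparer == null)
            throw new ArgumentNullException(nameof(lambdaComparer));
        if (lambdaHash == null)
            throw new ArgumentNullException(nameof(lambdaHash));
        _lambdaComparer = lambdaComparer;
        _lambdaHash = lambdaHash;
    }

    // Both interface methods simply delegate to the lambdas passed in.
    public bool Equals(T x, T y)
    {
        return _lambdaComparer(x, y);
    }

    public int GetHashCode(T obj)
    {
        return _lambdaHash(obj);
    }
}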

It cleverly relies on generic delegates. Only one class implements the IEqualityComparer<TSource> interface, and the behavior of Equals and GetHashCode is determined by the delegates passed in. After that, usage is simple:

var newList3 = list.Distinct(new LambdaComparer<User>((a, b) => a.Id == b.Id && a.Name == b.Name, obj => (obj.Id + obj.Name).GetHashCode())).ToList();
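Here is a self-contained sketch of that call, with a condensed copy of LambdaComparer<T> so it compiles on its own. LambdaComparerDemo is a hypothetical name. The example deliberately uses the two-argument constructor. The one-argument constructor falls back to EqualityComparer<User>.Default.GetHashCode, which is reference-based for User, so distinct instances would almost never reach the Equals lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Condensed copy of LambdaComparer<T>: Equals and GetHashCode come from delegates.
public class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _lambdaComparer;
    private readonly Func<T, int> _lambdaHash;

    public LambdaComparer(Func<T, T, bool> lambdaComparer)
        : this(lambdaComparer, EqualityComparer<T>.Default.GetHashCode) { }

    public LambdaComparer(Func<T, T, bool> lambdaComparer, Func<T, int> lambdaHash)
    {
        _lambdaComparer = lambdaComparer ?? throw new ArgumentNullException(nameof(lambdaComparer));
        _lambdaHash = lambdaHash ?? throw new ArgumentNullException(nameof(lambdaHash));
    }

    public bool Equals(T x, T y) => _lambdaComparer(x, y);
    public int GetHashCode(T obj) => _lambdaHash(obj);
}

public static class LambdaComparerDemo
{
    public static int Run()
    {
        var list = new List<User>
        {
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 3, Name = "Li Si" },
        };

        // Equality and hashing are both defined inline; no dedicated comparer class needed.
        var comparer = new LambdaComparer<User>(
            (a, b) => a.Id == b.Id && a.Name == b.Name,
            obj => (obj.Id + obj.Name).GetHashCode());

        return list.Distinct(comparer).Count(); // 2
    }
}
```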

Doesn't this style look familiar? You can compare however you like, quickly and conveniently, without defining a separate class for every comparison, and the goal is still achieved. Many LINQ extension methods accept an IEqualityComparer<TSource> (Union, Intersect, Except, Contains and SequenceEqual, among others), so this approach can greatly increase reuse.
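As an illustration of that reuse, the sketch below defines one lambda-based comparer and passes it to Except, Union and Contains. The data and the names ComparerReuseDemo, oldUsers and newUsers are hypothetical. LambdaComparer<T> is condensed again so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Condensed copy of LambdaComparer<T> (two-argument form only).
public class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _lambdaComparer;
    private readonly Func<T, int> _lambdaHash;

    public LambdaComparer(Func<T, T, bool> lambdaComparer, Func<T, int> lambdaHash)
    {
        _lambdaComparer = lambdaComparer ?? throw new ArgumentNullException(nameof(lambdaComparer));
        _lambdaHash = lambdaHash ?? throw new ArgumentNullException(nameof(lambdaHash));
    }

    public bool Equals(T x, T y) => _lambdaComparer(x, y);
    public int GetHashCode(T obj) => _lambdaHash(obj);
}

public static class ComparerReuseDemo
{
    public static (int ExceptCount, int UnionCount, bool ContainsLiSi) Run()
    {
        // One comparer, reused by several LINQ operators.
        var byIdAndName = new LambdaComparer<User>(
            (a, b) => a.Id == b.Id && a.Name == b.Name,
            obj => (obj.Id + obj.Name).GetHashCode());

        var oldUsers = new List<User>
        {
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 3, Name = "Li Si" },
        };
        var newUsers = new List<User>
        {
            new User { Id = 1, Name = "Zhang San" },
            new User { Id = 5, Name = "Wang Wu" },
        };

        // Users in newUsers that are not (by value) in oldUsers: only Wang Wu.
        int exceptCount = newUsers.Except(oldUsers, byIdAndName).Count();   // 1

        // Union removes value-duplicates across both lists: Zhang San, Li Si, Wang Wu.
        int unionCount = oldUsers.Union(newUsers, byIdAndName).Count();     // 3

        // Contains with a comparer matches a fresh instance by value.
        bool containsLiSi = oldUsers.Contains(new User { Id = 3, Name = "Li Si" }, byIdAndName); // true

        return (exceptCount, unionCount, containsLiSi);
    }
}
```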

References

1. https:///article/

2. /c-Sharp/post_1277383

Summary

That is all for this article. I hope it is of some reference value for your study or work. Thank you for your support.