SoFunction
Updated on 2025-04-14

Various common methods and examples of Java Stream deduplication

There are many ways to implement deduplication in Java Stream, depending on the requirements and scenarios. Here are some common methods and examples:

1. Use the distinct() method

Applicable when the object's class correctly implements equals() and hashCode(). Deduplication is based on whole-object equality and, for an ordered stream, keeps the first occurrence of each element in encounter order:

List<Person> uniquePersons = persons.stream()
                                    .distinct()
                                    .collect(Collectors.toList());
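
To make the equals()/hashCode() requirement concrete, here is a minimal sketch. The Person class below is a hypothetical stand-in (the article does not show its definition), and it is assumed that two persons are equal when both name and age match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class DistinctExample {

    // Hypothetical Person class; equality is assumed to be defined by name and age.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return age == other.age && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }

        @Override
        public String toString() {
            return name + "(" + age + ")";
        }
    }

    // Removes duplicates via equals()/hashCode(), keeping the first occurrence of each.
    static List<Person> dedupe(List<Person> persons) {
        return persons.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", 30),
                new Person("Bob", 25),
                new Person("Alice", 30),   // equal to the first element, so it is dropped
                new Person("Alice", 31));  // same name but different age, so it is kept
        System.out.println(dedupe(persons)); // prints [Alice(30), Bob(25), Alice(31)]
    }
}
```

If Person did not override equals() and hashCode(), distinct() would fall back to object identity and all four elements would be kept.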

2. Deduplicate according to the attributes of the object

Method 1: Use Collectors.toMap

The attribute serves as the map key, and the merge function decides whether the first or the last element with that key is retained. Passing LinkedHashMap as the map supplier preserves insertion order:

// Keep the first element that appears
List<Person> uniqueByName = persons.stream()
    .collect(Collectors.toMap(
        Person::getName,
        Function.identity(),
        (oldP, newP) -> oldP, // Keep the old value (first)
        LinkedHashMap::new    // Keep insertion order
    ))
    .values().stream()
    .collect(Collectors.toList());

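// Keep the last element that appears
// (no map supplier is given, so a HashMap is used and the result order is not guaranteed)
List<Person> uniqueByNameLast = persons.stream()
    .collect(Collectors.toMap(
        Person::getName,
        Function.identity(),
        (oldP, newP) -> newP  // Keep new value (last one)
    ))
    .values().stream()
    .collect(Collectors.toList());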
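
The difference between the two merge functions can be sketched as a small runnable program. The Person class and the dedupeByName helper are hypothetical names introduced only for this illustration. Both calls use a LinkedHashMap here, so when the last element is kept it still appears at the position where its key was first seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapDedupExample {

    // Hypothetical minimal Person type used only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }

        @Override
        public String toString() { return name + "(" + age + ")"; }
    }

    // Deduplicates by name; the merge function decides which duplicate survives.
    static List<Person> dedupeByName(List<Person> persons, BinaryOperator<Person> merge) {
        Map<String, Person> byName = persons.stream()
                .collect(Collectors.toMap(
                        Person::getName,
                        Function.identity(),
                        merge,
                        LinkedHashMap::new)); // keys stay in first-seen order
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", 30),
                new Person("Bob", 25),
                new Person("Alice", 31));
        // Keep the first element per name
        System.out.println(dedupeByName(persons, (oldP, newP) -> oldP)); // prints [Alice(30), Bob(25)]
        // Keep the last element per name (position of the first occurrence is retained)
        System.out.println(dedupeByName(persons, (oldP, newP) -> newP)); // prints [Alice(31), Bob(25)]
    }
}
```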

Method 2: Use filter and thread-safe Set

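With a thread-safe Set this also works on parallel streams, but which duplicate is kept is then not guaranteed. On a sequential stream the first occurrence is kept: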

// Parallel stream deduplication (no order guaranteed)
Set<String> seen = ConcurrentHashMap.newKeySet();
List<Person> uniqueByName = persons.parallelStream()
    .filter(p -> seen.add(p.getName()))
    .collect(Collectors.toList());

// Sequential stream deduplication (preserves order)
Set<String> seenOrdered = new HashSet<>();
List<Person> uniqueByNameOrdered = persons.stream()
    .filter(p -> seenOrdered.add(p.getName()))
    .collect(Collectors.toList());
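
The two variants can be compared in a self-contained sketch. The helper names dedupeSequential and dedupeParallel and the sample words are assumptions made for illustration. Note that the Stream API documentation generally recommends stateless lambdas, so this technique relies on a side effect and should be used with care.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SeenSetExample {

    // Sequential version: a plain HashSet is enough, and the first element per key is kept.
    static <T, K> List<T> dedupeSequential(List<T> items, Function<? super T, ? extends K> key) {
        Set<K> seen = new HashSet<>();
        return items.stream()
                .filter(item -> seen.add(key.apply(item))) // add() returns false for a repeated key
                .collect(Collectors.toList());
    }

    // Parallel version: the set must be thread-safe. One element per key survives,
    // but which of the duplicates it is can differ between runs.
    // ConcurrentHashMap-backed sets reject null keys.
    static <T, K> List<T> dedupeParallel(List<T> items, Function<? super T, ? extends K> key) {
        Set<K> seen = ConcurrentHashMap.newKeySet();
        return items.parallelStream()
                .filter(item -> seen.add(key.apply(item)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        Function<String, Character> firstLetter = w -> w.charAt(0);

        System.out.println(dedupeSequential(words, firstLetter));      // prints [apple, banana, cherry]
        System.out.println(dedupeParallel(words, firstLetter).size()); // prints 3
    }
}
```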

Method 3: Use groupingBy

After grouping, take the first element of each group, keeping the order:

List<Person> uniqueByName = persons.stream()
    .collect(Collectors.groupingBy(
        Person::getName,
        LinkedHashMap::new,    // Keep insertion order
        Collectors.toList()
    ))
    .values().stream()
    .map(group -> group.get(0)) // Take the first element
    .collect(Collectors.toList());

3. Deduplication example based on string length

List<String> words = Arrays.asList("apple", "banana", "orange", "grape", "kiwi");
List<String> uniqueByLength = words.stream()
    .collect(Collectors.toMap(
        String::length,
        Function.identity(),
        (oldVal, newVal) -> oldVal,
        LinkedHashMap::new
    ))
    .values().stream()
    .collect(Collectors.toList());
// Result: ["apple", "banana", "kiwi"] (order preserved)

4. Custom deduplication with filter

Define a custom Predicate that records the keys already seen in a Set and filters out repeated elements.

// Define a Predicate function
private static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> sets = ConcurrentHashMap.newKeySet();
    return t -> sets.add(keyExtractor.apply(t));
}

// Deduplicate by the age attribute (persons is the list being filtered)
persons.stream().filter(distinctByKey(s -> s.getAge()))
        .forEach(System.out::println);
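
One detail worth illustrating: the predicate returned by distinctByKey carries its own Set, so it should be created fresh for each stream pipeline. The sketch below reuses the word list from section 3; the keep helper is a hypothetical name added for the demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyExample {

    // Returns a stateful predicate that accepts only the first element seen for each key.
    // Keys must be non-null because the backing set is ConcurrentHashMap-based.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical helper: applies the given predicate to a sequential stream of words.
    static List<String> keep(List<String> words, Predicate<String> predicate) {
        return words.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "orange", "grape", "kiwi");

        Predicate<String> byLength = distinctByKey(String::length);
        System.out.println(keep(words, byLength)); // prints [apple, banana, kiwi]

        // Reusing the same predicate: every length is already recorded, so nothing passes.
        System.out.println(keep(words, byLength)); // prints []

        // A fresh predicate starts with an empty set again.
        System.out.println(keep(words, distinctByKey(String::length))); // prints [apple, banana, kiwi]
    }
}
```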

Appendix: Use stream in Java to deduplicate based on a field in the object

In development, you often need to deduplicate data. Deduplicating a collection of simple values such as String or Integer is easy: call distinct() on the stream directly. For a collection of objects that must be deduplicated by a particular field, distinct() does not help unless equals() and hashCode() are defined on that field. One option is to collect the stream into a TreeSet, a sorted set that contains no duplicates, using a comparator built from the field. Taking User data as an example:

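List<User> list = getUsers(); // getUsers() is a hypothetical data source
// getDept() and getStatus() are assumed getter names for department and status
List<User> uniqueUsers = list.stream().collect(Collectors.collectingAndThen(
        Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing((User f) -> f.getDept() + ":" + f.getStatus()))), ArrayList::new));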

Here the data is deduplicated by the user's department and status: the two values are joined into a single string that serves as the comparison key. After deduplication, only one user record remains for each combination of department and status. Because a TreeSet is sorted by its comparator, the resulting list is ordered by that key rather than by the original list order. This is only an example; in practice, replace User with whatever entity you need to deduplicate.
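
A runnable sketch of the TreeSet approach follows. The User class, its fields, and the getter names getDept() and getStatus() are assumptions, since the article does not show the entity. The output shows that the surviving records come back ordered by the "dept:status" key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class TreeSetDedupExample {

    // Hypothetical User entity with a department, a status, and an id for display.
    static final class User {
        private final String id;
        private final String dept;
        private final String status;

        User(String id, String dept, String status) {
            this.id = id;
            this.dept = dept;
            this.status = status;
        }

        String getDept() { return dept; }
        String getStatus() { return status; }

        @Override
        public String toString() { return id; }
    }

    // Keeps one user per department/status combination.
    // The TreeSet treats users with the same key as duplicates and sorts by that key.
    static List<User> dedupeByDeptAndStatus(List<User> users) {
        return users.stream().collect(Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(
                        Comparator.comparing((User u) -> u.getDept() + ":" + u.getStatus()))),
                ArrayList::new));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("u1", "sales", "active"),
                new User("u2", "eng", "active"),
                new User("u3", "sales", "active"),   // same key as u1, so it is dropped
                new User("u4", "eng", "inactive"));
        // Sorted by key: eng:active, eng:inactive, sales:active
        System.out.println(dedupeByDeptAndStatus(users)); // prints [u2, u4, u1]
    }
}
```

If the original order matters, the toMap variant with LinkedHashMap from section 2 is likely the better fit.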

Summary

  • distinct(): Simple and efficient, suitable for overall object deduplication.
  • toMap or groupingBy: Flexible, supports deduplication by attribute, and can control which element is kept and in what order.
  • filter + Set: Suitable for parallel streams, but attention should be paid to thread safety and order issues.

Choose the most appropriate method according to the specific scenario to ensure that the code is concise and performs well.
