SoFunction
Updated on 2025-04-11

A Detailed Guide to Using Parallel Streams in Java

Java Parallel Stream

Parallel streams, introduced in Java 8, process collection data on multiple threads to speed up computation.

The following is a detailed guide to its core concepts, usage methods and precautions:

1. Core concepts and principles

  • Parallel processing mechanism: The data is split into multiple chunks, the Fork/Join framework processes them in parallel on multiple threads, and the partial results are merged at the end.
  • Default thread pool: ForkJoinPool.commonPool() is used. Its default parallelism is the number of available CPU cores minus one, because the calling thread also takes part in the work (adjustable through a system property).
  • Applicable scenarios: Large data sets and compute-intensive tasks (such as mathematical operations and batch conversion).
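
To make the thread pool behaviour described above visible, the following minimal sketch prints the core count, the common pool's parallelism, and the names of the threads that actually ran a parallel pipeline. The class name CommonPoolDemo and the helper workerThreadNames are hypothetical names chosen for illustration; the output depends on the machine, so none is shown.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class CommonPoolDemo {

    // Returns the names of the threads that executed the parallel pipeline.
    static Set<String> workerThreadNames(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, elements)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("CPU cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Threads used: " + workerThreadNames(100_000));
    }
}
```

On a multi-core machine the set usually contains the calling thread (for example main) next to several ForkJoinPool.commonPool-worker threads, which is why the pool itself needs one worker fewer than the core count.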

2. How to create parallel streams

  • Generate directly: Call the collection's parallelStream() method.
  • Convert a sequential stream: Call parallel() on an existing stream.

// Requires imports: java.util.Arrays, java.util.List, java.util.stream.Stream
List<Integer> list = Arrays.asList(1, 2, 3, 4);

// Method 1: Create a parallel stream directly
Stream<Integer> parallelStream1 = list.parallelStream();

// Method 2: Convert a sequential stream to a parallel one
Stream<Integer> parallelStream2 = list.stream().parallel();
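
As a small check of the two creation methods, the sketch below uses isParallel() to confirm which mode a stream is in, and sequential() to switch back. The class name CreateParallelDemo and the helper doubled are hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class CreateParallelDemo {

    // Doubles every element on a parallel stream; collect() keeps encounter order.
    static List<Integer> doubled(List<Integer> input) {
        return input.stream()
                    .parallel()
                    .map(n -> n * 2)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4);

        System.out.println(list.parallelStream().isParallel());              // true
        System.out.println(list.stream().isParallel());                      // false
        System.out.println(list.stream().parallel().isParallel());           // true
        System.out.println(list.parallelStream().sequential().isParallel()); // false

        System.out.println(doubled(list)); // [2, 4, 6, 8]
    }
}
```

Note that parallel() and sequential() apply to the whole pipeline: whichever is called last before the terminal operation decides how every stage runs.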

3. Applicable scenarios and performance optimization

Recommended scenarios

  • Large data volumes: For example, filtering and mapping millions of elements.
  • Computationally expensive work: For example, matrix operations and image processing.
  • Stateless operations: Such as map, filter, and reduce (which do not depend on processing order or external variables).

Performance traps

  • Small data sets: The overhead of parallelization (thread scheduling, data splitting) may cancel out the benefits.
  • Cheap operations: For very simple work such as plain addition or subtraction, the parallel version may be slower.

4. Precautions and best practices

Avoid shared mutable state

Modifying shared variables inside parallel operations causes thread-safety problems; use stateless operations or explicit synchronization instead.

// Wrong: accumulation that is not thread-safe
List<Integer> nums = Arrays.asList(1, 2, 3);
int[] sum = {0};
nums.parallelStream().forEach(n -> sum[0] += n); // Data race: the result may be wrong

// Correct: use a reduction
int safeSum = nums.parallelStream().reduce(0, Integer::sum);
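
Beyond reduce, a few other thread-safe patterns are common. The following is a minimal sketch, assuming a side effect either can be avoided (primitive sum, collect) or must be kept (LongAdder); the class name SharedStateDemo and its method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class SharedStateDemo {

    // Preferred: a primitive stream with a built-in reduction, no shared state at all.
    static int sumByReduction(List<Integer> nums) {
        return nums.parallelStream().mapToInt(Integer::intValue).sum();
    }

    // If a side effect cannot be avoided, use a thread-safe accumulator.
    static long sumWithAdder(List<Integer> nums) {
        LongAdder adder = new LongAdder();
        nums.parallelStream().forEach(n -> adder.add(n));
        return adder.sum();
    }

    // Build result collections with collect(), not with forEach(list::add).
    static List<Integer> evens(List<Integer> nums) {
        return nums.parallelStream()
                   .filter(n -> n % 2 == 0)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> nums = IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
        System.out.println(sumByReduction(nums));
        System.out.println(sumWithAdder(nums));
        System.out.println(evens(nums).size());
    }
}
```

The reduction and collect variants are usually the better choice, because each thread works on its own partial result and contention only occurs when the partial results are merged.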

Use stateful operations with caution

Operations such as sorted() and distinct() can be more expensive in parallel streams, because the results from all threads must be merged.

// Parallel sorting (may be slower than a sequential stream); Stream.toList() requires Java 16+
List<Integer> sortedList = list.parallelStream().sorted().toList();
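
For distinct(), part of the cost comes from preserving encounter order. When order does not matter, calling unordered() removes that constraint and may make the parallel pipeline cheaper. The sketch below contrasts both variants; the class name StatefulOpsDemo and its method names are hypothetical, and no speed-up is claimed for any particular input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class StatefulOpsDemo {

    // Ordered: distinct() must keep the first occurrence of each element in encounter order.
    static List<Integer> distinctOrdered(List<Integer> input) {
        return input.parallelStream()
                    .distinct()
                    .collect(Collectors.toList());
    }

    // Unordered: the ordering constraint is dropped, so any occurrence may be kept.
    static Set<Integer> distinctUnordered(List<Integer> input) {
        return input.parallelStream()
                    .unordered()
                    .distinct()
                    .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(distinctOrdered(input));   // [3, 1, 2]
        System.out.println(distinctUnordered(input)); // same elements, order not guaranteed
    }
}
```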

Splittability of the data source

  • Efficient structures: ArrayList and arrays (fast random access makes them easy to split).
  • Inefficient structures: LinkedList and TreeSet (splitting is costly).
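
The difference can be observed directly on each collection's Spliterator, which is what the stream framework uses to divide the work. The following sketch splits the spliterator once and reports the size of both parts; the class name SplitDemo and the helper firstSplitSizes are hypothetical. Exact sizes for LinkedList are an implementation detail, so they are printed rather than stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class SplitDemo {

    // Splits the source once and returns {size of split-off prefix, size of remainder}.
    static long[] firstSplitSizes(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source cannot be split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println("ArrayList:  " + Arrays.toString(firstSplitSizes(new ArrayList<>(data))));
        System.out.println("LinkedList: " + Arrays.toString(firstSplitSizes(new LinkedList<>(data))));
    }
}
```

An ArrayList is split at the midpoint by index arithmetic alone, whereas a LinkedList has to walk its nodes and copy a batch of elements into an array, which tends to produce uneven parts and extra work before any parallel processing starts.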

Order-sensitive operations

forEachOrdered guarantees encounter order, but at the cost of performance.

// Output in encounter order (slower than the unordered forEach)
list.parallelStream().forEachOrdered(System.out::println);
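
To illustrate the difference between the two terminal operations, this sketch records the order in which elements are visited. The class name OrderingDemo and its method names are hypothetical; a synchronized list is used only so that the unordered variant is safe to run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class OrderingDemo {

    // forEachOrdered: elements are handed over one at a time, in encounter order.
    static List<Integer> visitOrdered(List<Integer> input) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        input.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // forEach: elements are handed over in whatever order the threads reach them.
    static List<Integer> visitUnordered(List<Integer> input) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        input.parallelStream().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 20).boxed().collect(Collectors.toList());
        System.out.println("forEachOrdered: " + visitOrdered(input));
        System.out.println("forEach:        " + visitUnordered(input));
    }
}
```

The first line always matches the source order; the second line contains the same elements but its order can change from run to run.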

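Configure the thread pool

Default parallelism of the common pool (one less than the value below, because the calling thread also participates):

Runtime.getRuntime().availableProcessors()

Modify the global parallelism:

# JVM startup parameter
-Djava.util.concurrent.ForkJoinPool.common.parallelism=8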
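
The global setting affects every parallel stream in the JVM. A widely used workaround for isolating one workload is to start the pipeline from inside a dedicated ForkJoinPool. Please treat the sketch below as an assumption-laden illustration: it relies on observed behaviour of the current stream implementation (a parallel stream started from a ForkJoinPool task runs in that pool), which the Stream API documentation does not guarantee. The class name CustomPoolDemo and the helper sumInPool are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class CustomPoolDemo {

    // Runs the parallel sum inside a dedicated pool instead of the common pool.
    // ASSUMPTION: depends on implementation behaviour, not on a documented contract.
    static long sumInPool(List<Integer> nums, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(
                    () -> nums.parallelStream().mapToLong(Integer::longValue).sum()
            ).join(); // join() waits for the result without checked exceptions
        } finally {
            pool.shutdown(); // always release the pool's threads
        }
    }

    public static void main(String[] args) {
        List<Integer> nums = IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
        System.out.println(sumInPool(nums, 4));
    }
}
```

This keeps a slow or blocking pipeline from starving other parallel streams that share the common pool, at the price of managing the pool's lifecycle yourself.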

5. Performance comparison example

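// Sequential stream vs. parallel stream (processing 10 million elements)
// Requires imports: java.util.List, java.util.stream.Collectors, java.util.stream.LongStream
List<Long> numbers = LongStream.rangeClosed(1, 10_000_000)
                               .boxed().collect(Collectors.toList());

// Time the sequential stream
long start = System.currentTimeMillis();
long seqSum = numbers.stream().mapToLong(n -> n * 2).sum();
System.out.println("Sequential stream took: " + (System.currentTimeMillis() - start) + "ms");

// Time the parallel stream
start = System.currentTimeMillis();
long parSum = numbers.parallelStream().mapToLong(n -> n * 2).sum();
System.out.println("Parallel stream took: " + (System.currentTimeMillis() - start) + "ms");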

Typical results (8-core CPU; actual timings vary with hardware and JVM warm-up):

Sequential stream: 120ms
Parallel stream: 35ms
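
A single timed run like the one above is sensitive to JIT warm-up and to the cost of unboxing Long values. The following is a rough sketch of a slightly more careful comparison, assuming a primitive LongStream as the data source and a few warm-up runs before timing; the class name ParallelBenchmarkSketch and its methods are hypothetical. For serious measurements a dedicated harness such as JMH is the usual recommendation.

```java
import java.util.stream.LongStream;

// Hypothetical demo class, not part of the original article.
class ParallelBenchmarkSketch {

    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * 2).sum();
    }

    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * 2).sum();
    }

    // Runs the task a few times untimed, then returns the best timed run in milliseconds.
    static long bestMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, (System.nanoTime() - start) / 1_000_000);
        }
        return best;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println("Sequential: " + bestMillis(() -> sequentialSum(n), 3, 5) + " ms");
        System.out.println("Parallel:   " + bestMillis(() -> parallelSum(n), 3, 5) + " ms");
    }
}
```

Because the results of the timed calls are discarded, the JIT compiler is free to optimize some of the work away, so the numbers should be read as a rough indication only.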

Summary

Advantages: Simplifies multithreaded programming and speeds up the processing of large data sets.

Limitations: Not suitable for small data sets, order-sensitive tasks, or tasks with little computation.

Best Practices

  • Reserve parallel streams for large-scale data processing.
  • Avoid operating on shared mutable variables.
  • Measure to verify that performance actually improves.
  • Use forEach instead of forEachOrdered unless order must be guaranteed.

Used judiciously, parallel streams can significantly improve program performance without adding complex code, but the pros and cons must be weighed for each scenario.

The above is based on personal experience; I hope it serves as a useful reference.