
Suppose I want to combine two sets of data in a stream. I can write either:

Stream.concat(stream1, stream2).collect(Collectors.toSet());

Or

stream1.collect(Collectors.toSet())
       .addAll(stream2.collect(Collectors.toSet()));

Which is more efficient and why?

Holger
alan7678
  • What data have you gathered in your own testing? – Omaha Jan 12 '17 at 20:10
  • The first one can theoretically better leverage parallelism. But unless you have millions of elements, who cares? – Patrick Parker Jan 12 '17 at 20:41
  • In my testing the streams seem to be significantly faster. I just want to be sure there are not any gotchas with using the first method. – alan7678 Jan 12 '17 at 21:06
  • @Patrick Parker: unless the streams carry expensive intermediate operations before the code of the question, neither of them will gain anything from parallelism. The merging costs are on par with the potential savings of the parallel accumulations. In a sequential context, the first is obviously more efficient, as it doesn’t create an intermediate set that has to be passed to `addAll`. – Holger Jan 13 '17 at 11:09
  • @Holger that may hold in general, but consider a very large dataset where nearly all the elements are duplicates. The merging costs would then be trivial compared to the work of weeding out the duplicates. – Patrick Parker Jan 13 '17 at 13:35
  • @Patrick Parker: That would require a significant amount of duplicates and a convenient distribution among the working chunks to yield a benefit. – Holger Jan 13 '17 at 13:57

6 Answers


For the sake of readability and intention, Stream.concat(a, b).collect(toSet()) is way clearer than the second alternative.

For the sake of the question, which is "what is the most efficient", here is a JMH test (I'd like to say that I don't use JMH that much, so there might be some room to improve my benchmark), using the following code:

package stackoverflow;

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@Warmup(iterations = 2)
@Fork(1)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode({ Mode.AverageTime})
public class StreamBenchmark {
  private Set<String> s1;
  private Set<String> s2;

  @Setup
  public void setUp() {
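    // Two disjoint sets of 1000 strings each: "0".."999" and "1000".."1999".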
    final Set<String> valuesForA = new HashSet<>();
    final Set<String> valuesForB = new HashSet<>();
    for (int i = 0; i < 1000; ++i) {
      valuesForA.add(Integer.toString(i));
      valuesForB.add(Integer.toString(1000 + i));
    }
    s1 = valuesForA;
    s2 = valuesForB;
  }

  @Benchmark
  public void stream_concat_then_collect_using_toSet(final Blackhole blackhole) {
    final Set<String> set = Stream.concat(s1.stream(), s2.stream()).collect(Collectors.toSet());
    blackhole.consume(set);
  }

  @Benchmark
  public void s1_collect_using_toSet_then_addAll_using_toSet(final Blackhole blackhole) {
    final Set<String> set = s1.stream().collect(Collectors.toSet());
    set.addAll(s2.stream().collect(Collectors.toSet()));
    blackhole.consume(set);
  }
}

You get these results (I omitted some parts for readability):

Result "s1_collect_using_toSet_then_addAll_using_toSet":
  156969,172 ±(99.9%) 4463,129 ns/op [Average]
  (min, avg, max) = (152842,561, 156969,172, 161444,532), stdev = 2952,084
  CI (99.9%): [152506,043, 161432,301] (assumes normal distribution)

Result "stream_concat_then_collect_using_toSet":
  104254,566 ±(99.9%) 4318,123 ns/op [Average]
  (min, avg, max) = (102086,234, 104254,566, 111731,085), stdev = 2856,171
  CI (99.9%): [99936,443, 108572,689] (assumes normal distribution)
# Run complete. Total time: 00:00:25

Benchmark                                                       Mode  Cnt       Score      Error  Units
StreamBenchmark.s1_collect_using_toSet_then_addAll_using_toSet  avgt   10  156969,172 ± 4463,129  ns/op
StreamBenchmark.stream_concat_then_collect_using_toSet          avgt   10  104254,566 ± 4318,123  ns/op

The version using Stream.concat(a, b).collect(toSet()) performs faster (if I am reading the JMH numbers correctly).

I think this result is expected, because the first variant doesn't create an intermediate set (which has some cost, even with HashSet) and, as said in the comments of the first answer, the Stream is lazily concatenated.
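
To see that laziness for yourself, here is a minimal sketch (the peek logging is only illustrative):

Stream<String> a = Stream.of("x", "y").peek(s -> System.out.println("traversed a: " + s));
Stream<String> b = Stream.of("z").peek(s -> System.out.println("traversed b: " + s));

// Nothing is printed here: concat only builds a new lazy pipeline
// over the two source streams.
Stream<String> merged = Stream.concat(a, b);

// The elements are traversed (and the messages printed) only when
// the terminal operation runs.
Set<String> set = merged.collect(Collectors.toSet());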

Using a profiler, you might see in which part it is slower. You might also want to use toCollection(() -> new HashSet<>(1000)) instead of toSet() to see if the problem lies in growing the internal hash array of the HashSet.
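
For example, a sketch of such an extra benchmark method for the class above (the capacity of 4096 is my own assumption: it keeps the resize threshold above the 2000 entries collected here at the default 0.75 load factor):

@Benchmark
public void stream_concat_then_collect_using_presized_hashset(final Blackhole blackhole) {
  // 4096 buckets put the resize threshold (4096 * 0.75 = 3072) above the
  // 2000 entries collected here, so the internal array never has to grow.
  final Set<String> set = Stream.concat(s1.stream(), s2.stream())
      .collect(Collectors.toCollection(() -> new HashSet<>(4096)));
  blackhole.consume(set);
}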

NoDataFound

What you are asking about is known as premature optimization. Never choose one syntax over the other just because you think it is faster. Always use the syntax that best expresses your intent and supports understanding of your logic.


You know nothing about the task I am working on – alan7678

That's true.

But I don't need to.

There are two general scenarios:

  1. You develop an OLTP application. In this case the application should respond within a second or less. The user will not experience the performance difference between the variants you presented.

  2. You develop some kind of batch processing which will run for a while unattended. In this case the performance difference "could" be important, but only if you are charged for the time your batch process runs.

Either way: real performance problems (where you speed up your application by multiples, not by fractions) are usually caused by the logic you implemented (e.g. excessive communication, "hidden loops", or excessive object creation).
These problems usually cannot be solved or prevented by choosing a particular syntax.

If you sacrifice readability for a performance gain, you make your application harder to maintain.
And changing a hard-to-maintain code base easily burns a multiple of the money that the less readable but slightly faster syntax could save over the lifetime of the application.

and without a doubt this question will matter in some cases for other people as well. – alan7678

No doubt, people are curious.

Luckily for me the syntax I prefer seems to perform better as well. – alan7678

If you know, why did you ask?

And would you please be so kind as to share your measurement results along with your measuring setup?

And more importantly: will that still be valid with Java 9 or Java 10?

Java's performance comes basically from the JVM implementation, and this is subject to change. Of course there is a better chance that newer syntax constructs (such as Java streams) will see performance gains in newer Java versions. But there is no guarantee...

In my case the need for performance is greater than the difference in readability. – alan7678

Will you still be responsible for this application in 5 years? Or are you a consultant paid to start off a project and then switch to the next?

I never had a project where I could solve my performance problems at the syntax level.
But I constantly work with legacy code that has existed for 10+ years and that is hard to maintain because someone did not honor readability.

So your non-answer does not apply to me. – alan7678

It's a free world, take your pick.

Basil Bourque
Timothy Truckle
  • You know nothing about the task I am working on, and without a doubt this question will matter in some cases for other people as well. Luckily for me the syntax I prefer seems to perform better as well. In my case the need for performance is greater than the difference in readability. So your non-answer does not apply to me. – alan7678 Jan 12 '17 at 21:15
  • I'm optimizing a Flink algorithm and stumbled upon this issue, where this is very much relevant. – machete Nov 30 '17 at 19:38

First of all, it must be emphasized that the second variant is incorrect. The toSet() collector returns a Set with “no guarantees on the type, mutability, serializability, or thread-safety”. If mutability is not guaranteed, it is not correct to invoke addAll on the resulting Set.

It happens to work with the current version of the reference implementation, where a HashSet will be created, but might stop working in a future version or alternative implementations. In order to fix this, you have to replace toSet() with toCollection(HashSet::new) for the first Stream’s collect operation.
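
A minimal sketch of that fix, reusing the question's stream1 and stream2:

// toCollection(HashSet::new) guarantees a mutable HashSet, so calling addAll on it is safe.
Set<String> result = stream1.collect(Collectors.toCollection(HashSet::new));
result.addAll(stream2.collect(Collectors.toSet())); // the second collect's result is only read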

This leads to the situation that the second variant is not only less efficient with the current implementation, as shown in this answer, it might also prevent future optimizations made to the toSet() collector, by insisting on the result being of the exact type HashSet. Also, unlike the toSet() collector, the toCollection(…) collector has no way of detecting that the target collection is unordered, which might have a performance relevance in future implementations.
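
If one really needed both properties, a guaranteed mutable HashSet and the unordered hint, a hand-rolled collector would be one way. This is only a sketch (assuming the usual imports plus java.util.stream.Collector), not something the question requires:

Collector<String, Set<String>, Set<String>> toHashSet = Collector.of(
        HashSet::new,   // supplier: the result is always a mutable HashSet
        Set::add,       // accumulator
        (left, right) -> { left.addAll(right); return left; },  // combiner for parallel streams
        Collector.Characteristics.UNORDERED);  // declares that encounter order is irrelevant

Set<String> merged = Stream.concat(stream1, stream2).collect(toHashSet);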

Holger

Use either.

If you profile your app and this section of code turns out to be a bottleneck, then benchmark it with the different implementations and use the one that works best.


It is impossible to tell up front without a benchmark, but think about it: if there are many duplicates, then Stream.concat(stream1, stream2) must create a large object, which must be instantiated because you are calling .collect().

Then .toSet() must compare each occurrence against every previous one, probably with a fast hashing function, but it still might have a lot of elements.

On the other hand, stream1.collect(Collectors.toSet()).addAll(stream2.collect(Collectors.toSet())) will create two smaller sets and then merge them.

The memory footprint of the second option is potentially smaller than that of the first.

Edit:

I revisited this after reading @NoDataFound's benchmark. On a more sophisticated version of the test, Stream.concat indeed seems to perform consistently faster than Collection.addAll. I tried to take into account how many distinct elements there are and how big the initial streams are. I also excluded from the measurement the time required to create the input streams from the sets (which is negligible anyway). Here is a sample of the times I get with the code below.

Concat-collect   10000 elements, all distinct: 7205462 nanos
Collect-addAll   10000 elements, all distinct: 12130107 nanos

Concat-collect  100000 elements, all distinct: 78184055 nanos
Collect-addAll  100000 elements, all distinct: 115191392 nanos

Concat-collect 1000000 elements, all distinct: 555265307 nanos
Collect-addAll 1000000 elements, all distinct: 1370210449 nanos

Concat-collect 5000000 elements, all distinct: 9905958478 nanos
Collect-addAll 5000000 elements, all distinct: 27658964935 nanos

Concat-collect   10000 elements, 50% distinct: 3242675 nanos
Collect-addAll   10000 elements, 50% distinct: 5088973 nanos

Concat-collect  100000 elements, 50% distinct: 389537724 nanos
Collect-addAll  100000 elements, 50% distinct: 48777589 nanos

Concat-collect 1000000 elements, 50% distinct: 427842288 nanos
Collect-addAll 1000000 elements, 50% distinct: 1009179744 nanos

Concat-collect 5000000 elements, 50% distinct: 3317183292 nanos
Collect-addAll 5000000 elements, 50% distinct: 4306235069 nanos

Concat-collect   10000 elements, 10% distinct: 2310440 nanos
Collect-addAll   10000 elements, 10% distinct: 2915999 nanos

Concat-collect  100000 elements, 10% distinct: 68601002 nanos
Collect-addAll  100000 elements, 10% distinct: 40163898 nanos

Concat-collect 1000000 elements, 10% distinct: 315481571 nanos
Collect-addAll 1000000 elements, 10% distinct: 494875870 nanos

Concat-collect 5000000 elements, 10% distinct: 1766480800 nanos
Collect-addAll 5000000 elements, 10% distinct: 2721430964 nanos

Concat-collect   10000 elements,  1% distinct: 2097922 nanos
Collect-addAll   10000 elements,  1% distinct: 2086072 nanos

Concat-collect  100000 elements,  1% distinct: 32300739 nanos
Collect-addAll  100000 elements,  1% distinct: 32773570 nanos

Concat-collect 1000000 elements,  1% distinct: 382380451 nanos
Collect-addAll 1000000 elements,  1% distinct: 514534562 nanos

Concat-collect 5000000 elements,  1% distinct: 2468393302 nanos
Collect-addAll 5000000 elements,  1% distinct: 6619280189 nanos

Code

import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamBenchmark {
    private Set<String> s1;
    private Set<String> s2;

    private long createStreamsTime;
    private long concatCollectTime;
    private long collectAddAllTime;

    public void setUp(final int howMany, final int distinct) {
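        // distinct == -1 means the two sets are disjoint (all values differ).
        // Otherwise valuesForB is drawn from a pool of only `distinct` values,
        // so the merged data contains duplicates.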
        final Set<String> valuesForA = new HashSet<>(howMany);
        final Set<String> valuesForB = new HashSet<>(howMany);
        if (-1 == distinct) {
            for (int i = 0; i < howMany; ++i) {
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(howMany + i));
            }
        } else {
            Random r = new Random();
            for (int i = 0; i < howMany; ++i) {
                int j = r.nextInt(distinct);
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(distinct + j));
            }
        }
        s1 = valuesForA;
        s2 = valuesForB;
    }

    public void run(final int streamLength, final int distinctElements, final int times, boolean discard) {
        long startTime;
        setUp(streamLength, distinctElements);
        createStreamsTime = 0L;
        concatCollectTime = 0L;
        collectAddAllTime = 0L;
        for (int r = 0; r < times; r++) {
            startTime = System.nanoTime();
            Stream<String> st1 = s1.stream();
            Stream<String> st2 = s2.stream();
            createStreamsTime += System.nanoTime() - startTime;
            startTime = System.nanoTime();
            Set<String> set1 = Stream.concat(st1, st2).collect(Collectors.toSet());
            concatCollectTime += System.nanoTime() - startTime;
            st1 = s1.stream();
            st2 = s2.stream();
            startTime = System.nanoTime();
            Set<String> set2 = st1.collect(Collectors.toSet());
            set2.addAll(st2.collect(Collectors.toSet()));
            collectAddAllTime += System.nanoTime() - startTime;
        }
        if (!discard) {
            // System.out.println("Create streams "+streamLength+" elements,
            // "+distinctElements+" distinct: "+createStreamsTime+" nanos");
            System.out.println("Concat-collect " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + concatCollectTime + " nanos");
            System.out.println("Collect-addAll " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + collectAddAllTime + " nanos");
            System.out.println("");
        }
    }

    public static void main(String[] args) {
        StreamBenchmark test = new StreamBenchmark();
        final int times = 5;
        test.run(100000, -1, 1, true);
        test.run(10000, -1, times, false);
        test.run(100000, -1, times, false);
        test.run(1000000, -1, times, false);
        test.run(5000000, -1, times, false);
        test.run(10000, 5000, times, false);
        test.run(100000, 50000, times, false);
        test.run(1000000, 500000, times, false);
        test.run(5000000, 2500000, times, false);
        test.run(10000, 1000, times, false);
        test.run(100000, 10000, times, false);
        test.run(1000000, 100000, times, false);
        test.run(5000000, 500000, times, false);
        test.run(10000, 100, times, false);
        test.run(100000, 1000, times, false);
        test.run(1000000, 10000, times, false);
        test.run(5000000, 50000, times, false);
    }
}
Serg M Ten
  • Because of your first half-sentence I am not downvoting your answer, but basically it is the same sort of *premature optimization* the OP does... – Timothy Truckle Jan 12 '17 at 20:44
  • According to javadoc, `Stream.concat(stream1, stream2)` is *lazily concatenated*. – NoDataFound Jan 12 '17 at 20:52
  • What do you mean with “`Stream.concat(stream1, stream2)` must create a large object”? Both operations produce the same resulting `Set` and besides that, there is no “large object” involved in `Stream.concat`. – Holger Jan 13 '17 at 13:53

I was in a situation where I had to decide whether to use Stream.of() with flatMap(), Stream.concat(), Collection.addAll(), or Collection.add() to merge multiple lists into a single list. I did a quick harness test on my code with 10 iterations and obtained some surprising results.

------------------------------------------------------------------
1. Using getByAddAll()

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.414 ± 0.304  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.291 ± 0.332  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.571 ± 0.622  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.520 ± 0.818  ms/op

Average = 4.449ms
------------------------------------------------------------------

------------------------------------------------------------------
2. Using getByAdd()

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.280 ± 0.499  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.494 ± 0.374  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.575 ± 0.539  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.552 ± 0.272  ms/op

Average = 4.475ms
------------------------------------------------------------------


------------------------------------------------------------------
3. using getByStreamOf()
Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.502 ± 0.529  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.494 ± 0.754  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.676 ± 0.347  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.511 ± 0.950  ms/op

Average = 4.545ms
------------------------------------------------------------------


------------------------------------------------------------------
4. Using getByStreamConcat()

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.342 ± 0.372  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.218 ± 0.400  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.892 ± 0.562  ms/op

Benchmark                         Mode  Cnt  Score   Error  Units
PerformaceTest.test               avgt   10  4.818 ± 0.608  ms/op

Average = 4.567ms
------------------------------------------------------------------

This is my code:

private List<ItemDTO> getByStreamOf(OfferResponseDTO catalogOfferDTO){
    return Stream.of(
            catalogOfferDTO.getCharges()
                    .stream()
                    .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),

            catalogOfferDTO.getUsages()
                    .stream()
                    .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder())),

            catalogOfferDTO.getNetworkElements()
                    .stream()
                    .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),

            catalogOfferDTO.getEquipment()
                    .stream()
                    .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())))

            .flatMap(Function.identity())
            .collect(Collectors.toList());
}


private List<ItemDTO> getByStreamConcat(OfferResponseDTO catalogOfferDTO){
    return Stream.concat(
            Stream.concat(
                    catalogOfferDTO.getCharges()
                            .stream()
                            .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
                    ,

                    catalogOfferDTO.getUsages()
                            .stream()
                            .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(),catalogOfferDTO.getDisplayOrder()))
            ),
            Stream.concat(
                    catalogOfferDTO.getEquipment()
                            .stream()
                            .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())),

                    catalogOfferDTO.getNetworkElements()
                            .stream()
                            .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
            )
    )
            .collect(Collectors.toList());
}


private List<ItemDTO> getByAddAll(OfferResponseDTO catalogOfferDTO){
    List<ItemDTO> items = new ArrayList<>();

    items.addAll(catalogOfferDTO.getCharges()
            .stream()
            .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
            .collect(Collectors.toList()));

    items.addAll(catalogOfferDTO.getUsages()
            .stream()
            .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder()))
            .collect(Collectors.toList()));

    items.addAll(catalogOfferDTO.getNetworkElements()
            .stream()
            .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
            .collect(Collectors.toList()));

    items.addAll(catalogOfferDTO.getEquipment()
            .stream()
            .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder()))
            .collect(Collectors.toList()));
    return items;
}

private List<ItemDTO> getByAdd(OfferResponseDTO catalogOfferDTO){
    List<ItemDTO> items = new ArrayList<>();

    // forEach is a terminal operation, so the streams are actually consumed;
    // an intermediate operation such as map alone would never execute.
    catalogOfferDTO.getCharges()
            .stream()
            .forEach(chargeWithPricePlanResponseDTO -> items.add(this.addItem(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())));

    catalogOfferDTO.getUsages()
            .stream()
            .forEach(usageResponseDTO -> items.add(this.addItem(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder())));

    catalogOfferDTO.getEquipment()
            .stream()
            .forEach(equipmentResponseDTO -> items.add(this.addItem(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())));

    catalogOfferDTO.getNetworkElements()
            .stream()
            .forEach(networkElementResponseDTO -> items.add(this.addItem(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())));

    return items;
}

Ashwath