It is impossible to tell up front without a benchmark, but think about it: if there are many duplicates, then Stream.concat(stream1, stream2) must materialize one large combined result, because you are calling .collect(), and Collectors.toSet() must compare each occurrence against every element seen so far. That comparison is a fast hash lookup, but it still runs over a potentially large number of elements.
On the other hand, stream1.collect(Collectors.toSet()).addAll(stream2.collect(Collectors.toSet())) creates two smaller sets and then merges them, so the memory footprint of this second option is potentially smaller than that of the first.
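For clarity, here is a minimal, self-contained sketch of the two variants being compared; the class name and sample data are only for illustration:

import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnionVariants {
    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y", "z");
        List<String> b = Arrays.asList("y", "z", "w");
        // Option 1: concatenate both streams, then collect once into a single set
        Set<String> concatCollect = Stream.concat(a.stream(), b.stream()).collect(Collectors.toSet());
        // Option 2: collect each stream into its own set, then merge with addAll
        Set<String> collectAddAll = a.stream().collect(Collectors.toSet());
        collectAddAll.addAll(b.stream().collect(Collectors.toSet()));
        // Both produce the same union of four elements
        System.out.println(concatCollect.equals(collectAddAll)); // true
    }
}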
Edit:
I revisited this after reading @NoDataFound's benchmark. On a more sophisticated version of the test, Stream.concat does indeed perform consistently faster than Collection.addAll. I tried to take into account how many distinct elements there are and how big the initial streams are, and I excluded from the measurement the time needed to create the input streams from the sets (which is negligible anyway). Here is a sample of the times I get with the code below.
Concat-collect 10000 elements, all distinct: 7205462 nanos
Collect-addAll 10000 elements, all distinct: 12130107 nanos
Concat-collect 100000 elements, all distinct: 78184055 nanos
Collect-addAll 100000 elements, all distinct: 115191392 nanos
Concat-collect 1000000 elements, all distinct: 555265307 nanos
Collect-addAll 1000000 elements, all distinct: 1370210449 nanos
Concat-collect 5000000 elements, all distinct: 9905958478 nanos
Collect-addAll 5000000 elements, all distinct: 27658964935 nanos
Concat-collect 10000 elements, 50% distinct: 3242675 nanos
Collect-addAll 10000 elements, 50% distinct: 5088973 nanos
Concat-collect 100000 elements, 50% distinct: 389537724 nanos
Collect-addAll 100000 elements, 50% distinct: 48777589 nanos
Concat-collect 1000000 elements, 50% distinct: 427842288 nanos
Collect-addAll 1000000 elements, 50% distinct: 1009179744 nanos
Concat-collect 5000000 elements, 50% distinct: 3317183292 nanos
Collect-addAll 5000000 elements, 50% distinct: 4306235069 nanos
Concat-collect 10000 elements, 10% distinct: 2310440 nanos
Collect-addAll 10000 elements, 10% distinct: 2915999 nanos
Concat-collect 100000 elements, 10% distinct: 68601002 nanos
Collect-addAll 100000 elements, 10% distinct: 40163898 nanos
Concat-collect 1000000 elements, 10% distinct: 315481571 nanos
Collect-addAll 1000000 elements, 10% distinct: 494875870 nanos
Concat-collect 5000000 elements, 10% distinct: 1766480800 nanos
Collect-addAll 5000000 elements, 10% distinct: 2721430964 nanos
Concat-collect 10000 elements, 1% distinct: 2097922 nanos
Collect-addAll 10000 elements, 1% distinct: 2086072 nanos
Concat-collect 100000 elements, 1% distinct: 32300739 nanos
Collect-addAll 100000 elements, 1% distinct: 32773570 nanos
Concat-collect 1000000 elements, 1% distinct: 382380451 nanos
Collect-addAll 1000000 elements, 1% distinct: 514534562 nanos
Concat-collect 5000000 elements, 1% distinct: 2468393302 nanos
Collect-addAll 5000000 elements, 1% distinct: 6619280189 nanos
Code
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamBenchmark {
    private Set<String> s1;
    private Set<String> s2;

    private long createStreamsTime;
    private long concatCollectTime;
    private long collectAddAllTime;

    // Builds the two input sets; distinct == -1 means all elements are distinct,
    // otherwise the second set draws its values from a pool of `distinct` candidates.
    public void setUp(final int howMany, final int distinct) {
        final Set<String> valuesForA = new HashSet<>(howMany);
        final Set<String> valuesForB = new HashSet<>(howMany);
        if (-1 == distinct) {
            for (int i = 0; i < howMany; ++i) {
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(howMany + i));
            }
        } else {
            Random r = new Random();
            for (int i = 0; i < howMany; ++i) {
                int j = r.nextInt(distinct);
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(distinct + j));
            }
        }
        s1 = valuesForA;
        s2 = valuesForB;
    }

    // Runs both variants `times` times and accumulates their timings;
    // when discard is true the measurements are thrown away (warm-up run).
    public void run(final int streamLength, final int distinctElements, final int times, boolean discard) {
        long startTime;
        setUp(streamLength, distinctElements);
        createStreamsTime = 0L;
        concatCollectTime = 0L;
        collectAddAllTime = 0L;
        for (int r = 0; r < times; r++) {
            // Stream creation is timed separately so it does not pollute the comparison
            startTime = System.nanoTime();
            Stream<String> st1 = s1.stream();
            Stream<String> st2 = s2.stream();
            createStreamsTime += System.nanoTime() - startTime;

            // Variant 1: concatenate both streams, collect once
            startTime = System.nanoTime();
            Set<String> set1 = Stream.concat(st1, st2).collect(Collectors.toSet());
            concatCollectTime += System.nanoTime() - startTime;

            // Variant 2: collect each stream into its own set, merge with addAll
            st1 = s1.stream();
            st2 = s2.stream();
            startTime = System.nanoTime();
            Set<String> set2 = st1.collect(Collectors.toSet());
            set2.addAll(st2.collect(Collectors.toSet()));
            collectAddAllTime += System.nanoTime() - startTime;
        }
        if (!discard) {
            // System.out.println("Create streams " + streamLength + " elements, "
            //         + distinctElements + " distinct: " + createStreamsTime + " nanos");
            System.out.println("Concat-collect " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + concatCollectTime + " nanos");
            System.out.println("Collect-addAll " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + collectAddAllTime + " nanos");
            System.out.println("");
        }
    }

    public static void main(String[] args) {
        StreamBenchmark test = new StreamBenchmark();
        final int times = 5;
        // First run is a warm-up; its measurements are discarded
        test.run(100000, -1, 1, true);
        test.run(10000, -1, times, false);
        test.run(100000, -1, times, false);
        test.run(1000000, -1, times, false);
        test.run(5000000, -1, times, false);
        test.run(10000, 5000, times, false);
        test.run(100000, 50000, times, false);
        test.run(1000000, 500000, times, false);
        test.run(5000000, 2500000, times, false);
        test.run(10000, 1000, times, false);
        test.run(100000, 10000, times, false);
        test.run(1000000, 100000, times, false);
        test.run(5000000, 500000, times, false);
        test.run(10000, 100, times, false);
        test.run(100000, 1000, times, false);
        test.run(1000000, 10000, times, false);
        test.run(5000000, 50000, times, false);
    }
}