
Stream.reduce has 3 method overloads.

reduce(BinaryOperator<T> accumulator)
reduce(T identity, BinaryOperator<T> accumulator)
reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
  • The 1st overload can be used to calculate the sum of an integer list, for example; since there is no identity value, it returns an Optional<T> (empty if the stream is empty).
  • The 2nd overload is the same, but if the stream is empty it just returns the identity as a default value.
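
For example, a minimal sketch of the first two overloads (the list here is made up for illustration):

List<Integer> list = List.of(1, 2, 3);

// 1st overload: no identity value, so an empty stream yields Optional.empty()
Optional<Integer> maybeSum = list.stream().reduce(Integer::sum);

// 2nd overload: the identity (0) is returned as-is when the stream is empty
int sum = list.stream().reduce(0, Integer::sum);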

I'm having a hard time understanding how the third overload (Stream.reduce(identity, accumulator, combiner)) works and what its use case is. So, how does it work, and why does it exist?

s1n7ax
  • Hello. Could you include more details about *what* is confusing you? For instance, is it some fragment of the [documentation](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#reduce-U-java.util.function.BiFunction-java.util.function.BinaryOperator-) of that method, or maybe of the tutorial about [reduction](https://docs.oracle.com/javase/tutorial/collections/streams/reduction.html)? – Pshemo Jan 20 '23 at 19:04
  • Also, have you seen the description of reduction in the documentation: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Reduction? – Pshemo Jan 20 '23 at 19:07
  • @Pshemo The example in the doc looks like the same can be achieved using the 2nd overload of reduce, or can it? The reduction page does not include an example of the 3rd overload, I think. – s1n7ax Jan 20 '23 at 19:13
  • @s1n7ax, if you are reducing from Collection<T> to T, you don't need the third version. Or if you are using a sequential stream, you don't need the third version. You only need the third version when you are reducing from Collection<T> to U using parallel streams. – May Rest in Peace Jan 20 '23 at 19:20
  • "How it works?" the specific *implementation* of that method depends on context so it is hard to say how it *works*, we can only describe main idea behind it, which is that in some cases it *may be* possible to improve performance by splitting data in stream into some portions, and letting more threads reduce them. The `accumulator` describes how each thread should do it. But when those threads are done (or at least some of them are) we need some way to *combine* those result. That is the job of `combiner`. – Pshemo Jan 20 '23 at 19:27
  • Does this answer your question? [Why is a combiner needed for reduce method that converts type in java 8](https://stackoverflow.com/questions/24308146/why-is-a-combiner-needed-for-reduce-method-that-converts-type-in-java-8) – Didier L Jan 20 '23 at 20:30
  • Also https://stackoverflow.com/q/30015971/525036 – Didier L Jan 20 '23 at 20:31

3 Answers


If I understand correctly, your question is about the third argument, the combiner.

Firstly, one of the goals of Java was to have similar APIs for sequential and parallel streams. The 3-argument version of reduce is useful for parallel streams.

Suppose you are reducing a Collection<T> to a value of type U, and you are using a parallel stream. The parallel stream splits the collection into smaller streams and produces a partial u' value for each by applying the accumulator function. But now these different u' values have to be combined. How do they get combined? The third function, the combiner, is the one that provides that logic.
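
A minimal sketch of that shape (the data here is made up for illustration): reducing a Collection<String> (T = String) to an int total length (U = Integer), in parallel.

List<String> words = List.of("stream", "reduce", "combiner", "identity");

// T = String, U = Integer
int totalLength = words.parallelStream().reduce(
        0,                                          // identity for U
        (partial, word) -> partial + word.length(), // accumulator: folds a T into a U
        Integer::sum);                              // combiner: merges the partial U results
System.out.println(totalLength); // 28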

May Rest in Peace
  • Ok I see. So does that mean copies of `identity` are created in each thread, and the `combiner` function helps combine all of them? – s1n7ax Jan 20 '23 at 19:28
  • @s1n7ax There is no need to create a copy of `identity`, since it is never modified. Note that *reduction* only uses elements to create a *new result*. – Pshemo Jan 20 '23 at 19:33
  • @s1n7ax Actually the main difference between `reduce` and `collect` is that the `accumulator` in reduce *produces* a new object representing the current result, which at each "iteration" is reassigned to the local variable ***result***. It is expressed via `result = accumulator.apply(result, element)`. In the case of *collecting*, the accumulator *modifies the initial identity* and never creates a new object to assign to ***result***. This is expressed via `accumulator.accept(result, element);` (there is no need for the `result = ...` part). – Pshemo Jan 20 '23 at 19:47
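
To make that difference concrete, a hypothetical sketch (not from the thread; assumes java.util.ArrayList and java.util.List) contrasting the two accumulator styles:

List<Integer> nums = List.of(1, 2, 3);

// reduce: the accumulator RETURNS a new value at each step; the result is reassigned
int sum = nums.stream().reduce(0, (result, element) -> result + element);

// collect: the accumulator MUTATES the container produced by the supplier
List<Integer> copy = nums.stream().collect(
        ArrayList::new,      // supplier: a fresh mutable container (one per thread, if parallel)
        ArrayList::add,      // accumulator: adds the element in place (boolean result discarded)
        ArrayList::addAll);  // combiner: merges the containers from different threads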

Note: Some of the examples are contrived for demonstration. In some instances a simple .sum() could have been used.

The big difference, imo, is that the third form has a BiFunction as a second argument instead of a BinaryOperator. So you can use the third form to change the result type. It also has a BinaryOperator as a combiner to combine the different results from parallel operations.

Generate some data

record Data(String name, int value) {}

Random r = new Random();
List<Data> dataList = r.ints(1000, 1, 20).mapToObj(i -> new Data("Item" + i, i)).toList();

No parallel operation here, but different types. The third argument is still required (even though it won't be called on a sequential stream), so just return the sum.

int sum = dataList.stream().reduce(0, (partial, data) -> partial + data.value(),
        (finalSum, partialSum) -> finalSum);
System.out.println(sum);

prints

10162

The second form. Use map to get the value to be summed. A BinaryOperator is used here since the types are the same and there is no parallel operation.

sum = dataList.stream().map(Data::value).reduce(0, (sum1, val) -> sum1 + val);
System.out.println(sum); // print same as above

This shows the same as above but in parallel. The third argument combines the partial sums. Those sums are combined as each thread finishes, so the output may not appear in a sensible order.

sum = dataList.parallelStream().reduce(0, (sum1, data) -> sum1 + data.value(),
        (finalSum, partialSum) -> {
            System.out.println("Adding " + partialSum + " to " + finalSum);
            finalSum += partialSum;
            return finalSum;
        });
System.out.println(sum);

prints something like the following

Adding 586 to 670
Adding 567 to 553
Adding 1256 to 1120
Adding 715 to 620
Adding 624 to 601
Adding 1335 to 1225
Adding 2560 to 2376
Adding 662 to 579
Adding 706 to 715
Adding 1421 to 1241
Adding 713 to 689
Adding 576 to 586
Adding 1402 to 1162
Adding 2662 to 2564
Adding 4936 to 5226
10162

One final note: none of the Collectors.reducing methods has a BiFunction to handle different types. To handle this, the second argument is a Function that acts as a mapper, so that the third argument, a BinaryOperator, can combine the values.

sum = dataList.parallelStream().collect(
       Collectors.reducing(0, Data::value, (finalSum, partialSum) -> {
           System.out.println(
                   "Adding " + partialSum + " to " + finalSum);      
           return finalSum + partialSum;
       }));

System.out.println(sum);
WJS

Basically it combines a mapping function with a reduction. Most of the examples I've seen for this don't really demonstrate why it's preferable to calling map() and a normal reduce() in separate steps. The API Note comes in handy here:

Many reductions using this form can be represented more simply by an explicit combination of map and reduce operations. The accumulator function acts as a fused mapper and accumulator, which can sometimes be more efficient than separate mapping and reduction, such as when knowing the previously reduced value allows you to avoid some computation.

So let's say we have a Stream<String> numbers, and we want to parse them to BigDecimal and calculate their product. We could do something like this:

BigDecimal product = numbers.map(BigDecimal::new)
        .reduce(BigDecimal.ONE, BigDecimal::multiply);

But this has an inefficiency. If one of the numbers is "0", we're wasting cycles converting the remainder to BigDecimal. We can use the 3-arg reduce() here to bypass the mapping logic:

BigDecimal product = numbers.reduce(BigDecimal.ONE,
        (d, n) -> d.equals(BigDecimal.ZERO) ? BigDecimal.ZERO : new BigDecimal(n).multiply(d),
        BigDecimal::multiply);

Of course it would be even more efficient to short-circuit the stream entirely, but that's tricky to do in a stream, especially in parallel. And this is just an example to get the concept across.
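
For completeness, one way to get that short-circuit when the source is a List rather than a one-shot Stream (a sketch of my own, assuming java.math.BigDecimal, not part of the answer above):

List<String> list = List.of("3", "0", "5");

// Bail out before converting anything if a literal "0" is present
BigDecimal product = list.contains("0")
        ? BigDecimal.ZERO
        : list.stream().map(BigDecimal::new)
              .reduce(BigDecimal.ONE, BigDecimal::multiply);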

shmosel
  • Maybe, it’s better with an example of only potentially skipping conversion costs, e.g. `(d, n) -> d.equals("1")? n: new BigDecimal(n).multiply(d)`. This still is only useful in cases with significant likelihood of encountering `"1"` in the stream. I’m not sure whether I ever saw an actual real life example exploiting the advantages of this method. Even weirder is [the collector](https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/stream/Collectors.html#reducing(U,java.util.function.Function,java.util.function.BinaryOperator)) which has no optimization potential at all. – Holger Jan 23 '23 at 09:48
  • @Holger `d.equals("1")` could be implemented as a preemptive `filter`. Why do you say the collector has no optimization potential? – shmosel Jan 23 '23 at 09:56
  • Note that I have `n` and `d` swapped in my example. And yes, it could have been filtered instead. As said, it’s hard to come up with an actual useful example. Maybe, `(d, s) -> s.equals("2")? d.add(d): new BigDecimal(s).multiply(d)`. The collector has no optimization potential because, unlike the three-arg `reduce`, there is no fusion of the conversion and the reduction function; it receives exactly the same kind of `Function` that a preceding `map` (or `Collector.mapping`) step would get, and it can’t be implemented differently than what `mapping(function, reducing(…))` does. – Holger Jan 23 '23 at 10:01
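
To make Holger's last point concrete, a sketch (not from the thread; assumes java.util.stream.Collectors and java.math.BigDecimal): the three-argument reducing collector applies its Function to every element unconditionally, so it does exactly the same work as an explicit mapping step:

List<String> numbers = List.of("2", "3", "4");

// Three-arg reducing collector: the mapper runs on every element up front
BigDecimal p1 = numbers.stream().collect(
        Collectors.reducing(BigDecimal.ONE, BigDecimal::new, BigDecimal::multiply));

// Equivalent pipeline: an explicit mapping step feeding the two-arg reducing collector
BigDecimal p2 = numbers.stream().collect(
        Collectors.mapping(BigDecimal::new,
                Collectors.reducing(BigDecimal.ONE, BigDecimal::multiply)));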