223

I'm having trouble fully understanding the role that the combiner fulfills in Streams reduce method.

For example, the following code doesn't compile:

int length = asList("str1", "str2").stream()
            .reduce(0, (accumulatedInt, str) -> accumulatedInt + str.length());

Compile error says : (argument mismatch; int cannot be converted to java.lang.String)

but this code does compile:

int length = asList("str1", "str2").stream()  
    .reduce(0, (accumulatedInt, str ) -> accumulatedInt + str.length(), 
                (accumulatedInt, accumulatedInt2) -> accumulatedInt + accumulatedInt2);

I understand that the combiner method is used in parallel streams - so in my example it is adding together two intermediate accumulated ints.

But I don't understand why the first example doesn't compile without the combiner or how the combiner is solving the conversion of string to int since it is just adding together two ints.

Can anyone shed light on this?

Null
Louise Miller
  • Related question: https://stackoverflow.com/questions/24202473/does-a-sequential-stream-in-java-8-use-the-combiner-parameter-on-calling-collect – nosid Jun 19 '14 at 14:42
  • 7
    aha, it's for parallel streams...I call leaky abstraction! – Andy Apr 22 '17 at 19:20
  • 2
    I ran into a similar problem. I wanted to do a map-reduce. I wanted Stream's "reduce" method to have an overloaded version that allows mapping to a different type than the input type, but does not force me to write a combiner. As far as I know, Java does not have such a method. Because some people, like me, expect to find it, but it is not there, this creates confusion. Note: I did not want to write a combiner because the output was a complex object for which a combiner was not realistic. – user2367418 May 17 '21 at 19:54

4 Answers

329

Eran's answer described the differences between the two-arg and three-arg versions of reduce in that the former reduces Stream<T> to T whereas the latter reduces Stream<T> to U. However, it didn't actually explain the need for the additional combiner function when reducing Stream<T> to U.

One of the design principles of the Streams API is that the API shouldn't differ between sequential and parallel streams, or put another way, a particular API shouldn't prevent a stream from running correctly either sequentially or in parallel. If your lambdas have the right properties (associative, non-interfering, etc.) a stream run sequentially or in parallel should give the same results.

Let's first consider the two-arg version of reduction:

T reduce(I, (T, T) -> T)

The sequential implementation is straightforward. The identity value I is "accumulated" with the zeroth stream element to give a result. This result is accumulated with the first stream element to give another result, which in turn is accumulated with the second stream element, and so forth. After the last element is accumulated, the final result is returned.
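That sequential accumulation can be sketched as follows (summing an int stream; the comments trace each step):

```java
import java.util.Arrays;

public class SequentialReduce {
    public static void main(String[] args) {
        // identity = 0; each element is accumulated in encounter order:
        // 0+1 -> 1, then 1+2 -> 3, then 3+3 -> 6, then 6+4 -> 10
        int sum = Arrays.asList(1, 2, 3, 4).stream()
                .reduce(0, (acc, x) -> acc + x);
        System.out.println(sum); // 10
    }
}
```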

The parallel implementation starts off by splitting the stream into segments. Each segment is processed by its own thread in the sequential fashion I described above. Now, if we have N threads, we have N intermediate results. These need to be reduced down to one result. Since each intermediate result is of type T, and we have several, we can use the same accumulator function to reduce those N intermediate results down to a single result.
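A minimal sketch of why this works: the intermediate results are also of type T, so the very same associative lambda that accumulates elements can merge the per-thread partial results:

```java
import java.util.Arrays;

public class ParallelReduce {
    public static void main(String[] args) {
        // Each segment is reduced by its own thread; the intermediate
        // partial sums (all Integers) are merged by the same function.
        int sum = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)
                .parallelStream()
                .reduce(0, (a, b) -> a + b);
        System.out.println(sum); // 36, regardless of how the stream splits
    }
}
```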

Now let's consider a hypothetical two-arg reduction operation that reduces Stream<T> to U. In other languages, this is called a "fold" or "fold-left" operation so that's what I'll call it here. Note this doesn't exist in Java.

U foldLeft(I, (U, T) -> U)

(Note that the identity value I is of type U.)

The sequential version of foldLeft is just like the sequential version of reduce except that the intermediate values are of type U instead of type T. But it's otherwise the same. (A hypothetical foldRight operation would be similar except that the operations would be performed right-to-left instead of left-to-right.)
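Since Java has no foldLeft, here is a sketch of what the hypothetical operation would look like as a plain loop (the method name and signature are illustrative, not part of any Java API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

public class FoldLeftSketch {
    // Hypothetical foldLeft: strictly left-to-right, inherently sequential.
    static <T, U> U foldLeft(List<T> list, U identity, BiFunction<U, T, U> f) {
        U result = identity; // intermediate values are of type U, not T
        for (T element : list) {
            result = f.apply(result, element);
        }
        return result;
    }

    public static void main(String[] args) {
        int length = foldLeft(Arrays.asList("str1", "str2"), 0,
                (acc, s) -> acc + s.length());
        System.out.println(length); // 8
    }
}
```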

Now consider the parallel version of foldLeft. Let's start off by splitting the stream into segments. We can then have each of the N threads reduce the T values in its segment into N intermediate values of type U. Now what? How do we get from N values of type U down to a single result of type U?

What's missing is another function that combines the multiple intermediate results of type U into a single result of type U. If we have a function that combines two U values into one, that's sufficient to reduce any number of values down to one -- just like the original reduction above. Thus, the reduction operation that gives a result of a different type needs two functions:

U reduce(I, (U, T) -> U, (U, U) -> U)

Or, using Java syntax:

<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)

In summary, to do parallel reduction to a different result type, we need two functions: one that accumulates T elements to intermediate U values, and a second that combines the intermediate U values into a single U result. If we aren't switching types, it turns out that the accumulator function is the same as the combiner function. That's why reduction to the same type has only the accumulator function and reduction to a different type requires separate accumulator and combiner functions.
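One way to see that last point: when U and T are the same type, the accumulator and the combiner can literally be the same lambda, as in this sketch:

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

public class SameTypeReduce {
    public static void main(String[] args) {
        BinaryOperator<Integer> plus = (a, b) -> a + b;
        // With U == T == Integer, one function serves as both the
        // accumulator and the combiner of the three-arg reduce.
        int sum = Arrays.asList(1, 2, 3).stream()
                .reduce(0, plus, plus);
        System.out.println(sum); // 6
    }
}
```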

Finally, Java doesn't provide foldLeft and foldRight operations because they imply a particular ordering of operations that is inherently sequential. This clashes with the design principle stated above of providing APIs that support sequential and parallel operation equally.

Naman
Stuart Marks
  • 13
    So what can you do if you need a `foldLeft` because the computation depends on the previous result and can not be parallelized? – amoebe May 09 '15 at 18:51
  • 5
    @amoebe You can implement your own foldLeft using `forEachOrdered`. The intermediate state has to be kept in a captured variable, though. – Stuart Marks May 10 '15 at 03:25
  • 1
    @StuartMarks thanks, I ended up using jOOλ. They have a neat [implementation of `foldLeft`](https://github.com/jOOQ/jOOL/blob/2b62e412c64cb7215f508eda2d5249ae96995722/src/main/java/org/jooq/lambda/Seq.java#L1160). – amoebe May 10 '15 at 13:12
  • 1
    Love this answer! Correct me if I'm wrong: this explains why OP's running example (the second one) will never invoke the combiner, when run, being the stream sequential. – Luigi Cortese Nov 25 '15 at 12:10
  • 2
    It explains almost everything... except: why should this exclude sequentially based reduction. In my case it is IMPOSSIBLE to do it in parallel as my reduction reduces a list of functions into a U by calling each function on the intermediate result of its predecessors result. This cannot be done in parallel at all and there is no way to describe a combiner. What method can I use to accomplish this? – Zordid Aug 28 '18 at 13:54
  • @Zordid you have the old sad for loop or you can use the forEach with an external variable as an accumulator. It's sad and I'm in your situation, so you have my sympathy – Rick77 Jun 07 '19 at 12:32
  • @Zordid sounds like you're trying to use stream reduction for something it's not intended for. Reduction should be stateless, it sounds like yours isn't. What does "predecessors result" mean? Sounds like that might be your problem. – Frans Nov 22 '19 at 11:45
  • @Frans It is not statefulness that is the issue here. The issue is that the Java stream API requires accumulator functions to be associative (to support parallel execution) and the accumulator Zordid describes is not associative. Allowing parallel execution is not possible at the same time as allowing non-associative accumulator functions. – Silwing Sep 03 '20 at 10:06
  • @Silwing I'm not sure what you mean by associative; can you explain? I think we're saying the same thing just in different terms. – Frans Sep 13 '20 at 13:54
  • @Frans associative is a property an operation can have. Meaning if you chain multiple of those operations together it doesn't matter which order they are executed in: (a op b) op c = a op (b op c) Anything that need a specific order of execution is thus not associative. It can be stateless without being associative is what I mean. – Silwing Sep 14 '20 at 09:21
  • 1
    I don't understand how this API satisfy the thing you said about how parallel and sequential should work the same way. I am implementing the same logic (combining accumulated results with current element vs combining accumulated results) in two places: accumulator and combiner. If you make mistake in one of those function result with differ depending on the way you run the stream. – endertunc Oct 13 '20 at 10:26
  • Great answer, but unnecessarily theoretical. If I read/understood it before, I wouldn't post this one: https://stackoverflow.com/questions/64403813/combiner-never-gets-called-in-reduction-operation-but-is-mandatory – igobivo Oct 17 '20 at 19:53
  • Great explanation, some of these facts should be in the `reduce` Javadoc, specially the ones regarding sequential/parallel stream and combiner. – Gerard Bosch Oct 25 '20 at 08:03
  • 1
    @amoebe I just have the combiner throw new UnsupportedOperationException("No parallel stream support.") – Didier A. Sep 14 '22 at 06:05
183

Since I like doodles and arrows to clarify concepts... let's start!

From String to String (sequential stream)

Suppose you have 4 strings: your goal is to concatenate them into one. You basically start with a type and finish with the same type.

You can achieve this with

String res = Arrays.asList("one", "two","three","four")
        .stream()
        .reduce("",
                (accumulatedStr, str) -> accumulatedStr + str);  //accumulator

and this helps you to visualize what's happening:

[diagram: the four strings in the (red) stream are folded step by step by the accumulator into the final (green) concatenated string]

The accumulator function converts, step by step, the elements in your (red) stream to the final reduced (green) value. The accumulator function simply transforms a String object into another String.

From String to int (parallel stream)

Suppose you have the same 4 strings: your new goal is to sum their lengths, and you want to parallelize your stream.

What you need is something like this:

int length = Arrays.asList("one", "two","three","four")
        .parallelStream()
        .reduce(0,
                (accumulatedInt, str) -> accumulatedInt + str.length(),                 //accumulator
                (accumulatedInt, accumulatedInt2) -> accumulatedInt + accumulatedInt2); //combiner

and this is a scheme of what's happening

[diagram: the parallel stream is split into two (red) halves, each accumulated into a partial (orange) int, which the combiner merges into the final (green) int]

Here the accumulator function (a BiFunction) allows you to transform your String data into int data. Since the stream is parallel, it is split into two (red) parts, each of which is processed independently of the other and produces its own partial (orange) result. Defining a combiner is needed to provide a rule for merging the partial int results into the final (green) int one.

From String to int (sequential stream)

What if you don't want to parallelize your stream? Well, a combiner needs to be provided anyway, but it will never be invoked, given that no partial results will be produced.
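You can check this for yourself with a sketch like the one below: the combiner deliberately throws, yet the sequential reduction completes, because (at least in the reference implementation) no partial results are ever combined. Switching to parallelStream() would trigger the exception.

```java
import java.util.Arrays;

public class SequentialCombinerDemo {
    public static void main(String[] args) {
        int length = Arrays.asList("one", "two", "three", "four")
                .stream() // sequential: no partial results, so no combining
                .reduce(0,
                        (acc, str) -> acc + str.length(),
                        (a, b) -> { throw new AssertionError("combiner invoked!"); });
        System.out.println(length); // 15
    }
}
```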

Community
Luigi Cortese
  • 11
    Thanks for this. I didn't even need to read. I do wish they would have just added a freaking fold function. – Lodewijk Bogaards Mar 04 '16 at 14:57
  • 1
    @LodewijkBogaards glad it helped! [JavaDoc](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#reduce-U-java.util.function.BiFunction-java.util.function.BinaryOperator-) here is pretty cryptic indeed – Luigi Cortese Mar 04 '16 at 15:31
  • @LuigiCortese In the parallel stream does it always divide the elements to pairs? – TheLogicGuy May 18 '17 at 11:40
  • 3
    I appreciate your clear and useful answer. I want to repeat a bit of what you said: "Well, a combiner needs to be provided anyway, but it will never be invoked." This is part of the Brave New World of Java functional programming that, I have been assured countless times, "makes your code more concise and easier to read." Let's hope that examples of (finger quotes) concise clearness such as this remain few and far between. – dnuttle May 23 '19 at 13:48
  • It will be MUCH better to illustrate reduce with eight strings ... – Ekaterina Ivanova iceja.net Jun 01 '20 at 21:46
  • 5
    This is the best answer. Hands down. – Mingtao Sun Dec 15 '20 at 10:15
  • Very helpful, thank you. A picture is still worth a thousand words. Illuminated the relationship between the combiner and parallel streams very nicely. – akagixxer Dec 02 '21 at 16:19
  • 1
    Thanks for the answer. One question: in the String to int (sequential stream) case, why does the function signature contain the combiner if it'll never be invoked? Is it just to satisfy the interface? That said, is there a clean way to provide a dummy combiner? – lnogueir Mar 10 '22 at 02:39
well explained with graphical illustration. my concepts are clear now for accumulator and combiner – Sachin Rane Jul 21 '22 at 14:52
92

The two and three argument versions of reduce which you tried to use don't accept the same type for the accumulator.

The two argument reduce is defined as :

T reduce(T identity,
         BinaryOperator<T> accumulator)

In your case, T is String, so BinaryOperator<T> should accept two String arguments and return a String. But you pass to it an int and a String, which results in the compilation error you got - argument mismatch; int cannot be converted to java.lang.String. Actually, I think passing 0 as the identity value is also wrong here, since a String is expected (T).

Also note that this version of reduce processes a stream of Ts and returns a T, so you can't use it to reduce a stream of String to an int.

The three argument reduce is defined as :

<U> U reduce(U identity,
             BiFunction<U,? super T,U> accumulator,
             BinaryOperator<U> combiner)

In your case U is Integer and T is String, so this method will reduce a stream of String to an Integer.

For the BiFunction<U,? super T,U> accumulator you can pass parameters of two different types (U and ? super T), which in your case are Integer and String. In addition, the identity value U accepts an Integer in your case, so passing it 0 is fine.

Another way to achieve what you want :

int length = asList("str1", "str2").stream().mapToInt(s -> s.length())
            .reduce(0, (accumulatedInt, len) -> accumulatedInt + len);

Here the type of the stream matches the return type of reduce, so you can use the two parameter version of reduce.

Of course you don't have to use reduce at all :

int length = asList("str1", "str2").stream().mapToInt(s -> s.length())
            .sum();
Eran
  • 9
    As a second option in your last code, you could also use `mapToInt(String::length)` over `mapToInt(s -> s.length())`, not sure if one would be better over the other, but I prefer the former for readability. – skiwi Jun 19 '14 at 15:35
  • 44
    Many will find this answer as they don't get why the `combiner` is needed, why not having the `accumulator` is enough. In that case: The combiner is only needed for parallel streams, to combine the "accumulated" results of the threads. – ddekany Nov 24 '17 at 14:54
  • 13
    I don't find your answer particular useful - because you do not explain at all what the combiner should do and how I can work without it! In my case I want to reduce a type T to a U but there is no way this can ever be done in parallel at all. It is simply not possible. How do you tell the system I don't want/need parallelism and thus leave out the combiner? – Zordid Aug 28 '18 at 13:50
  • @Zordid the Streams API doesn't include an option to reduce type T to a U without passing a combiner. – Eran Aug 28 '18 at 16:06
  • 3
    This answer doesn't explain the combiner at all, only why OP needs the non-combiner variants. – Benny Bottema Sep 25 '20 at 07:52
1

There is no reduce version that takes two different types without a combiner since it can't be executed in parallel (not sure why this is a requirement). The fact that accumulator must be associative makes this interface pretty much useless since:

list.stream().reduce(identity,
                     accumulator,
                     combiner);

Produces the same results as:

list.stream().map(i -> accumulator(identity, i))
             .reduce(identity,
                     combiner);
quiz123
  • Such `map` trick depending on particular `accumulator` and `combiner` may slow down the things pretty much. – Tagir Valeev Sep 04 '15 at 15:51
  • Or, speed it up significantly since you can now simplify `accumulator` by dropping the first parameter. – quiz123 Sep 04 '15 at 16:27
  • Parallel reduction is possible, it depends on your computation. In your case, you must be aware of complexity of combiner but also accumulator on identity vs others instances. – LoganMzz Jun 27 '17 at 11:59