
I am trying to figure out what the accumulator and the combiner do in the `reduce` stream operation.

    List<User> users = Arrays.asList(new User("John", 30), new User("Julie", 35));

    int result = users.stream()
            .reduce(0,
                    (partialAgeResult, user) -> {
                        // accumulator is called twice
                        System.out.println(MessageFormat.format("partialAgeResult {0}, user {1}", partialAgeResult, user));
                        return partialAgeResult + user.getAge();
                    },
                    (integer, integer2) -> {
                        // combiner is never called
                        System.out.println(MessageFormat.format("integer {0}, integer2 {1}", integer, integer2));
                        return integer * integer2;
                    });

    System.out.println(MessageFormat.format("Result is {0}", result)); 

I notice that the combiner is never executed, and the result is 65. If I use users.parallelStream() then the combiner is executed once and the result is 1050.

Why do stream and parallelStream yield different results? I don't see any side effects of executing this in parallel.

What is the purpose of the combiner in the simple stream version?

E_net4
igobivo

2 Answers


The problem is here. You are multiplying and not adding in your combiner.

    (integer, integer2) -> {
        // combiner is never called
        System.out.println(MessageFormat.format("integer {0}, integer2 {1}", integer, integer2));
        return integer * integer2; //<----- Should be addition
    });

The combiner merges the partial results of a parallel operation, because each part of that operation processes its own "piece" of the original stream independently.

A simple example would be summing a list of elements. A parallel operation can produce several partial sums, so the combiner has to add those partial sums together to get the total (a good exercise to try and see for yourself).
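A runnable sketch of that exercise, assuming Java 16+ and a minimal `User` record standing in for the question's `User` class (its exact definition is not shown in the question). The only change from the question's code is that the combiner adds instead of multiplying, so the sequential and parallel runs agree.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class CombinerSumDemo {

    // Minimal stand-in for the question's User class (assumption: just a name and an int age).
    public record User(String name, int age) {
        public int getAge() { return age; }
    }

    // Sums ages with the three-argument reduce; the combiner adds partial sums instead of multiplying.
    public static int totalAge(List<User> users, boolean parallel) {
        Stream<User> stream = parallel ? users.parallelStream() : users.stream();
        return stream.reduce(
                0,                                                            // identity: neutral for addition
                (partialAgeResult, user) -> partialAgeResult + user.getAge(), // accumulator: (Integer, User) -> Integer
                Integer::sum);                                                // combiner: (Integer, Integer) -> Integer
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("John", 30), new User("Julie", 35));
        System.out.println("sequential: " + totalAge(users, false)); // 65
        System.out.println("parallel:   " + totalAge(users, true));  // 65
    }
}
```

With addition in both places, the way the parallel stream happens to split the list no longer changes the answer.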

WJS
  • So basically the combiner is the accumulator of intermediate results when processing a parallel stream. It is a shame that the API forces you to provide it even though it has no effect on a sequential stream. I find this confusing at best. – igobivo Oct 17 '20 at 18:47
  • Yes. The API does not force you to do this. You can omit the combiner in a reduce statement. – WJS Oct 17 '20 at 19:03
  • I don't believe so; here is the signature: `T reduce(T identity, BinaryOperator<T> accumulator);` You are using an accumulator. The accumulator will always be called. The combiner is for parallel operations. – WJS Oct 17 '20 at 19:08
  • Yes! I agree with that. – WJS Oct 17 '20 at 19:22
  • @WJS "Yes. The API does not force you to do this. You can omit the combiner in a reduce statement." - but actually the API is forcing me. I cannot use an accumulator that converts from the stream type to something else (a BiFunction, as mentioned before). – igobivo Oct 17 '20 at 19:32
  • @igobivo Correct. What you could do in your stream is `.mapToInt(User::getAge).reduce(0,(a,b)->a+b);` – WJS Oct 17 '20 at 19:34
  • Oh, I get it now... the reason the other reduce overloads do not take a combiner is that their accumulator can also be applied to the partial results of a parallel stream. When the accumulator accepts different input types, that is not possible, so a partial-result accumulator (the combiner) has to be specified. And yes, it needs to be semantically equivalent to the accumulator: if you want to add all user ages, then you should add in the combiner too (not multiply or do something else). – igobivo Oct 17 '20 at 19:42
  • You may find [this](https://stackoverflow.com/questions/24308146/why-is-a-combiner-needed-for-reduce-method-that-converts-type-in-java-8) helpful and interesting. They are discussing the exact same thing. – WJS Oct 17 '20 at 19:43
  • Yes, I was aware of this question before I posted mine. But now I see that the real value is in this answer https://stackoverflow.com/a/24316429/2400849; it was just too academic/theoretical for my taste to even consider reading it. :) – igobivo Oct 17 '20 at 19:52
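The requirement igobivo arrives at in the comments (the combiner must agree with the accumulator) is stated precisely in the Javadoc of the three-argument `Stream.reduce`: for all `u`, `combiner(identity, u)` must equal `u`, and for all `u` and `t`, `combiner.apply(u, accumulator.apply(identity, t))` must equal `accumulator.apply(u, t)`. The sketch below spot-checks those two rules on sample values. It assumes Java 16+ and a minimal `User` record; `isCompatible` is a hypothetical helper, and passing it on a few samples is evidence, not a proof.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class CombinerCompatibility {

    // Minimal stand-in for the question's User class (assumption).
    public record User(String name, int age) {}

    // Hypothetical helper: checks the two reduce() conditions, but only for the given samples.
    public static <T, U> boolean isCompatible(U identity,
                                              BiFunction<U, ? super T, U> accumulator,
                                              BinaryOperator<U> combiner,
                                              List<U> partials,
                                              List<T> elements) {
        for (U u : partials) {
            // Rule 1: identity must be an identity for the combiner.
            if (!combiner.apply(identity, u).equals(u)) return false;
            for (T t : elements) {
                // Rule 2: combining u with a one-element partial result must equal accumulating t into u.
                U viaCombiner = combiner.apply(u, accumulator.apply(identity, t));
                U direct = accumulator.apply(u, t);
                if (!viaCombiner.equals(direct)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BiFunction<Integer, User, Integer> addAge = (sum, user) -> sum + user.age();
        List<Integer> partials = List.of(0, 30, 65);
        List<User> users = List.of(new User("John", 30), new User("Julie", 35));

        System.out.println("add combiner:      " + isCompatible(0, addAge, Integer::sum, partials, users));
        System.out.println("multiply combiner: " + isCompatible(0, addAge, (a, b) -> a * b, partials, users));
    }
}
```

The multiplying combiner from the question fails rule 2 immediately: with `u = 0` and John's age, `0 * (0 + 30)` is 0, while accumulating directly gives 30.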

For a sequential stream where the accumulator's argument types differ (a BiFunction<U,? super T,U>), you still have to give a combiner, but it is never invoked, because there are no partial results calculated in parallel that need to be combined.

So you can simplify this by converting the elements to the data you need before the reduce, which avoids having to give a combiner.

users.stream().map(e -> e.getAge()).reduce(0, (a, b) -> a + b);
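A self-contained version of that simplification, assuming Java 16+ and a minimal `User` record in place of the question's class. Once each `User` is mapped to its age, the stream contains `Integer`s, so the two-argument `reduce(identity, BinaryOperator)` applies and no combiner is needed. The `mapToInt(...).sum()` variant does the same without boxing.

```java
import java.util.List;

public class MapThenReduce {

    // Minimal stand-in for the question's User class (assumption).
    public record User(String name, int age) {
        public int getAge() { return age; }
    }

    // Map to ages first, so reduce(identity, BinaryOperator<Integer>) can be used.
    public static int totalAgeBoxed(List<User> users) {
        return users.stream()
                .map(User::getAge)         // Stream<Integer>
                .reduce(0, Integer::sum);  // accumulator doubles as the combiner for parallel streams
    }

    // Same result using an IntStream, avoiding Integer boxing.
    public static int totalAgePrimitive(List<User> users) {
        return users.stream()
                .mapToInt(User::getAge)    // IntStream
                .sum();
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User("John", 30), new User("Julie", 35));
        System.out.println(totalAgeBoxed(users));     // 65
        System.out.println(totalAgePrimitive(users)); // 65
    }
}
```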

So there is actually no purpose in using a combiner with a BiFunction<U,? super T,U> accumulator on a sequential stream, but you have to provide one since there is no method like

reduce(U identity, BiFunction<U,? super T,U> accumulator)

But for a parallel stream, the combiner is called. You are getting 1050 because you are multiplying in the combiner, i.e. 30 * 35.
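To make the 30 * 35 arithmetic concrete, the sketch below replays both reductions by hand. It assumes the parallel stream splits the two-element list into two one-element chunks, which is consistent with the 1050 in the question but is decided by the stream implementation, not guaranteed. `reducePerElementChunks` and `reduceSequentially` are hypothetical helpers for illustration, not Stream API methods; `User` is a minimal Java 16+ record.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class ParallelSplitSimulation {

    // Minimal stand-in for the question's User class (assumption).
    public record User(String name, int age) {}

    // Hypothetical helper: mimics a parallel reduce in which every element lands in its own chunk.
    public static int reducePerElementChunks(List<User> users,
                                             int identity,
                                             BiFunction<Integer, User, Integer> accumulator,
                                             BinaryOperator<Integer> combiner) {
        Integer result = null;
        for (User user : users) {
            // Each chunk starts again from the identity value.
            Integer partial = accumulator.apply(identity, user);
            // Partial results are merged with the combiner, never with the accumulator.
            result = (result == null) ? partial : combiner.apply(result, partial);
        }
        return (result == null) ? identity : result;
    }

    // Hypothetical helper: what a sequential stream does, one running value and no combiner.
    public static int reduceSequentially(List<User> users,
                                         int identity,
                                         BiFunction<Integer, User, Integer> accumulator) {
        Integer result = identity;
        for (User user : users) {
            result = accumulator.apply(result, user);
        }
        return result;
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User("John", 30), new User("Julie", 35));
        BiFunction<Integer, User, Integer> addAge = (partial, user) -> partial + user.age();
        BinaryOperator<Integer> multiply = (a, b) -> a * b;

        System.out.println(reduceSequentially(users, 0, addAge));                    // 0 + 30 + 35 = 65
        System.out.println(reducePerElementChunks(users, 0, addAge, multiply));     // (0 + 30) * (0 + 35) = 1050
        System.out.println(reducePerElementChunks(users, 0, addAge, Integer::sum)); // 30 + 35 = 65
    }
}
```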

Eklavya
  • "So, there is no purpose using combiner with accumulator for sequential stream actually, but you have to provide to use accumulator." - that's what I call bad API design... – igobivo Oct 17 '20 at 18:37
  • "For a sequential stream with a mismatch between the types of the accumulator arguments or implementation" - can you please clarify this. Where is the mismatch? Accumulator returns an int. Int is stored in the `result` variable... – igobivo Oct 17 '20 at 18:42
  • _(partialAgeResult, user)_ here _partialAgeResult_ is Integer but _user_ is _User_ Type – Eklavya Oct 17 '20 at 18:44
  • So in case of this mismatch, the combiner is provided. But it is never executed. So that means there is no mismatch after all? – igobivo Oct 17 '20 at 18:55
  • There is a mismatch in the accumulator arguments. Actually, if you are not using a parallel stream, what is the purpose of an accumulator with a different type? – Eklavya Oct 17 '20 at 19:04
  • I don't see how the combiner solves the problem of different accumulator args. The accumulator returns an int, and the combiner accepts ints and returns an int, so the combiner is not doing any conversion that would solve this mismatch problem. It is not even called, so it is not involved in solving any mismatch issue. All the combiner is doing, in my opinion, is combining partial results when using a parallel stream, and this has nothing to do with mismatched types. – igobivo Oct 17 '20 at 19:10
  • The combiner doesn't deal with the different argument types. It's just that if you use a _BiFunction_ as the accumulator, then you have to supply a combiner; if you use a _BinaryOperator_ as the accumulator, it is the same type as the combiner, so no separate combiner is needed. – Eklavya Oct 17 '20 at 19:24
  • If you want to use a BiFunction as an accumulator without a combiner for a sequential stream, that's not possible; you have to provide a combiner, which is never invoked. – Eklavya Oct 17 '20 at 19:31
  • I upvoted :) Thanks for your time and effort. Actually, your comment from the accepted answer (looks like it is gone now) was very useful, because I finally understood what is the purpose of the combiner :) – igobivo Oct 17 '20 at 22:23