I want to use a Java 8 Stream to group by one classifier but apply multiple Collector functions to each group, so that grouping computes, for example, both the average and the sum of one field (or possibly of different fields).

I'll try to simplify this with an example:

public void test() {
    List<Person> persons = new ArrayList<>();
    persons.add(new Person("Person One", 1, 18));
    persons.add(new Person("Person Two", 1, 20));
    persons.add(new Person("Person Three", 1, 30));
    persons.add(new Person("Person Four", 2, 30));
    persons.add(new Person("Person Five", 2, 29));
    persons.add(new Person("Person Six", 3, 18));

    Map<Integer, Data> result = persons.stream().collect(
            groupingBy(person -> person.group, multiCollector)
    );
}

class Person {
    String name;
    int group;
    int age;

    // Constructor, getters and setters
}

class Data {
    long average;
    long sum;

    public Data(long average, long sum) {
        this.average = average;
        this.sum = sum;
    }

    // Getter and setter
}

The result should be a Map that associates each group with its aggregated results, like:

1 => Data(average(18, 20, 30), sum(18, 20, 30))
2 => Data(average(30, 29), sum(30, 29))
3 => ....

This works perfectly fine with a single function such as "Collectors.counting()", but I would like to chain more than one (ideally an arbitrary number taken from a List).

List<Collector<Person, ?, ?>>

Is it possible to do something like this?

Tagir Valeev
PhilippS
  • Do I understand correctly that your `Data` class is just a placeholder for a collection of flatmap-functions? Each of these functions must perform the same operation on all groups (e.g. first calculate the group's average age, then the total age, etc.)? – SME_Dev Aug 18 '15 at 12:18
  • If I understand you correctly, yes. I just used it to hold multiple values in one object while still being able to identify which one is which. It could also be an array or a Map with Key=FunctionName, Value=FunctionResult. – PhilippS Aug 18 '15 at 12:22

5 Answers

For the concrete problem of summing and averaging, use collectingAndThen along with summarizingDouble:

Map<Integer, Data> result = persons.stream().collect(
        groupingBy(Person::getGroup, 
                collectingAndThen(summarizingDouble(Person::getAge), 
                        dss -> new Data((long)dss.getAverage(), (long)dss.getSum()))));

For the more generic problem (collect various things about your Persons), you can create a complex collector like this:

// Individual collectors are defined here
List<Collector<Person, ?, ?>> collectors = Arrays.asList(
        Collectors.averagingInt(Person::getAge),
        Collectors.summingInt(Person::getAge));

@SuppressWarnings("unchecked")
Collector<Person, List<Object>, List<Object>> complexCollector = Collector.of(
    () -> collectors.stream().map(Collector::supplier)
        .map(Supplier::get).collect(toList()),
    (list, e) -> IntStream.range(0, collectors.size()).forEach(
        i -> ((BiConsumer<Object, Person>) collectors.get(i).accumulator()).accept(list.get(i), e)),
    (l1, l2) -> {
        IntStream.range(0, collectors.size()).forEach(
            i -> l1.set(i, ((BinaryOperator<Object>) collectors.get(i).combiner()).apply(l1.get(i), l2.get(i))));
        return l1;
    },
    list -> {
        IntStream.range(0, collectors.size()).forEach(
            i -> list.set(i, ((Function<Object, Object>)collectors.get(i).finisher()).apply(list.get(i))));
        return list;
    });

Map<Integer, List<Object>> result = persons.stream().collect(
        groupingBy(Person::getGroup, complexCollector)); 

The map values are lists in which the first element is the result of the first collector, the second element the result of the second collector, and so on. You can add a custom finisher step using Collectors.collectingAndThen(complexCollector, list -> ...) to convert this list to something more appropriate.
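The same idea can be packaged as a reusable factory method with the conversion to `Data` applied on top. The following is only a sketch under assumptions: the names `CombinedCollectors`, `combining`, and `summarize` are hypothetical, `Person` is a minimal stand-in for the class in the question, and `Data` stores the average as a `double` rather than a `long`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CombinedCollectors {

    // Hypothetical helper: one collector whose result is the list of the
    // individual collectors' results, in the order the collectors were given.
    @SuppressWarnings("unchecked")
    static <T> Collector<T, List<Object>, List<Object>> combining(List<Collector<T, ?, ?>> collectors) {
        return Collector.of(
            () -> {
                // One intermediate container per collector.
                List<Object> containers = new ArrayList<>();
                collectors.forEach(c -> containers.add(c.supplier().get()));
                return containers;
            },
            (containers, element) -> {
                // Feed the element to every collector's accumulator.
                for (int i = 0; i < collectors.size(); i++) {
                    ((BiConsumer<Object, T>) collectors.get(i).accumulator())
                        .accept(containers.get(i), element);
                }
            },
            (left, right) -> {
                // Merge partial results pairwise (used by parallel streams).
                for (int i = 0; i < collectors.size(); i++) {
                    left.set(i, ((BinaryOperator<Object>) collectors.get(i).combiner())
                        .apply(left.get(i), right.get(i)));
                }
                return left;
            },
            containers -> {
                // Replace each container with its finished result.
                for (int i = 0; i < collectors.size(); i++) {
                    containers.set(i, ((Function<Object, Object>) collectors.get(i).finisher())
                        .apply(containers.get(i)));
                }
                return containers;
            });
    }

    // Minimal stand-ins for the classes in the question (assumed shape).
    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    static class Data {
        final double average;
        final long sum;
        Data(double average, long sum) { this.average = average; this.sum = sum; }
    }

    static Map<Integer, Data> summarize(List<Person> persons) {
        List<Collector<Person, ?, ?>> collectors = Arrays.asList(
            Collectors.averagingInt(Person::getAge),
            Collectors.summingInt(Person::getAge));
        // The casts rely on the order of the collectors in the list above.
        return persons.stream().collect(Collectors.groupingBy(
            Person::getGroup,
            Collectors.collectingAndThen(
                combining(collectors),
                results -> new Data((Double) results.get(0), (Integer) results.get(1)))));
    }
}
```

The casts in the final lambda are the price of the untyped `List<Object>`; they are only safe as long as the positions match the order in which the collectors were listed.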

Tagir Valeev
  • OK, that is interesting and I think it relates to Peter Lawrey's answer. But this would mean that it is not flexible enough for every type of function. I want to use a List of functions (Collectors). With your solution I think I'm limited to what summarizingDouble does. – PhilippS Aug 18 '15 at 12:24
  • Wow! Thanks for the answer. I did not expect to get something this good. I just tested it and the result is what I was looking for. Now I just have to understand all of your code, which will take some time. Then I can tweak it a little to match exactly what I am trying to accomplish. But this is really what I was looking for! – PhilippS Aug 18 '15 at 12:43
  • @PhilippS, it just combines the individual actions of individual collectors (new supplier creates the list of the results of individual suppliers, new accumulator calls accumulate for each individual list item and so on). Feel free to ask if you have concrete questions about this implementation. – Tagir Valeev Aug 18 '15 at 12:49
  • I would rework this by putting everything into a factory method. `collectors` can be a local variable (method parameter, actually) closed over by the lambdas. The finisher lambda could additionally apply a function supplied by the user which (in OP's case) will take the list and return the `Data` object. – Marko Topolnik Aug 18 '15 at 14:48
  • I extracted the code combining given collectors to a method and fixed generics at https://gist.github.com/dpolivaev/50cc9eb1b75453d37195882a9fc9fb69 – dpolivaev Sep 25 '19 at 08:34

By using a map as the output type, one could have a potentially unbounded list of reducers, each producing its own statistic and adding it to the map.

public static <K, V> Map<K, V> addMap(Map<K, V> map, K k, V v) {
    Map<K, V> mapout = new HashMap<K, V>();
    mapout.putAll(map);
    mapout.put(k, v);
    return mapout;
}

...

    List<Person> persons = new ArrayList<>();
    persons.add(new Person("Person One", 1, 18));
    persons.add(new Person("Person Two", 1, 20));
    persons.add(new Person("Person Three", 1, 30));
    persons.add(new Person("Person Four", 2, 30));
    persons.add(new Person("Person Five", 2, 29));
    persons.add(new Person("Person Six", 3, 18));

    List<BiFunction<Map<String, Integer>, Person, Map<String, Integer>>> listOfReducers = new ArrayList<>();

    listOfReducers.add((m, p) -> addMap(m, "Count", Optional.ofNullable(m.get("Count")).orElse(0) + 1));
    // Sums the group field of each person (1+1+1+2+2+3 = 10)
    listOfReducers.add((m, p) -> addMap(m, "Sum", Optional.ofNullable(m.get("Sum")).orElse(0) + p.group));

    BiFunction<Map<String, Integer>, Person, Map<String, Integer>> applyList
            = (mapin, p) -> {
                Map<String, Integer> mapout = mapin;
                for (BiFunction<Map<String, Integer>, Person, Map<String, Integer>> f : listOfReducers) {
                    mapout = f.apply(mapout, p);
                }
                return mapout;
            };
    BinaryOperator<Map<String, Integer>> combineMaps
            = (map1, map2) -> {
                Map<String, Integer> mapout = new HashMap<>();
                mapout.putAll(map1);
                mapout.putAll(map2);
                return mapout;
            };
    Map<String, Integer> map
            = persons
            .stream()
            .reduce(new HashMap<String, Integer>(),
                    applyList, combineMaps);
    System.out.println("map = " + map);

Produces:

map = {Sum=10, Count=6}
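One caveat about `combineMaps` above: it uses `putAll`, so entries from the second map replace those of the first instead of being added to them. A sequential stream does not normally need the combiner, which is why the output shown is unaffected, but a parallel stream would merge partial results through it. A minimal sketch of a combiner that adds values key by key follows; the names `MergingCombiner` and `SUM_MAPS` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

class MergingCombiner {

    // Adds the values of both partial maps key by key instead of overwriting them.
    // A new map is returned so that neither input is modified, which matches the
    // non-mutating style expected by Stream.reduce.
    static final BinaryOperator<Map<String, Integer>> SUM_MAPS = (left, right) -> {
        Map<String, Integer> merged = new HashMap<>(left);
        right.forEach((key, value) -> merged.merge(key, value, Integer::sum));
        return merged;
    };
}
```

Passing `MergingCombiner.SUM_MAPS` as the third argument to `reduce` in place of `combineMaps` would be the intended use.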
WillShackleford

You could chain them.

A collector can only produce one object, but this object can hold multiple values. You could return a Map for example where the map has an entry for each collector you are returning.

You can use Collector.of(HashMap::new, accumulator, combiner);

Your accumulator would have a Map of Collectors, where the keys of the Map produced match the names of the Collectors. The combiner would need a way to combine multiple results, especially when this is performed in parallel.
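As a rough illustration of the three-argument `Collector.of` with a Map as the result container, here is a sketch that gathers a count and a sum under fixed keys. It is a simplification of the idea above: it hard-codes two statistics instead of delegating to a Map of Collectors, and the names `MapCollectorSketch`, `countAndSum`, and `summarize` are hypothetical. Persons are represented as `{group, age}` pairs to keep the example short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class MapCollectorSketch {

    // Hypothetical collector: gathers "count" and "sum" of an int property into one Map.
    static <T> Collector<T, Map<String, Integer>, Map<String, Integer>> countAndSum(
            ToIntFunction<? super T> mapper) {
        return Collector.of(
            HashMap::new,
            (map, element) -> {
                // Each statistic lives under its own key in the same map.
                map.merge("count", 1, Integer::sum);
                map.merge("sum", mapper.applyAsInt(element), Integer::sum);
            },
            (left, right) -> {
                // Combiner for parallel streams: add the partial results key by key.
                right.forEach((key, value) -> left.merge(key, value, Integer::sum));
                return left;
            });
    }

    // Groups {group, age} pairs by group and summarizes the ages of each group.
    static Map<Integer, Map<String, Integer>> summarize(List<int[]> groupAndAge) {
        return groupAndAge.stream().collect(
            Collectors.groupingBy(pair -> pair[0], countAndSum(pair -> pair[1])));
    }
}
```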


Generally the built-in collectors use a dedicated data type for complex results.

From Collectors:

public static <T>
Collector<T, ?, DoubleSummaryStatistics> summarizingDouble(ToDoubleFunction<? super T> mapper) {
    return new CollectorImpl<T, DoubleSummaryStatistics, DoubleSummaryStatistics>(
            DoubleSummaryStatistics::new,
            (r, t) -> r.accept(mapper.applyAsDouble(t)),
            (l, r) -> { l.combine(r); return l; }, CH_ID);
}

and in its own class

public class DoubleSummaryStatistics implements DoubleConsumer {
    private long count;
    private double sum;
    private double sumCompensation; // Low order bits of sum
    private double simpleSum; // Used to compute right sum for non-finite inputs
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    // ... remaining members omitted
}
Peter Lawrey
  • I'm trying to figure out how this can help me. Perhaps I'm not on the right track yet. Isn't this just one built-in function like the mentioned "Collectors.counting()"? What I want to do is use two or more of these built-in functions (unknown at compile time). Maybe you can explain a little more. – PhilippS Aug 18 '15 at 12:04
  • Thanks for the answer. This provides good background knowledge, especially for understanding Tagir's answer and adapting it. – PhilippS Aug 18 '15 at 12:47

Instead of chaining the collectors, you should build an abstraction which is an aggregator of collectors: implement the Collector interface with a class which accepts a list of collectors and delegates each method invocation to each of them. Then, in the end, you return new Data() with all the results the nested collectors produced.

You can avoid creating a custom class with all the method declarations by making use of Collector.of(supplier, accumulator, combiner, finisher, Collector.Characteristics... characteristics). The finisher lambda will call the finisher of each nested collector, then return the Data instance.
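For the two-collector case, this delegation can be written in a type-safe way on Java 8. The following is a sketch under assumptions, not a definitive implementation: `pairing`, `DelegatingCollectorSketch`, and `summarize` are hypothetical names, `Person` is a minimal stand-in for the class in the question, and `Data` stores the average as a `double`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class DelegatingCollectorSketch {

    // Hypothetical helper: runs two collectors side by side and merges their results.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> pairing(
            Collector<? super T, A1, R1> first,
            Collector<? super T, A2, R2> second,
            BiFunction<? super R1, ? super R2, R> merger) {
        // Holder for the two intermediate containers.
        class Pair {
            A1 a1 = first.supplier().get();
            A2 a2 = second.supplier().get();
        }
        return Collector.<T, Pair, R>of(
            Pair::new,
            (pair, element) -> {
                // Delegate accumulation to both nested collectors.
                first.accumulator().accept(pair.a1, element);
                second.accumulator().accept(pair.a2, element);
            },
            (left, right) -> {
                // Delegate combining to both nested collectors.
                left.a1 = first.combiner().apply(left.a1, right.a1);
                left.a2 = second.combiner().apply(left.a2, right.a2);
                return left;
            },
            // Call each nested finisher, then build the final result.
            pair -> merger.apply(first.finisher().apply(pair.a1),
                                 second.finisher().apply(pair.a2)));
    }

    // Minimal stand-ins for the classes in the question (assumed shape).
    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    static class Data {
        final double average;
        final long sum;
        Data(double average, long sum) { this.average = average; this.sum = sum; }
    }

    static Map<Integer, Data> summarize(List<Person> persons) {
        return persons.stream().collect(Collectors.groupingBy(
            Person::getGroup,
            pairing(Collectors.averagingInt(Person::getAge),
                    Collectors.summingInt(Person::getAge),
                    (average, sum) -> new Data(average, sum))));
    }
}
```

Because the result types of both nested collectors are carried in the signature, no casts are needed in the merger, at the cost of supporting exactly two collectors rather than an arbitrary list.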

Marko Topolnik
  • Thanks for your answer. Especially in combination with other answers this provides good background knowledge on how to adapt my code. – PhilippS Aug 18 '15 at 12:45

In Java 12, the Collectors API has been extended with a static teeing(...) function:

teeing(Collector<? super T,?,R1> downstream1, Collector<? super T,?,R2> downstream2, BiFunction<? super R1,? super R2,R> merger)

This provides built-in functionality for applying two collectors to one Stream and merging the results into a single object.

Below is a small example in which a list of employees is split into age groups; for each group, two Collectors.summarizingInt() collectors (one on age, one on salary) are applied and their results are returned as a list of IntSummaryStatistics:

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectorTeeingTest {

public static void main(String... args){

    NavigableSet<Integer> age_groups = new TreeSet<>();
    age_groups.addAll(List.of(30,40,50,60,Integer.MAX_VALUE)); //we don't want to map to null

    Function<Integer,Integer> to_age_groups = age -> age_groups.higher(age);

    List<Employee> employees = List.of( new Employee("A",21,2000),
                                        new Employee("B",24,2400),
                                        new Employee("C",32,3000),
                                        new Employee("D",40,4000),
                                        new Employee("E",41,4100),
                                        new Employee("F",61,6100)
    );

    Map<Integer,List<IntSummaryStatistics>> stats = employees.stream()
            .collect(Collectors.groupingBy(
                employee -> to_age_groups.apply(employee.getAge()),
                Collectors.teeing(
                    Collectors.summarizingInt(Employee::getAge),
                    Collectors.summarizingInt(Employee::getSalary),
                    (stat1, stat2) -> List.of(stat1,stat2))));

    stats.entrySet().stream().forEach(entry -> {
        System.out.println("Age-group: <"+entry.getKey()+"\n"+entry.getValue());
    });
}

public static class Employee{

    private final String name;
    private final int age;
    private final int salary;

    public Employee(String name, int age, int salary){
        
        this.name = name;
        this.age = age;
        this.salary = salary;
    }
    public String getName(){return this.name;}
    public int getAge(){return this.age;}
    public int getSalary(){return this.salary;}
}

}

Output:

Age-group: <2147483647
[IntSummaryStatistics{count=1, sum=61, min=61, average=61,000000, max=61}, IntSummaryStatistics{count=1, sum=6100, min=6100, average=6100,000000, max=6100}]
Age-group: <50
[IntSummaryStatistics{count=2, sum=81, min=40, average=40,500000, max=41}, IntSummaryStatistics{count=2, sum=8100, min=4000, average=4050,000000, max=4100}]
Age-group: <40
[IntSummaryStatistics{count=1, sum=32, min=32, average=32,000000, max=32}, IntSummaryStatistics{count=1, sum=3000, min=3000, average=3000,000000, max=3000}]
Age-group: <30
[IntSummaryStatistics{count=2, sum=45, min=21, average=22,500000, max=24}, IntSummaryStatistics{count=2, sum=4400, min=2000, average=2200,000000, max=2400}]
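Applied to the question's original scenario, `teeing` can build the `Data` object directly. This is a minimal sketch assuming Java 12 or later; `TeeingSketch` and `summarize` are hypothetical names, `Person` is a minimal stand-in for the class in the question, and `Data` stores the average as a `double`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TeeingSketch {

    // Minimal stand-ins for the classes in the question (assumed shape).
    static class Person {
        final String name;
        final int group;
        final int age;
        Person(String name, int group, int age) { this.name = name; this.group = group; this.age = age; }
        int getGroup() { return group; }
        int getAge() { return age; }
    }

    static class Data {
        final double average;
        final long sum;
        Data(double average, long sum) { this.average = average; this.sum = sum; }
    }

    // Groups persons by their group and computes average and sum of age per group.
    static Map<Integer, Data> summarize(List<Person> persons) {
        return persons.stream().collect(Collectors.groupingBy(
            Person::getGroup,
            Collectors.teeing(
                Collectors.averagingInt(Person::getAge),   // yields a Double
                Collectors.summingInt(Person::getAge),     // yields an Integer
                (average, sum) -> new Data(average, sum))));
    }
}
```

Since `teeing` takes exactly two collectors, more than two statistics would require nesting `teeing` calls or falling back to a list-based combination.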
motaa