Java 8 Stream API - Selecting only values after Collectors.groupingBy(..)

Question

Say I have a collection of Student objects, each consisting of a name (String), an age (int) and a city (String).

I am trying to use Java's Stream API to achieve the following SQL-like behavior:

SELECT MAX(age)
FROM Students
GROUP BY city

Now, I found two different ways to do so:

final List<Integer> variation1 =
            students.stream()
                    .collect(Collectors.groupingBy(Student::getCity, Collectors.maxBy((s1, s2) -> s1.getAge() - s2.getAge())))
                    .values()
                    .stream()
                    .filter(Optional::isPresent)
                    .map(Optional::get)
                    .map(Student::getAge)
                    .collect(Collectors.toList());

And the other one:

final Collection<Integer> variation2 =
            students.stream()
                    .collect(Collectors.groupingBy(Student::getCity,
                            Collectors.collectingAndThen(Collectors.maxBy((s1, s2) -> s1.getAge() - s2.getAge()),
                                    optional -> optional.get().getAge())))
                    .values();

In both variations, one has to call .values() ... and filter out the empty groups returned from the collector.

Is there any other way to achieve this required behavior?

These methods remind me of SQL's OVER (PARTITION BY ...) clauses...

Thanks

Edit: All the answers below were really interesting, but unfortunately they are not what I was looking for, since what I am trying to get is just the values. I don't need the keys, just the values.

I mean not having to call `values()`, since I don't need the keys at all — Ghost93, Mar 02 '16 at 21:57
The implementation _must_ use the keys to figure out how to group the elements. You can discard the keys afterwards, but that doesn't mean you don't need them in the meantime. — Louis Wasserman, Mar 02 '16 at 21:59

Tagir Valeev · Answer 1 · 2016-03-01T10:57:27.533

31

Do not always stick with groupingBy. Sometimes toMap is the thing you need:

Collection<Integer> result = students.stream()
    .collect(Collectors.toMap(Student::getCity, Student::getAge, Integer::max))
    .values();

Here you just create a Map where the keys are cities and the values are ages. When several students share the same city, the merge function is applied, which here simply keeps the larger age. It's faster and cleaner.
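
To try this end to end, here is a minimal, self-contained sketch. The nested Student class, the sample data and the names ToMapMaxAge, sampleStudents and maxAgePerCity are made up for illustration; only the toMap call itself comes from the snippet above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

final class ToMapMaxAge {

    // Hypothetical Student: the question only says it has a name, an age and a city.
    static final class Student {
        private final String name;
        private final int age;
        private final String city;

        Student(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    // Made-up sample data: two cities with two students each.
    static List<Student> sampleStudents() {
        return Arrays.asList(
            new Student("Ann", 22, "Haifa"),
            new Student("Bob", 31, "Haifa"),
            new Student("Cid", 27, "Eilat"),
            new Student("Dee", 19, "Eilat"));
    }

    // One map entry per city; Integer::max resolves key collisions by keeping the larger age.
    static Collection<Integer> maxAgePerCity(List<Student> students) {
        return students.stream()
            .collect(Collectors.toMap(Student::getCity, Student::getAge, Integer::max))
            .values();
    }

    public static void main(String[] args) {
        // The order of the values is not guaranteed, so do not rely on it.
        System.out.println(maxAgePerCity(sampleStudents()));
    }
}
```

Note that values() returns a view of the map, so the keys still exist internally; they are just not exposed to the caller.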

edited Mar 01 '16 at 10:57

answered Mar 01 '16 at 05:19

Tagir Valeev

I added a `groupingBy` based solution that is *almost* as simple, but as long as the OP is interested in the ages only, the `toMap` solution clearly is the simplest. – Holger Mar 01 '16 at 10:52

score 15 · Answer 2 · edited May 23 '17 at 12:10

In addition to Tagir’s great answer using toMap instead of groupingBy, here is the short solution if you want to stick to groupingBy:

Collection<Integer> result = students.stream()
    .collect(Collectors.groupingBy(Student::getCity,
                 Collectors.reducing(-1, Student::getAge, Integer::max)))
    .values();

Note that this three-argument reducing collector already performs a mapping operation, so we don’t need to nest it with a mapping collector. Further, providing an identity value avoids dealing with Optional. Since ages are always positive, providing -1 is sufficient, and since a group will always have at least one element, the identity value will never show up as a result.

Still, I think Tagir’s toMap based solution is preferable in this scenario.

The groupingBy based solution becomes more interesting when you want to get the actual students having the maximum age, e.g.:

Collection<Student> result = students.stream().collect(
   Collectors.groupingBy(Student::getCity, Collectors.reducing(null, BinaryOperator.maxBy(
     Comparator.nullsFirst(Comparator.comparingInt(Student::getAge)))))
).values();

Well, actually, even this can be expressed using the toMap collector:

Collection<Student> result = students.stream().collect(
    Collectors.toMap(Student::getCity, Function.identity(),
        BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge)))
).values();

You can express almost everything with both collectors, but groupingBy has the advantage on its side when you want to perform a mutable reduction on the values.
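
As a sketch of that last point: a downstream collector such as Collectors.summarizingInt accumulates each group into a single mutable IntSummaryStatistics container, which would arguably be clumsier to express through a toMap merge function. The nested Student class, the sample data and the names GroupingByStats and ageStatsPerCity are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupingByStats {

    // Hypothetical Student with the three properties named in the question.
    static final class Student {
        private final String name;
        private final int age;
        private final String city;

        Student(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    // Made-up sample data.
    static List<Student> sampleStudents() {
        return Arrays.asList(
            new Student("Ann", 22, "Haifa"),
            new Student("Bob", 31, "Haifa"),
            new Student("Cid", 27, "Eilat"),
            new Student("Dee", 19, "Eilat"));
    }

    // Each group is reduced into one IntSummaryStatistics (count, min, max, sum, average).
    static Map<String, IntSummaryStatistics> ageStatsPerCity(List<Student> students) {
        return students.stream()
            .collect(Collectors.groupingBy(Student::getCity,
                         Collectors.summarizingInt(Student::getAge)));
    }

    public static void main(String[] args) {
        ageStatsPerCity(sampleStudents()).forEach((city, stats) ->
            System.out.println(city + ": max=" + stats.getMax() + ", count=" + stats.getCount()));
    }
}
```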

answered Mar 01 '16 at 10:45

Holger

I just realized that even getting the actual student can be done easily using `toMap`, so it requires more complex scenarios to get a benefit from `groupingBy` here… – Holger Mar 01 '16 at 10:56
Nevertheless the approach is interesting. I never combined `groupingBy` with two-arg/three-arg `reduce`. Probably it's useful sometimes... – Tagir Valeev Mar 01 '16 at 10:59
@Tagir Valeev: that’s why I added it despite your answer being better for this particular use case. – Holger Mar 01 '16 at 11:06
@Holger Isn't `Collectors.collectingAndThen(Collectors.maxBy(Comparator.comparingInt(Student::getAge)), Optional::get)` more straight-forward though? (For the second example.) – antak Mar 09 '17 at 07:58
@antak: well, in this case, using `toMap` is more straight-forward, even for the second example. `collectingAndThen(…, Optional::get)` might be cleaner than `Collectors.reducing(null, …)`, but it also implies that the collector has to go through the entire map to apply the finisher function at the end, so I sometimes prefer the downstream collector that produces the intended result in the first place. But as said, using `toMap` is even simpler here. – Holger Mar 09 '17 at 08:29
@Holger oic wrt the extra finisher pass. By `toMap` do you mean using something like `(a, b) -> a.getAge() >= b.getAge() ? a : b` for the merger? Or is there some way to prevent that doubling up of `getAge()`? – antak Mar 09 '17 at 11:12
@antak: you can use `BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge))` as merge function. – Holger Mar 09 '17 at 11:16
@Holger: That's brilliant! Should put that conclusion to your *solution becomes more interesting when you want to get the actual students* remark. Because I landed here after Googling for that use case. – antak Mar 09 '17 at 11:44
@antak: I inserted it. – Holger Mar 09 '17 at 16:13

Alexis C. · Answer 3 · 2016-02-29T22:33:55.030

The second approach calls get() on an Optional; this is usually a bad idea, as you don't know whether the Optional will be empty (use the orElse(), orElseGet() or orElseThrow() methods instead). While you might argue that in this case there will always be a value, since you generate the values from the student list itself, this is something to keep in mind.

Based on that, you might turn variation 2 into:

// Assumes static imports of java.util.stream.Collectors.* and java.util.Comparator.naturalOrder
final Collection<Integer> variation2 =
     students.stream()
             .collect(collectingAndThen(groupingBy(Student::getCity,
                                                   collectingAndThen(
                                                      mapping(Student::getAge, maxBy(naturalOrder())),
                                                      Optional::get)),
                                        Map::values));

However, this really starts to become difficult to read, so I would probably use variation 1:

final List<Integer> variation1 =
        students.stream()
            .collect(groupingBy(Student::getCity,
                                mapping(Student::getAge, maxBy(naturalOrder()))))
            .values()
            .stream()
            .map(Optional::get)
            .collect(toList());
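
The advice at the top of this answer, replacing a bare get() with orElseThrow(), could be sketched as follows. The nested Student class, the sample data, the names MaxAgeOrElseThrow and maxAgePerCity, and the choice of IllegalStateException are assumptions for illustration, not part of the original snippets.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class MaxAgeOrElseThrow {

    // Hypothetical Student with the three properties named in the question.
    static final class Student {
        private final String name;
        private final int age;
        private final String city;

        Student(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    // Made-up sample data.
    static List<Student> sampleStudents() {
        return Arrays.asList(
            new Student("Ann", 22, "Haifa"),
            new Student("Bob", 31, "Haifa"),
            new Student("Cid", 27, "Eilat"),
            new Student("Dee", 19, "Eilat"));
    }

    static Collection<Integer> maxAgePerCity(List<Student> students) {
        // Downstream collector: map each student to its age, take the maximum,
        // and unwrap the Optional with an explicit, descriptive failure.
        Collector<Student, ?, Integer> maxAge = Collectors.collectingAndThen(
            Collectors.mapping(Student::getAge, Collectors.maxBy(Comparator.<Integer>naturalOrder())),
            max -> max.orElseThrow(() -> new IllegalStateException("empty group")));

        return students.stream()
            .collect(Collectors.groupingBy(Student::getCity, maxAge))
            .values();
    }

    public static void main(String[] args) {
        // The order of the values is not guaranteed.
        System.out.println(maxAgePerCity(sampleStudents()));
    }
}
```

Since groupingBy only creates a group when it sees an element, the exception should never fire here; it documents the assumption instead of hiding it behind get().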

This is now my preferred answer, though I'd personally move `Map.values()` outside the `collect` (though I'd keep everything else inside). — Louis Wasserman, Feb 29 '16 at 22:31
Yes, that would improve the readability a bit. For reference, I'll leave it here: `final Collection<Integer> variation2 = students.stream().collect(groupingBy(Student::getCity, collectingAndThen(mapping(Student::getAge, maxBy(naturalOrder())), Optional::get))).values();` — Alexis C., Feb 29 '16 at 22:36

score 0 · Answer 4 · edited Oct 01 '18 at 23:14

Here is my implementation:

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;
    import java.util.stream.Collectors;

    public class MaxByTest {

        static class Student {

            private final int age;
            private final int city;

            public Student(int age, int city) {
                this.age = age;
                this.city = city;
            }

            public int getCity() {
                return city;
            }

            public int getAge() {
                return age;
            }

            @Override
            public String toString() {
                return " City : " + city + " Age : " + age;
            }
        }

        static List<Student> students = Arrays.asList(
            new Student(10, 1),
            new Student(9, 2),
            new Student(8, 1),
            new Student(6, 1),
            new Student(4, 1),
            new Student(8, 2),
            new Student(9, 2),
            new Student(7, 2)
        );

        public static void main(String[] args) {
            // Compare students by age; Integer.compare avoids the overflow risk of subtraction.
            final Comparator<Student> comparator = (p1, p2) -> Integer.compare(p1.getAge(), p2.getAge());
            // Group by city, keep the oldest student of each group, then unwrap the Optionals.
            final List<Student> oldestPerCity =
                students.stream()
                        .collect(Collectors.groupingBy(Student::getCity, Collectors.maxBy(comparator)))
                        .values()
                        .stream()
                        .map(Optional::get)
                        .collect(Collectors.toList());
            System.out.println(oldestPerCity);
        }
    }

score -2 · Answer 5 · answered Feb 26 '18 at 22:18

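        // BeanClass is not shown in this answer; the quoted strings assume its second constructor argument is a String.
        // formatter.parse(..) throws the checked ParseException, which the enclosing method must handle or declare.
        List<BeanClass> list1 = new ArrayList<BeanClass>();
        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        list1.add(new BeanClass(123, "abc", 99.0, formatter.parse("2018-02-01")));
        list1.add(new BeanClass(456, "xyz", 99.0, formatter.parse("2014-01-01")));
        list1.add(new BeanClass(789, "pqr", 95.0, formatter.parse("2014-01-01")));
        list1.add(new BeanClass(1011, "def", 99.0, formatter.parse("2014-01-01")));
        // Group by formatted date and keep the maximum getAge() value per date.
        Map<Object, Optional<Double>> byDate = list1.stream()
            .collect(Collectors.groupingBy(p -> formatter.format(p.getCurrentDate()),
                Collectors.mapping(BeanClass::getAge, Collectors.maxBy(Double::compare))));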

Surabhi

It would be helpful if you provided some explanation in addition to your code to help people to understand what it's doing. – Greg the Incredulous Feb 26 '18 at 23:04
It is always best practice to go through the code example in your answer instead of just pasting the code or putting a link to another resource. – Khushhal Nov 16 '21 at 20:13
