
Do Java (9+) streams support a HAVING clause similar to SQL? Use case: grouping, then dropping all groups with a certain count. Is it possible to write the following SQL clause as a Java stream?

GROUP BY id
HAVING COUNT(*) > 5

The closest I could come up with was:

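import java.util.Map;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toMap;

input.stream()
        .collect(groupingBy(x -> x.id()))                 // first pass: GROUP BY id
        .entrySet()
        .stream()
        .filter(entry -> entry.getValue().size() > 5)     // HAVING COUNT(*) > 5
        .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));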

but extracting the entrySet of the grouped result and collecting a second time feels strange. In particular, the terminal collect call basically maps a map to itself.

I see that there are collectingAndThen and filtering collectors, but I don't know if they would solve my problem (or rather how to apply them correctly).

Is there a better (more idiomatic) version of the above, or am I stuck with collecting to an intermediate map, filtering that and then collecting to the final map?

knittl

3 Answers


The operation has to be performed after the grouping in general, as you need to fully collect a group before you can determine whether it fulfills the criteria.
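To see why the filtering collector mentioned in the question cannot express this on its own, here is a minimal sketch (Java 9+). It uses made-up sample words grouped by length instead of `id`. `Collectors.filtering` tests each element before it reaches the downstream collector, so its predicate never sees a complete group, and a group whose elements are all rejected still appears as an empty list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FilteringDemo {
    // Collectors.filtering applies its predicate to each element *before* it
    // reaches the downstream collector; it cannot see a group's final size,
    // so it cannot express "HAVING COUNT(*) > n".
    static Map<Integer, List<String>> groupWithElementFilter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length,
                        TreeMap::new,                                  // sorted keys for readable output
                        Collectors.filtering(w -> w.startsWith("a"),   // per-element predicate
                                Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = List.of("ant", "bee", "cat", "kiwi", "apple", "berry");
        // The length-4 group lost its only element but is still present, as an empty list.
        System.out.println(groupWithElementFilter(words)); // prints {3=[ant], 4=[], 5=[apple]}
    }
}
```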

Instead of collecting a map into another, similar map, you can use removeIf to remove non-matching groups from the result map and inject this finishing operation into the collector:

Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(collectingAndThen(groupingBy(x -> x.id(), HashMap::new, toList()),
            m -> {
                m.values().removeIf(l -> l.size() <= 5);
                return m;
            }));

Since the groupingBy(Function) collector makes no guarantees regarding the mutability of the created map, we need to specify a supplier for a mutable map, which requires us to be explicit about the downstream collector, as there is no overloaded groupingBy for specifying only function and map supplier.
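Here is a self-contained version of the snippet above, offered as a sketch. It uses hypothetical sample strings grouped by `String::length` in place of `x -> x.id()`, and a `minCount` parameter standing in for the literal 5.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

public class RemoveIfDemo {
    // GROUP BY length HAVING COUNT(*) > minCount, in a single collect call.
    static Map<Integer, List<String>> groupsLargerThan(List<String> words, int minCount) {
        return words.stream()
                .collect(collectingAndThen(
                        // HashMap::new guarantees a mutable map; this overload
                        // also forces us to name the downstream toList()
                        groupingBy(String::length, HashMap::new, toList()),
                        m -> {
                            // drop every group that fails the HAVING condition
                            m.values().removeIf(l -> l.size() <= minCount);
                            return m;
                        }));
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c", "dd", "ee", "fff");
        System.out.println(groupsLargerThan(words, 1)); // prints {1=[a, b, c], 2=[dd, ee]}
    }
}
```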

If this is a recurring task, we can create a custom collector that simplifies the calling code:

public static <T,K,V> Collector<T,?,Map<K,V>> having(
                      Collector<T,?,? extends Map<K,V>> c, BiPredicate<K,V> p) {
    return collectingAndThen(c, in -> {
        Map<K,V> m = in;
        if(!(m instanceof HashMap)) m = new HashMap<>(m);
        m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
        return m;
    });
}

For higher flexibility, this collector accepts any map-producing collector. Since that does not guarantee a particular map type, the collector enforces a mutable map afterwards by copying the result with the HashMap copy constructor whenever it is not already a HashMap. In practice, the copy won't happen, as the default is to use a HashMap. It also works without copying when the caller explicitly requests a LinkedHashMap to maintain the order, since LinkedHashMap is a subclass of HashMap. We could even support more cases by changing the line to

if(!(m instanceof HashMap || m instanceof TreeMap
  || m instanceof EnumMap || m instanceof ConcurrentMap)) {
    m = new HashMap<>(m);
}

Unfortunately, there is no standard way to determine whether a map is mutable.

The custom collector can now be used nicely as

Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(having(groupingBy(x -> x.id()), (key,list) -> list.size() > 5));
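The following runnable sketch exercises the `having` collector (copied from above) in the two cases discussed earlier. The first is a `LinkedHashMap` supplier, which passes the `instanceof HashMap` check and keeps encounter order. The second is a collector that returns a read-only map via `Collections.unmodifiableMap`, which forces the copy. The sample words and method names are illustrative only, and the predicate parameters are typed explicitly to make the inferred types obvious.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Collector;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

public class HavingDemo {
    // The collector from the answer: run the map-producing collector c, copy the
    // result into a HashMap if it is not one already, then drop entries failing p.
    public static <T, K, V> Collector<T, ?, Map<K, V>> having(
            Collector<T, ?, ? extends Map<K, V>> c, BiPredicate<K, V> p) {
        return collectingAndThen(c, in -> {
            Map<K, V> m = in;
            if (!(m instanceof HashMap)) m = new HashMap<>(m);
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }

    // LinkedHashMap extends HashMap: no copy is made, so encounter order survives.
    static Map<Integer, List<String>> orderedLargeGroups(List<String> words) {
        return words.stream().collect(having(
                groupingBy(String::length, LinkedHashMap::new, toList()),
                (Integer len, List<String> group) -> group.size() > 1));
    }

    // Hypothetical upstream collector that hands back a read-only map:
    // having() sees it is not a HashMap and filters a mutable copy instead.
    static Map<Integer, List<String>> largeGroupsFromReadOnly(List<String> words) {
        Collector<String, ?, Map<Integer, List<String>>> readOnlyGrouping =
                collectingAndThen(groupingBy(String::length), Collections::unmodifiableMap);
        return words.stream().collect(having(readOnlyGrouping,
                (Integer len, List<String> group) -> group.size() > 1));
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "fig", "banana", "plum", "fog", "kiwi");
        System.out.println(orderedLargeGroups(words));      // {4=[pear, plum, kiwi], 3=[fig, fog]}
        System.out.println(largeGroupsFromReadOnly(words)); // {3=[fig, fog], 4=[pear, plum, kiwi]}
    }
}
```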
Holger (edited by Naman)

The only way I am aware of is to use Collectors.collectingAndThen with the same implementation inside the finisher function:

Map<Integer, List<Item>> a = input.stream().collect(Collectors.collectingAndThen(
        Collectors.groupingBy(Item::id),
        map -> map.entrySet().stream()
                             .filter(e -> e.getValue().size() > 5)
                             .collect(Collectors.toMap(Entry::getKey, Entry::getValue))));
Nikolas Charalambidis
  • Yeah, streams don't always look nice :( – Nikolas Charalambidis Apr 23 '20 at 21:41
  • Probably the closest to what I want, but unfortunately it still has to extract the `entrySet` and create a stream out of that. Maybe this can be hidden behind a custom Collector implementation. Then calling `.collect(collectingAndThen(groupingBy(Item::id), having(set -> set.size() > 5)))` doesn't look so bad anymore. The `having` Collector doesn't have to use a stream; it could simply run a loop and call `.remove` on the map or `.removeIf` on the map's entrySet (not sure if this follows the expectation of a collector, or if a new collection has to be returned). – knittl Apr 24 '20 at 05:14

If you want more readable code, you could also use Guava's Maps.filterValues function as an alternative to re-streaming the map.

It allows transforming maps and sometimes offers shorter and more readable syntax than Java streams.

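import com.google.common.collect.Maps;
Map<Integer, List<Item>> unfiltered = input.stream().collect(Collectors.groupingBy(Item::id));
// filterValues returns a live filtered view of `unfiltered`, not a copy
return Maps.filterValues(unfiltered, value -> value.size() > 5);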
CodeScale (edited by Nikolas Charalambidis)
  • Without Guava, one can also use `removeIf` over the `entrySet` of the grouped map, but that doesn't quite solve it the way OP is asking for. – Naman Apr 24 '20 at 02:26
  • @Naman why not? `collectingAndThen(groupingBy(x -> x.id(), HashMap::new, toList()), m -> { m.values().removeIf(l -> l.size() <= 5); return m; })` seems to be what the OP is asking for. – Holger Apr 24 '20 at 08:03
  • @Holger There was already an answer [mentioning a similar approach](https://stackoverflow.com/a/61397009/1746118); I was instead trying to make the point that you don't need an additional library for such a trivial task. Other than that, you have once again anticipated my thoughts, so perhaps you can answer: "*Why not `entrySet.removeIf`, why `values().removeIf`?*" – Naman Apr 24 '20 at 08:56
  • @Naman that answer still uses another stream to create a new hash map, rather than `removeIf`. The choice between `keySet()`, `values()`, and `entrySet()` should always come down to the same consideration: “what do I actually need?” When you only need the value in the predicate, use `values()` and eliminate the need to call `Map.Entry.getValue()` for each element. – Holger Apr 24 '20 at 09:13
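To illustrate the point about picking the right view, here is a minimal sketch on a plain `HashMap` of groups. It uses `values()` when the predicate only needs the group, `entrySet()` when it needs key and value together, and `keySet()` when it needs only the key. The odd-key condition is a hypothetical example of a key-based criterion, not something from the question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveIfViews {
    // Predicate needs only the group: values() avoids calling Map.Entry.getValue().
    static void dropSmallGroups(Map<Integer, List<String>> groups, int minCount) {
        groups.values().removeIf(list -> list.size() <= minCount);
    }

    // Predicate needs key and value: entrySet().
    // The odd-key rule is a hypothetical extra criterion for illustration.
    static void dropSmallOrOddKeyedGroups(Map<Integer, List<String>> groups, int minCount) {
        groups.entrySet().removeIf(e -> e.getKey() % 2 != 0 || e.getValue().size() <= minCount);
    }

    // Predicate needs only the key: keySet().
    static void dropOddKeys(Map<Integer, List<String>> groups) {
        groups.keySet().removeIf(k -> k % 2 != 0);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> groups = new HashMap<>();
        groups.put(1, List.of("a", "b"));
        groups.put(2, List.of("cc", "dd"));
        groups.put(3, List.of("eee"));
        dropSmallGroups(groups, 1);          // removing through a view also removes the map entry
        System.out.println(groups);          // prints {1=[a, b], 2=[cc, dd]}
    }
}
```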