Stream doesn't preserve the order after grouping

Question

I have a List name availableSeats I am sorting and grouping by the blockIndex property like below:

availableSeats.stream()
                .sorted(Comparator.comparing(SeatedTicketAssignment::getBlockIndex))
                .collect(Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex))
                .forEach((block, blockAssignments) -> {
                     //Rest of the code
                }

The problem is that the result of grouping by is not sorted by the blockIndex.

Rogue · Accepted Answer · 2017-09-05T22:44:55.007

43

Keep in mind, Collectors#groupingBy(Function) will return a HashMap, which does not guarantee order. If you want the ordering to be in the order that the aggregate identity (your i % 2 == 0 result) shows up, then you can use LinkedHashMap:

.collect(Collectors.groupingBy(i -> i % 2 == 0, LinkedHashMap::new, Collectors.toList()))

Would return a LinkedHashMap<Boolean, List<SeatedTicketAssignment>> (since your collector is grouped by a boolean). Additionally, since the list utilized by the collector is an ArrayList, it should preserve the iteration order of the stream relative to the list.

edited Sep 05 '17 at 22:44

answered Aug 08 '17 at 01:00

Rogue

11,105
5
45
71

1

If the key type is `Boolean`, you can use `partitioningBy` and get a map that has an intrinsic order, however, where in the question does that key function appear? Since the order is that of the previous sorting (by the same value) step, removing the sort operation and collecting into a `TreeMap` would be much simpler… – Holger Aug 08 '17 at 11:40
4

Thaks for saving my time. – Sviatlana Jan 17 '18 at 15:11
@Holger, the list created by the third argument of `groupingBy` is supposed to preserve the entered order of the initial list? – Cristiano Aug 17 '20 at 23:31
@Cristiano It returns an `ArrayList`, so relative to whatever order the stream was in before for a sequential stream. – Rogue Aug 18 '20 at 00:48
1

@Cristiano the `toList()` collector preserves the order. But the final result is a `Map` containing up to two groups. When grouping into a `LinkedHashMap`, the map’s order depends on the first element. When grouping into a `TreeMap`, it will always be `false, true`. When using `partitioningBy(i -> i % 2 == 0)`, the order also will always be `false, true`, but this has not been documented. Further, when using `partitioningBy(i -> i % 2 == 0)` both keys will always exist even when not appearing in the stream (then, with an empty list). This missing information has been added to the doc in Java 9 – Holger Aug 18 '20 at 07:47

SergGr · Answer 2 · 2017-08-08T01:07:54.063

3

Unfortunately Stream API implementation is not aware of the fact, that the stream you pass is already sorted by what you need so "grouping" is actually trivial. Thus it uses the default way which is essentially similar to this SO answer i.e. create a Map for groups and fill it with elements of the stream. And by default the Map implementation that is used is the HashMap (see code here) which is good for performance reasons but bad for your goal because HashMap doesn't preserve the order of the keys than first sorting.

It might seems a bit unlucky that Group By is implemented in Stream API only as a "collector" so you can't first group and then sort in a one-liner. But this seems intentional: there is no way to implement Group By without fully materializing result so it can't be lazy and thus has to be a collector. @Rogue provided a nice trick with LinkedHashMap but to me it is to bound to implementation details. Still I would write a few more lines of code and first group and then sort the entries of the list (i.e. actually grouped HashMap) by key. Most probably it would be faster.

edited Aug 08 '17 at 01:07

answered Aug 08 '17 at 01:02

SergGr

23,570
2
30
51

2

Why sorting afterwards, when you can sort while grouping? `Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex, TreeMap::new, Collectors.toList())`, if that’s not a one-liner, I don’t know… – Holger Aug 08 '17 at 11:44
@Holger, it really depends on the data. The issue is that `TreeMap` as any other sorting has `O(N*log(N))` complexity against `O(N)` for `HashMap`. If groupping just reduce number of elements just 2 of 3 time `TreeMap` would probably win, but if we group in something like `sqrt(N)` buckets, first grouping and then sorting will be faster. Another point is that IMHO this as `Rogue` answer relies on implementation details too much which is bad (Example: would you guarantee, that the order will always be preserved if later someone change the code to parallel streams?). Thus I prefer explicit code. – SergGr Aug 08 '17 at 14:02
3

Rogue’s answer does not rely on implementation details. Preserving the encounter order is guaranteed, regardless of whether you use a parallel stream or not. But his answer, using `LinkedHashMap`, still relies on a preceding sort operation, which not only has an `O(n log n)` worst case, but also needs a temporary `O(n)` storage operation before the collection. Plus the actual `collect` operation. So collecting directly into a `TreeMap` without a preceding sort operation will likely be faster. – Holger Aug 08 '17 at 14:28
1

But you’re right, when the number of groups is small, sorting after grouping into an ordinary `HashMap` may be even faster. So how about `Collectors.collectingAndThen(Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex), TreeMap::new)`? Is that an appropriate one-liner? – Holger Aug 08 '17 at 14:30
@Holger, What I meant as a criticism for your `TreeMap` suggestion is that if someone wanted to really use parallel streams, he would need to use some version `groupingByConcurrent` which requires some implementation of `ConcurrentMap` and I'm not aware of any implementation that is both concurrent and "internally sorted" at the same time and I doubt one can create a performant version of such as ordering and concurrent seems to be inherently contradictory properties. – SergGr Aug 08 '17 at 16:01
As for @Rogue answer, I'd be interested in any place that claims that a `Collector` returned by `groupBy` when run over a any stream (and parallel stream in particular) is guranteed to preserve the order of the original stream (as the insertion order into the resulting Map) which is the property that the answser is based on. AFAIU, default implementation first create an independent Map for each thread and then merges thread-Maps into a result Map using bulk merge operations so I expect the order to be not preserved under many parallel stream partitioning strategies. – SergGr Aug 08 '17 at 16:02
1

you seem to have a wrong understanding of the `collect` operation. All Collectors are thread safe, hence, you don’t need to use a concurrent Collector. Further, all Collectors are guaranteed to be used in an order preserving manner, unless they explicitly allow an *unordered* operation by reporting an [“unordered”](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.Characteristics.html#UNORDERED) characteristic. All *unordered* Collectors are marked as such in [the documentation](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html)… – Holger Aug 08 '17 at 16:06
@Holger, Yes, I don't need to use conccurrent collector to get a correct result, but I need to use them to get the performance which is the main reason to use parallel streams in the first place. As for the fact that `groupingBy` collector is indeed guaranteed to be ordered - this is a thing I was not aware of so thank you. I'll need to dig into the implementation to find out how they guarantee that. – SergGr Aug 08 '17 at 16:12
4

As a fun fact, even if you don’t need a concurrent map here, if you are looking for a concurrent map that is intrinsically sorted, [`ConcurrentSkipListMap`](https://docs.oracle.com/javase/8/docs/api/?java/util/concurrent/ConcurrentSkipListMap.html) is there for you since Java 6… – Holger Aug 08 '17 at 16:12
@Holger, sorted `ConcurrentSkipListMap` is one more thing that I was not aware of, so thank you second time. But still `groupingByConcurrent` accortding to the doc is an unordered collector so it seems to not guarantee insertion order even for `ConcurrentSkipListMap`. – SergGr Aug 08 '17 at 16:17
4

`groupingByConcurrent` is an unordered collector, so it does not maintain the stream’s original encounter order, if there ever was one, but if you use `ConcurrentSkipListMap` as target, it will sort the inserted elements anyway and that sorted order is what you are interested in. – Holger May 28 '18 at 07:08

score 2 · Answer 3 · answered Aug 08 '17 at 01:18

2

Since the groupingBy collector does not require sorted input, you can sort the groups after collection. This will be faster than sorting items first, anyway, assuming that there are fewer groups than items:

availableSeats.stream()
        .collect(Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex))
        .entrySet().stream()
        .sorted(Comparator.comparing(Map.Entry::getKey))
        .forEach(mapEntry -> {
             //Rest of the code
        }

answered Aug 08 '17 at 01:18

Matt Timmermans

53,709
3
46
87

1

You can also use [`Map.Entry.comparingByKey()`](https://docs.oracle.com/javase/8/docs/api/java/util/Map.Entry.html#comparingByKey--)… – Holger Aug 08 '17 at 14:33

Stream doesn't preserve the order after grouping

3 Answers3

Related