
I've got a java.util.stream.Stream containing key value pairs like:

<1,3> <1,5> <3,1> <4,2> <4,7> <4,8>

Now I would like to merge all entries, which have got the same key:

<1,[3,5]>  <3,[1]> <4,[2,7,8]>

The data is already sorted, so only consecutive datasets have to be merged.

Now I'm searching for a way to transform the content of the stream as shown above, without loading all datasets into memory.

I'd prefer to get a java.util.stream.Stream as result with a different object type containing a list of values instead of a single value.

My only approach so far is a custom iterator that performs the merge, but it seems pretty ugly to convert the stream to an iterator and back to a stream.

What is the best approach for it?

Spille
    You've already found what's probably the best available option. Stream isn't really meant for the sort of operations you want. – Louis Wasserman Jun 05 '17 at 23:49
  • I would think a `.groupBy()` operation might work depending upon exactly what is in the stream. @LouisWasserman might have understood your requirements better, however. – KevinO Jun 05 '17 at 23:51
  • @KevinO, the OP specifically said that they want a `Stream` as a result, and want to avoid loading the data into memory. `groupingBy` won't let you do either. – Louis Wasserman Jun 05 '17 at 23:51
  • @LouisWasserman, which is why I always defer to the master. – KevinO Jun 05 '17 at 23:52
  • What does *"without loading all datasets into memory"* mean? What is your input? Reading a file? Querying a database? Streaming over the web? Anyway, you write your own [`Spliterator`](https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html) that retrieves the input, collects the values for the next key, then supplies that on the next `tryAdvance()` call. However, note that a `Spliterator` is even more ugly than an `Iterator`, so if ugliness is your litmus test, you'll just stick to the `Iterator`. – Andreas Jun 06 '17 at 00:05
  • I'm getting the data from a database. I don't want them all in memory at the same time, because of the amount. – Spille Jun 06 '17 at 00:36
  • Maybe a custom collector, backed by a `MultiMap` / `Map<K, List<V>>`? – Adrian Shum Jun 06 '17 at 02:06
  • Hmm, you are getting data from database, want to group by but at the same time don't want to load everything into memory? Seems like you need to wrap Java streams around your result set. Or maybe you need to do group by right in database? – tsolakp Jun 06 '17 at 03:25
  • Maybe this post showing how to wrap result set with stream can help you: https://stackoverflow.com/questions/32209248/java-util-stream-with-resultset – tsolakp Jun 06 '17 at 03:31
  • @tsolakp If my understanding is correct: the OP already has a way to turn his DB result set into a stream. He just doesn't want the whole result set loaded into memory to get the result (a naive use of `stream.groupBy()` will give exactly that kind of nasty effect). – Adrian Shum Jun 06 '17 at 04:10
  • @Andreas: a `Spliterator` is *not* uglier than an `Iterator`. It’s simpler. – Holger Jun 06 '17 at 11:15
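The `Spliterator` approach suggested in the comments can be sketched in plain `java.util`, with no extra library. This is a minimal illustration, not code from the thread: the class name `MergingSpliterator` and the use of `int[]` pairs and `Map.Entry` as the output type are assumptions standing in for whatever dataset type the OP actually has. It holds at most one group (plus one look-ahead pair) in memory at a time:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Lazily merges consecutive {key, value} pairs that share a key (illustrative sketch). */
class MergingSpliterator extends Spliterators.AbstractSpliterator<Map.Entry<Integer, List<Integer>>> {
    private final Iterator<int[]> source; // each element is a {key, value} pair
    private int[] pending;                // first pair of the next group, read ahead

    MergingSpliterator(Iterator<int[]> source) {
        super(Long.MAX_VALUE, ORDERED);   // unknown size, ordered
        this.source = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Map.Entry<Integer, List<Integer>>> action) {
        if (pending == null) {
            if (!source.hasNext()) return false;
            pending = source.next();
        }
        int key = pending[0];
        List<Integer> values = new ArrayList<>();
        values.add(pending[1]);
        pending = null;
        // consume consecutive pairs with the same key; stash the first mismatch
        while (source.hasNext()) {
            int[] next = source.next();
            if (next[0] != key) { pending = next; break; }
            values.add(next[1]);
        }
        action.accept(new AbstractMap.SimpleEntry<>(key, values));
        return true;
    }
}

public class Merge {
    public static void main(String[] args) {
        int[][] datasets = { { 1, 3 }, { 1, 5 }, { 3, 1 }, { 4, 2 }, { 4, 7 }, { 4, 8 } };
        Stream<Map.Entry<Integer, List<Integer>>> merged = StreamSupport.stream(
                new MergingSpliterator(Stream.of(datasets).iterator()), false);
        merged.forEach(System.out::println);
        // prints:
        // 1=[3, 5]
        // 3=[1]
        // 4=[2, 7, 8]
    }
}
```

Wrapping the spliterator with `StreamSupport.stream(..., false)` yields an ordinary lazy `Stream`, so no conversion back from an `Iterator` is needed.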

1 Answer


Here is a solution using StreamEx:

int[][] datasets = { { 1, 3 }, { 1, 5 }, { 3, 1 }, { 4, 2 }, { 4, 7 }, { 4, 8 } };

// requires one.util.streamex.StreamEx and
// import static java.util.stream.Collectors.*;
StreamEx.of(datasets)
        .collapse((a, b) -> a[0] == b[0], groupingBy(a -> a[0], mapping(a -> a[1], toList())))
        .forEach(System.out::println);

You can replace `int[]` with your own dataset type. Adding a `peek` lets you verify that the evaluation is lazy:

StreamEx.of(datasets)
        .peek(System.out::println)
        .collapse((a, b) -> a[0] == b[0], groupingBy(a -> a[0], mapping(a -> a[1], toList())))
        .limit(1)
        .forEach(System.out::println);
user_3380739