Equivalent of collection.groupBy in scalaz-streams

Asked May 10 '15 at 13:58

I have a folder that contains multiple files with names such as filetype1_ddMMyyyy_hhmm and filetype2_ddMMyyyy_hhmm.

For each day there can be several files with different times, and I need to parse only the one with the latest time. Outside the reactive-stream world, this can be implemented as a groupBy on the date. What is the equivalent in scalaz-stream?

Edmondo

Are the files sorted by date? If not, there is no way of finding the highest hour until the stream halts, so you may as well do this after consuming it. – Pyetras May 10 '15 at 18:43
I am aware that the whole stream has to be traversed, but I need, for each ddMMyyyy, the filename with the highest hhmm. In SQL that would be something like SELECT MAX(dateTime) GROUP BY Date(dateTime). Are you suggesting that I consume the stream and then use the collection API, given that I need to consume the stream anyway? – Edmondo May 10 '15 at 21:59
You cannot emit any partial results while still consuming the stream, so why not? If memory is a concern, you could do a `scan` with a dictionary, where the keys are Date(dateTime) from your example and the values are the entries with the largest hour seen so far. – Pyetras May 10 '15 at 23:01
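
The dictionary idea from the last comment could look roughly like the sketch below. It uses only the Scala standard library, so the accumulation step can be read on its own. The names `Entry`, `parse` and `step` are made up for illustration, and the sketch assumes that every file name has exactly three underscore-separated parts and that hhmm is zero-padded, so that it can be compared as a plain string.

```scala
object LatestPerDay {
  // Hypothetical parsed form of a name such as "filetype1_10052015_1358".
  final case class Entry(name: String, date: String, time: String)

  // Assumption: names look like <type>_<ddMMyyyy>_<hhmm>; anything else is skipped.
  def parse(name: String): Option[Entry] = name.split('_') match {
    case Array(_, date, time) => Some(Entry(name, date, time))
    case _                    => None
  }

  // One accumulation step: keep, per date, the entry with the largest hhmm.
  // Zero-padded "hhmm" strings order correctly under plain string comparison.
  def step(acc: Map[String, Entry], name: String): Map[String, Entry] =
    parse(name) match {
      case Some(e) if acc.get(e.date).forall(_.time < e.time) =>
        acc.updated(e.date, e)
      case _ => acc
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample names, not taken from the question.
    val names = List(
      "filetype1_10052015_0900",
      "filetype1_10052015_1358",
      "filetype1_11052015_0700"
    )
    // foldLeft stands in for the stream here; only the final map is kept.
    val latest = names.foldLeft(Map.empty[String, Entry])(step)
    latest.values.map(_.name).toList.sorted.foreach(println)
  }
}
```

The map holds one entry per date, so memory grows with the number of distinct days rather than with the number of files. With scalaz-stream, the same `step` function should fit the `scan` mentioned in the comment, taking the last emitted map as the result. If files of different types must be kept apart, the key could be a (type, date) pair instead of the date alone.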