I have a data set like this:
name time val
---- ----- ---
fred 04:00 111
greg 03:00 123
fred 01:00 411
fred 05:00 921
fred 11:00 157
greg 12:00 333
I also have a folder of CSV files, one for each unique name in the data set:
fred.csv
greg.csv
The contents of fred.csv, for example, look like this:
00:00 222
10:00 133
My goal is to efficiently merge the data set into the CSVs in sorted time order, so that fred.csv, for example, ends up like this:
00:00 222
01:00 411
04:00 111
05:00 921
10:00 133
11:00 157
In reality, there are thousands of unique names, not just two. I use the union and sort functions to add rows in order, but I have not managed to get the rows into their proper CSV files using partitionBy, foreach, or coalesce.
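
To make the intended result concrete, here is a minimal single-machine sketch in plain Python rather than Spark. It only illustrates the per-name merge described above and is not a distributed solution. The function name merge_into_csvs and the folder argument are hypothetical, and it assumes space-separated files with zero-padded HH:MM times, as in the samples.

```python
import csv
import os
from collections import defaultdict


def merge_into_csvs(rows, folder):
    """Merge (name, time, val) rows into <folder>/<name>.csv, sorted by time.

    Hypothetical helper: assumes space-separated files and zero-padded
    HH:MM time strings, which sort correctly as plain text.
    """
    # Group the new rows by name so each file is read and written once.
    by_name = defaultdict(list)
    for name, time, val in rows:
        by_name[name].append((time, val))

    for name, new_rows in by_name.items():
        path = os.path.join(folder, f"{name}.csv")

        # Load whatever the file already holds (it may not exist yet).
        existing = []
        if os.path.exists(path):
            with open(path, newline="") as f:
                existing = [tuple(r) for r in csv.reader(f, delimiter=" ") if r]

        # Combine old and new rows, ordered by the time column only.
        merged = sorted(existing + new_rows, key=lambda r: r[0])

        # Rewrite the whole file in sorted order.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, delimiter=" ", lineterminator="\n")
            writer.writerows(merged)
```

This rewrites each affected file in full, which is fine for small files but is presumably the part that needs a Spark equivalent when there are thousands of names.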