To aggregate a time series (for example, into 10-minute buckets), I used groupBy and window as shown:
import org.apache.spark.sql.functions.window

val df2 = df
  .groupBy(window($"timestamp", "10 minutes")) // 10-minute tumbling windows
  .avg("field")
The output of df2.show() looks like this:
+-------------------------------------------+----------+
| window|avg(field)|
+-------------------------------------------+----------+
| [2018-06-10 03:30:00, 2018-06-10 03:40:00]|22 |
| [2018-06-10 03:30:00, 2018-06-10 03:40:00]|42 |
| [2018-06-10 03:30:00, 2018-06-10 03:40:00]|60 |
+-------------------------------------------+----------+
This is its schema:
root
 |-- window: struct (nullable = true)
 |    |-- start: timestamp (nullable = true)
 |    |-- end: timestamp (nullable = true)
 |-- avg(field): int (nullable = true)
I wanted to save it as CSV, but the write fails with this error:
CSV data source does not support struct<start:timestamp,end:timestamp>
How can I flatten the window column? Or is there a better way to aggregate time series like this?
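For context, here is a minimal sketch of what I mean by flattening: selecting the nested start and end fields of the window struct as plain top-level columns, so that the CSV writer only sees timestamp and numeric types. The sample rows, the column names window_start, window_end and avg_field, the helper name flattenedAverages, and the output path /tmp/windowed_avg are all made up for illustration. I am not sure this is the recommended approach.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, window}

object FlattenWindow {

  // Groups rows into 10-minute tumbling windows, averages "field",
  // and replaces the struct column with two plain timestamp columns.
  // Assumes the input has a timestamp column "timestamp" and a numeric column "field".
  def flattenedAverages(df: DataFrame): DataFrame =
    df.groupBy(window(col("timestamp"), "10 minutes"))
      .agg(avg("field").as("avg_field"))
      .select(
        col("window.start").as("window_start"), // nested struct field -> top-level column
        col("window.end").as("window_end"),
        col("avg_field")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("flatten-window")
      .master("local[*]") // local mode, only for this sketch
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real df
    val df = Seq(
      (Timestamp.valueOf("2018-06-10 03:31:00"), 20),
      (Timestamp.valueOf("2018-06-10 03:35:00"), 24),
      (Timestamp.valueOf("2018-06-10 03:42:00"), 42)
    ).toDF("timestamp", "field")

    val flat = flattenedAverages(df)
    flat.printSchema() // no struct column left in the schema

    // Hypothetical output path
    flat.write
      .mode("overwrite")
      .option("header", "true")
      .csv("/tmp/windowed_avg")

    spark.stop()
  }
}
```

Renaming the aggregate with as("avg_field") is optional, but it avoids a CSV header containing parentheses such as avg(field).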
Thank you very much