I have a dataframe like this:
| id | prodId | date       | value |
|----|--------|------------|-------|
| 1  | a      | 2015-01-01 | 100   |
| 2  | a      | 2015-01-02 | 150   |
| 3  | a      | 2015-01-03 | 120   |
| 4  | b      | 2015-01-01 | 100   |
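For reproducibility, something like this builds the sample data (a minimal sketch assuming a Spark 2.x `SparkSession`; dates kept as plain strings for brevity):

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup: a local Spark 2.x session
val spark = SparkSession.builder()
  .appName("conditional-sums")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data matching the table above
val df = Seq(
  (1, "a", "2015-01-01", 100),
  (2, "a", "2015-01-02", 150),
  (3, "a", "2015-01-03", 120),
  (4, "b", "2015-01-01", 100)
).toDF("id", "prodId", "date", "value")
```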
I would like to group by prodId and aggregate value, summing it over ranges of dates. In other words, I need to build a table with the following columns:
- prodId
- val_1: sum of value where date is between date1 and date2
- val_2: sum of value where date is between date2 and date3
- val_3: same as before, etc.
| prodId | val_1 (01-01 to 01-02) | val_2 (01-03 to 01-04) |
|--------|------------------------|------------------------|
| a      | 250                    | 120                    |
| b      | 100                    | 0                      |
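The closest I have come up with is spelling out one conditional sum per range by hand, roughly like the sketch below (the range boundaries are hard-coded from the example, and the string comparison only works because the dates are in ISO format), but this feels clumsy once the number of ranges grows:

```scala
import org.apache.spark.sql.functions.{col, lit, sum, when}

// One hand-written conditional sum per date range.
// otherwise(0) turns "no matching rows" into 0 instead of null (see prodId b).
val result = df.groupBy("prodId").agg(
  sum(when(col("date").between("2015-01-01", "2015-01-02"), col("value"))
        .otherwise(lit(0))).alias("val_1"),
  sum(when(col("date").between("2015-01-03", "2015-01-04"), col("value"))
        .otherwise(lit(0))).alias("val_2")
)
result.show()
```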
Is there any predefined aggregate function in Spark that allows doing conditional sums like this? Or would you recommend developing an aggregation UDF (if so, any suggestions)? Thanks a lot!