I'd like to run a window function in Spark SQL. I have Zeppelin sitting on top of a Spark cluster with Hadoop.
I'd like to add a row number column to a table, with the numbering restarting for each combination of two ID columns.
This is my data:
food    aisle  item  date_added
pear    1      1234  '2020-01-01 10:12'
banana  2      1233  '2020-01-02 10:12'
banana  2      1211  '2020-01-03 10:12'
banana  2      1412  '2020-01-04 10:12'
apple   1      1452  '2020-01-05 10:12'
apple   1      1334  '2020-01-06 10:12'
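In case it helps to reproduce this: my table was created elsewhere, but I believe a temp view with the same data can be built directly in a Zeppelin paragraph with inline VALUES, roughly like this (the timestamp type for date_added is my assumption):
%sql
-- sketch of an equivalent setup for the fruits table used below
create or replace temporary view fruits as
select * from values
  ('pear',   1, 1234, timestamp '2020-01-01 10:12'),
  ('banana', 2, 1233, timestamp '2020-01-02 10:12'),
  ('banana', 2, 1211, timestamp '2020-01-03 10:12'),
  ('banana', 2, 1412, timestamp '2020-01-04 10:12'),
  ('apple',  1, 1452, timestamp '2020-01-05 10:12'),
  ('apple',  1, 1334, timestamp '2020-01-06 10:12')
as t(food, aisle, item, date_added)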
I'd like to turn the data into this, where rn restarts for each (food, aisle) combination and the newest date_added in each group gets rn = 1:
food    aisle  item  date_added          rn
pear    1      1234  '2020-01-01 10:12'  1
banana  2      1233  '2020-01-02 10:12'  3
banana  2      1211  '2020-01-03 10:12'  2
banana  2      1412  '2020-01-04 10:12'  1
apple   1      1452  '2020-01-05 10:12'  2
apple   1      1334  '2020-01-06 10:12'  1
This is my query:
%sql
select
food,
aisle,
item,
row_number() over (order by date_added desc
partition by food, aisle
rows between unbounded preceding and current row) as rn
from fruits
This is the error I get:
mismatched input 'partition' expecting {')', ',', 'RANGE', 'ROWS'}(line 5, pos 28)
How do I solve this error with Spark SQL?
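From the standard OVER() clause order I suspect partition by needs to come before order by, roughly like the sketch below, but I'd like to confirm the correct Spark SQL form, and whether the frame clause is needed at all with row_number():
%sql
-- sketch: partition by moved before order by; I believe the frame
-- clause can be dropped because row_number() always uses
-- rows between unbounded preceding and current row
select
  food,
  aisle,
  item,
  date_added,
  row_number() over (
    partition by food, aisle
    order by date_added desc
  ) as rn
from fruits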