1

Input:

item    loc    qty    year    month
A       IND     10     2019      13
A       IND     20     2020      1
A       IND     10     2020      2
A       IND     40     2020      3
A       IND     50     2020      5
A       IND     10     2020      6

OUTPUT:

item   loc  sum(qty)   year   month
A       IND   0      2019    13
A       IND   10     2020    1
A       IND   30     2020    2
A       IND   40     2020    3
A       IND   50     2020    5
A       IND   90     2020    6

description:

how will i get my output is as follows:

if i am calculationg for year 2020 and month 3 then i need to consider the sum(qty) between (month-3) and (month-1) i.e. in this case it will be from year 2019 month 12 to year 2020 and month 2

so for year 2020 and month 3 the ouput will be sum(qty)=10+20+10=40

now for year 2020 and month 6

sum(qty) will be between year 2020 and month -3=3 and year 2020 and month-1=5

so sum(qty)=0(0 for month 4 which is not in the table)+40+50=90

  • Why are you taging mysql????? – Lamanus Aug 21 '20 at 07:37
  • Does this answer your question? [Cumulative sum in Spark](https://stackoverflow.com/questions/47874067/cumulative-sum-in-spark) – Lamanus Aug 21 '20 at 07:37
  • @Lamanus i have edited the qn against the year 2019 i have month 13 and not 12..... youe solution will perfectly fine for other months but not for year 2020 and month 1 can you plz check? – Mradul Gupta Aug 21 '20 at 09:00

1 Answers1

1

Try this.

df.createOrReplaceTempView("test")

spark.sql("""
    SELECT 
        item,
        loc,
        COALESCE(
            SUM(qty) OVER (
                PARTITION BY item 
                ORDER BY (year - 2000) * 13 + month
                RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING
            ), 0) as sum_qty,
        year,
        month
    FROM 
        test
""").show

+----+---+-------+----+-----+
|item|loc|sum_qty|year|month|
+----+---+-------+----+-----+
|   A|IND|      0|2019|   13|
|   A|IND|     10|2020|    1|
|   A|IND|     30|2020|    2|
|   A|IND|     40|2020|    3|
|   A|IND|     50|2020|    5|
|   A|IND|     90|2020|    6|
+----+---+-------+----+-----+
Lamanus
  • 12,898
  • 4
  • 21
  • 47
  • against year 2019 i have month 13.its causing an issue and throughing 0 sum_qty in month 1 and year 2020 also – Mradul Gupta Aug 21 '20 at 08:58
  • Then change 12 to 13 everywhere. – Lamanus Aug 21 '20 at 09:03
  • That is in data itself the issue is (year - 2000) * 12 + month for year 2019 and month 13 (2019-2000)*12+13=241 also year 2020 and month 1 (2020-2000)*12+1=241 so both will give me same results....that is wrong – Mradul Gupta Aug 21 '20 at 09:07
  • yeap, change it to 13 also. – Lamanus Aug 21 '20 at 09:08
  • 1
    I never expect there was month 13, so you should create your own months that should be continuous int type. Then this method will work well. – Lamanus Aug 21 '20 at 09:50