I would like to generate the dataframe below.
Here, I am calculating "adstock" from the column "col_lag" and an engagement factor of 0.9, as below:
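from pyspark.sql import Window
from pyspark.sql.functions import col, lag, lit

# window: one partition per id, ordered by date
windowSpec = Window.partitionBy("id").orderBy("dt")
# create the column if it does not exist
if "adstock" not in df.columns:
    df = df.withColumn("adstock", lit(0))
# adstock = col_lag + 0.9 * the previous row's adstock
df = df.withColumn("adstock", col("col_lag") + lit(0.9) * lag("adstock", 1).over(windowSpec))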
When I run the PySpark code above, it does not generate values after the first two or three rows and instead gives something like the below:
I have around 125,000 ids and weekly data from 2020-01-24 to the current week. I tried various approaches, such as rowsBetween(Window.unboundedPreceding, 1)
or creating another column, but have not been successful.
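To be explicit about the result I am after, here is a minimal plain-Python sketch of the recurrence for a single id, assuming the rows are already sorted by "dt" and that adstock starts at 0 before the first week. The function name `adstock` and the sample values are made up for illustration only; this is not Spark code and would not scale to my data as written.

```python
def adstock(values, decay=0.9):
    """Reference recurrence: adstock[t] = values[t] + decay * adstock[t-1].

    Assumes adstock is 0 before the first row (hypothetical starting value,
    matching the lit(0) placeholder in my Spark code).
    """
    result = []
    previous = 0.0
    for value in values:
        # each row uses the *newly computed* adstock of the previous row
        previous = value + decay * previous
        result.append(previous)
    return result


# made-up col_lag values for one id, ordered by dt
weekly_col_lag = [10.0, 0.0, 5.0]
print(adstock(weekly_col_lag))
```

In other words, every row depends on the previous row's freshly computed adstock, so the value for week t is effectively the sum of each earlier col_lag multiplied by 0.9 raised to the number of weeks elapsed.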
I would appreciate any suggestions in this regard.