I am trying to implement the equivalent of the SparkR code below in PySpark.
df <- createDataFrame(mtcars)
# Partition by am (transmission) and order by hp (horsepower)
ws <- orderBy(windowPartitionBy("am"), "hp")
# Lag mpg values by 1 row on the partition-and-ordered table
out <- select(df, over(lag(df$mpg), ws), df$mpg, df$hp, df$am)
Does anyone know how to do this on a PySpark DataFrame?