
Is there an equivalent in PySpark for this pandas functionality?

pandasDataFrame.rolling('2s', min_periods=1).sum()

where the columns in question have timestamps like this

2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:05  3.0
:

(documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html )


cs95
gilgamash
    http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.window – Steven Nov 27 '18 at 10:02
  • perfect, that is exactly what I needed. Thanks, Steven! Cannot mark it as the correct answer however, as it is just a comment... – gilgamash Nov 27 '18 at 10:49

1 Answer


Use the window function in Spark:

from pyspark.sql import functions as F

# Assign each row to a 2-second time window derived from the "tmst" column.
df.withColumn(
    "window",
    F.window("tmst", "2 seconds")
)
Steven