
Is there an equivalent in PySpark for this pandas functionality?

pandasDataFrame.rolling('2s', min_periods=1).sum()

where the columns in question have timestamps like this

2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:05  3.0
:

(documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html )


cs95
gilgamash
    http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.window – Steven Nov 27 '18 at 10:02
  • perfect, that is exactly what I needed. Thanks, Steven! Cannot mark it as the correct answer however, as it is just a comment... – gilgamash Nov 27 '18 at 10:49

1 Answer


Use the window function in Spark:

from pyspark.sql import functions as F

# Assign each row to a 2-second time window derived from the "tmst" column.
df.withColumn(
    "window",
    F.window("tmst", "2 seconds")
)
Steven