Spark >= 2.0
You can use the window function:
from pyspark.sql.functions import window

(df
    .groupBy(window("timestamp", "3 minute").alias("ts"))
    .sum()
    .orderBy("ts")
    .show())
## +--------------------+---------+
## |                  ts|sum(data)|
## +--------------------+---------+
## |{2000-01-01 00:00...|        3|
## |{2000-01-01 00:03...|       12|
## |{2000-01-01 00:06...|       21|
## +--------------------+---------+

# Hourly buckets: all nine samples fall into a single one-hour window
(df
    .groupBy(window("timestamp", "1 hour").alias("ts"))
    .sum()
    .orderBy("ts")
    .show())
## +--------------------+---------+
## |                  ts|sum(data)|
## +--------------------+---------+
## |{2000-01-01 00:00...|       36|
## +--------------------+---------+
Spark < 2.0
In this particular case, all you need is Unix timestamps and basic arithmetic:
# timestamp_seconds requires Spark >= 3.1; drop it from the import on older versions
from pyspark.sql.functions import floor, timestamp_seconds, unix_timestamp

def resample_to_minute(c, interval=1):
    # Bucket width in seconds
    t = 60 * interval
    # For Spark < 3.1
    # return (floor(c / t) * t).cast("timestamp")
    return timestamp_seconds(floor(c / t) * t)

def resample_to_hour(c, interval=1):
    return resample_to_minute(c, 60 * interval)

df = sc.parallelize([
    ("2000-01-01 00:00:00", 0), ("2000-01-01 00:01:00", 1),
    ("2000-01-01 00:02:00", 2), ("2000-01-01 00:03:00", 3),
    ("2000-01-01 00:04:00", 4), ("2000-01-01 00:05:00", 5),
    ("2000-01-01 00:06:00", 6), ("2000-01-01 00:07:00", 7),
    ("2000-01-01 00:08:00", 8)
]).toDF(["timestamp", "data"])

(df.groupBy(resample_to_minute(unix_timestamp("timestamp"), 3).alias("ts"))
    .sum().orderBy("ts").show(3, False))
## +---------------------+---------+
## |ts                   |sum(data)|
## +---------------------+---------+
## |2000-01-01 00:00:00.0|3        |
## |2000-01-01 00:03:00.0|12       |
## |2000-01-01 00:06:00.0|21       |
## +---------------------+---------+

(df.groupBy(resample_to_hour(unix_timestamp("timestamp")).alias("ts"))
    .sum().orderBy("ts").show(3, False))
## +---------------------+---------+
## |ts                   |sum(data)|
## +---------------------+---------+
## |2000-01-01 00:00:00.0|36       |
## +---------------------+---------+
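
To see why the floor-and-multiply trick works, here is a minimal plain-Python sketch of the same arithmetic, with no Spark involved. The helper name `bucket_start` is hypothetical, and the timestamps are assumed to be UTC (Spark's `unix_timestamp` uses the session time zone instead, so bucket boundaries may differ for non-UTC sessions with sub-hour offsets):

```python
from collections import defaultdict
from datetime import datetime, timezone

def bucket_start(epoch_seconds, interval_minutes=1):
    """Floor a Unix timestamp (in seconds) to the start of its bucket."""
    t = 60 * interval_minutes          # bucket width in seconds
    return (epoch_seconds // t) * t    # integer floor, then scale back

# The same nine one-minute samples as in the example data, read as UTC (assumption).
base = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp())
rows = [(base + 60 * i, i) for i in range(9)]

# Group by the 3-minute bucket start and sum the values.
sums = defaultdict(int)
for ts, value in rows:
    sums[bucket_start(ts, 3)] += value

result = [
    (datetime.fromtimestamp(k, tz=timezone.utc).strftime("%H:%M"), v)
    for k, v in sorted(sums.items())
]
print(result)
```

Every timestamp inside a bucket floors to the same multiple of `t`, so that multiple can serve directly as the grouping key; passing `60` as the interval collapses all nine rows into a single hourly bucket.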
The example data comes from the pandas.DataFrame.resample documentation.
For the general case, see "Making histogram with Spark DataFrame column".