Since f.current_date() creates a column object, I think it makes sense to start with a PySpark dataframe already in place – we can put [1, 2, 3] into a column called upcoming_sunday_number, and then use f.current_date() in the following way:
from pyspark.sql import functions as f
from pyspark.sql.types import IntegerType

# note: with f.dayofweek, Sunday is 1 and Saturday is 7
N = 3

spark.createDataFrame(list(range(1, N + 1)), "integer").toDF("upcoming_sunday_number").withColumn(
    "current_date", f.current_date()
).withColumn(
    # days until the n-th upcoming Sunday: 1 - dayofweek(today) + 7*n
    "number_of_days", (1 - f.dayofweek(f.current_date()) + 7 * f.col("upcoming_sunday_number")).cast(IntegerType())
).withColumn(
    "upcoming_sunday_date", f.date_add(f.current_date(), f.col("number_of_days"))
).show()
+----------------------+------------+--------------+--------------------+
|upcoming_sunday_number|current_date|number_of_days|upcoming_sunday_date|
+----------------------+------------+--------------+--------------------+
|                     1|  2023-05-04|             3|          2023-05-07|
|                     2|  2023-05-04|            10|          2023-05-14|
|                     3|  2023-05-04|            17|          2023-05-21|
+----------------------+------------+--------------+--------------------+
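The offset arithmetic can be sanity-checked in plain Python without a Spark session (a quick sketch; note that datetime.weekday() uses Monday=0, so it is first converted to Spark's Sunday=1 convention):

```python
from datetime import date, timedelta

def upcoming_sunday(today: date, n: int) -> date:
    # Spark's f.dayofweek: Sunday=1 ... Saturday=7.
    # Python's weekday(): Monday=0 ... Sunday=6, so convert.
    spark_dow = today.weekday() + 2 if today.weekday() < 6 else 1
    # same formula as the number_of_days column above
    number_of_days = 1 - spark_dow + 7 * n
    return today + timedelta(days=number_of_days)

print(upcoming_sunday(date(2023, 5, 4), 1))  # 2023-05-07
```

This reproduces the table row by row, and also confirms the edge case: when today is itself a Sunday, n=1 lands on the following Sunday rather than today.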
Or more succinctly:
spark.createDataFrame(list(range(1, N + 1)), "integer").toDF("upcoming_sunday_number").withColumn(
    "upcoming_sunday_date",
    f.date_add(f.current_date(), (1 - f.dayofweek(f.current_date()) + 7 * f.col("upcoming_sunday_number")).cast(IntegerType())),
).show()
+----------------------+--------------------+
|upcoming_sunday_number|upcoming_sunday_date|
+----------------------+--------------------+
|                     1|          2023-05-07|
|                     2|          2023-05-14|
|                     3|          2023-05-21|
+----------------------+--------------------+
Note that in my original answer, I was a little careless and made upcoming_sunday_number a string instead of an integer, but that version still runs with the correct end result, since the cast to IntegerType handles it:
root
|-- upcoming_sunday_number: string (nullable = true)
|-- current_date: date (nullable = false)
|-- number_of_days: integer (nullable = true)
|-- upcoming_sunday_date: date (nullable = true)