
I have a fixed date "2000/01/01" and a dataframe:

# sc and spark are the SparkContext and SparkSession from the PySpark shell
data1 = [{'index': 1, 'offset': 50}]
data_p = sc.parallelize(data1)
df = spark.createDataFrame(data_p)

I want to create a new column by adding the offset column (a number of days) to this fixed date.

I tried different approaches, but I cannot pass the column value in, and expr raises an error: "function is neither a registered temporary function nor a permanent function registered in the database 'default'".
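
For reference, here is a sketch of the kind of attempt that fails (the strptime inside the SQL string is just illustrative; expr only understands Spark SQL functions, so any Python helper named there is reported as an unregistered function):

from pyspark.sql.functions import expr
from pyspark.sql.utils import AnalysisException

try:
    # expr() parses Spark SQL, so a Python helper like strptime is not
    # visible here and fails analysis as an unregistered function
    df.withColumn("date_offset", expr("date_add(strptime('2000/01/01', '%Y/%m/%d'), offset)"))
except AnalysisException as e:
    print(e)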

The only solution I can think of is:

from datetime import datetime
from pyspark.sql.functions import expr, lit

df = df.withColumn("zero", lit(datetime.strptime('2000/01/01', '%Y/%m/%d')))
df = df.withColumn("date_offset", expr("date_add(zero, offset)")).drop("zero")

Since I cannot use lit and datetime.strptime inside expr, I have to use this approach, which creates a redundant column and extra operations.

Any better way to do it?


1 Answer


Since you have tagged this as a pyspark question, in Python you can do the following:

df_a3.withColumn("date_offset",F.lit("2000-01-01").cast("date") + F.col("offset").cast("int")).show()

Edit: as per the comment below, let's assume there is an extra column type; based on its value, the code below can be used:

df_a3.withColumn("date_offset",F.expr("case when type ='month' then add_months(cast('2000-01-01' as date),offset) else date_add(cast('2000-01-01' as date),cast(offset as int)) end ")).show()