
I have a fixed date "2000/01/01" and a dataframe:

# sc and spark are the SparkContext and SparkSession from the PySpark shell
data1 = [{'index': 1, 'offset': 50}]
data_p = sc.parallelize(data1)
df = spark.createDataFrame(data_p)

I want to create a new column by adding the offset column (a number of days) to this fixed date.

I tried different approaches, but I cannot pass the column value in, and expr raises an error: "function is neither a registered temporary function nor a permanent function registered in the database 'default'".
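
For reference, here is a sketch of the kind of attempt that fails (the strptime inside the SQL string is just illustrative; expr only understands Spark SQL functions, so any Python helper named there is reported as an unregistered function):

from pyspark.sql.functions import expr
from pyspark.sql.utils import AnalysisException

try:
    # expr() parses Spark SQL, so a Python helper like strptime is not
    # visible here and fails analysis as an unregistered function
    df.withColumn("date_offset", expr("date_add(strptime('2000/01/01', '%Y/%m/%d'), offset)"))
except AnalysisException as e:
    print(e)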

The only solution I can think of is:

from datetime import datetime
from pyspark.sql.functions import expr, lit

df = df.withColumn("zero", lit(datetime.strptime('2000/01/01', '%Y/%m/%d')))
df = df.withColumn("date_offset", expr("date_add(zero, offset)")).drop("zero")

Since I cannot use lit and datetime.strptime inside expr, I have to use this approach, which creates a redundant column and extra operations.

Any better way to do it?


1 Answer


Since you have tagged this as a pyspark question, in Python you can do the following:

df_a3.withColumn("date_offset",F.lit("2000-01-01").cast("date") + F.col("offset").cast("int")).show()

Edit: as per the comment below, let's assume there is an extra column type; based on its value, the code below can be used:

df_a3.withColumn("date_offset",F.expr("case when type ='month' then add_months(cast('2000-01-01' as date),offset) else date_add(cast('2000-01-01' as date),cast(offset as int)) end ")).show()