I have a timestamp column and I want to create a year column from it. I know how to display the year, but I am not able to add it as a new column to my dataset. So far I have tried this:

data = data.withColumn('Year', data.select(year(('Date'))))

But it throws an error saying:

AssertionError: col should be Column

I am able to show the year by doing this:

data.select(year('Date').alias('Year')).show()
Nicolás Ozimica
Enterrador99

2 Answers

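withColumn expects a Column as its second argument, but data.select(...) returns a DataFrame, which is why the assertion fails. Pass the result of the year function directly instead (with col and year imported from pyspark.sql.functions). The following would work: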

data = data.withColumn('Year', year(col('Date')))

SaiNageswar S

Spark's data model can be a bit confusing.

Spark SQL functions and UDFs operate on "Column" objects. A Column in Spark is a placeholder for the column in the actual table. Some methods like .select() let you use strings as shortcuts, e.g. df.select('year') is equivalent to df.select(pyspark.sql.functions.col('year')).

So the first answer is correct because year(col('Date')) is itself a Column expression, which is what withColumn requires, whereas data.select(...) returns a whole DataFrame.

(This behavior is very poorly documented in my opinion.)

shadowtalker