
I am new to PySpark and I am trying to create a reusable function that converts an input column from string type to TimestampType.

This is what the input column string looks like: 23/04/2021 12:00:00 AM

I want this converted to TimestampType so I can get the latest date using PySpark.

Below is the function I have created so far:

def datetype_change(self, key, col):
  self.log.info("datetype_change...".format(self.app_name.upper()))
  self.df[key] = self.df[key].withColumn("column_name", F.unix_timestamp(F.col("column_name"), 'yyyy-MM-dd HH:mm:ss').cast(TimestampType()))

When I run it, I get this error:

NameError: name 'TimestampType' is not defined

How do I change this function so it produces the intended output?

  • Have you imported the `TimestampType` class? – AdibP Sep 27 '21 at 10:43
  • Thanks for the tip. I thought I had imported it, but what was actually missing was `from pyspark.sql.types import *`, so I added that (a minimal import sketch follows below). – YJG Sep 28 '21 at 07:21
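
For reference, here is a minimal sketch of the imports the comments refer to; the explicit class import is an alternative to the wildcard `import *` form, and the `F` alias is the one already used in the question:

  from pyspark.sql import functions as F        # provides F.unix_timestamp and F.col
  from pyspark.sql.types import TimestampType   # the class the NameError complains about
  # from pyspark.sql.types import *             # the wildcard form from the comment also works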

1 Answer


Found my answer:

from pyspark.sql import functions as F        # needed at the top of the module
from pyspark.sql.types import TimestampType

def datetype_change(self, key, col):
  self.log.info("{} - datetype_change...".format(self.app_name.upper()))
  # 'dd/MM/yyyy hh:mm:ss aa' matches input strings like "23/04/2021 12:00:00 AM"
  self.df[key] = self.df[key].withColumn(col, F.unix_timestamp(self.df[key][col], 'dd/MM/yyyy hh:mm:ss aa').cast(TimestampType()))
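
For completeness, here is a self-contained sketch of the same conversion followed by pulling out the latest date, which was the stated goal. The DataFrame, the `created_at` column name, and the sample rows are hypothetical, not taken from the original code:

  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.types import TimestampType

  spark = SparkSession.builder.getOrCreate()

  # Hypothetical sample data in the same "dd/MM/yyyy hh:mm:ss AM/PM" shape as the question
  df = spark.createDataFrame(
      [("23/04/2021 12:00:00 AM",), ("24/04/2021 01:30:00 PM",)],
      ["created_at"],
  )

  # Same idea as the answer: parse to epoch seconds, then cast to a timestamp.
  # A single 'a' is used for the AM/PM marker here, since newer Spark releases
  # can be strict about repeated pattern letters.
  df = df.withColumn(
      "created_at",
      F.unix_timestamp(F.col("created_at"), "dd/MM/yyyy hh:mm:ss a").cast(TimestampType()),
  )

  # Latest (maximum) timestamp in the column
  latest = df.agg(F.max("created_at").alias("latest")).collect()[0]["latest"]
  print(latest)  # 2021-04-24 13:30:00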