
I have a date column that is inferred as string datatype in PySpark:

Mon Oct 17 15:57:48 EST 2022

How can I cast this string to a datetime?

Anos

1 Answer


You can use the matching datetime format pattern - 'E MMM dd HH:mm:ss z yyyy'. The resulting timestamp will be in UTC, so you'll see that it adds 5 hours to the source timestamp.

from pyspark.sql import functions as func

# the legacy parser is needed to handle zone abbreviations like 'EST'
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')

spark.sparkContext.parallelize([('Mon Oct 17 15:57:48 EST 2022', )]).toDF(['dt_str']). \
    withColumn('dt', func.to_timestamp('dt_str', 'E MMM dd HH:mm:ss z yyyy')). \
    show(truncate=False)

# +----------------------------+-------------------+
# |dt_str                      |dt                 |
# +----------------------------+-------------------+
# |Mon Oct 17 15:57:48 EST 2022|2022-10-17 20:57:48|
# +----------------------------+-------------------+
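The conversion Spark performs here can be sketched in plain Python (a minimal illustration, not Spark's actual parser; the `TZ_OFFSETS` map and the `parse_to_utc` helper are hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper mirroring what the legacy parser does for this pattern:
# parse the fields, apply the zone abbreviation's offset, convert to UTC.
TZ_OFFSETS = {'EST': -5, 'EDT': -4, 'UTC': 0}  # assumption: only these zones occur

def parse_to_utc(s):
    parts = s.split()              # ['Mon', 'Oct', '17', '15:57:48', 'EST', '2022']
    zone = parts.pop(4)            # strptime's %Z is unreliable for abbreviations
    naive = datetime.strptime(' '.join(parts), '%a %b %d %H:%M:%S %Y')
    aware = naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[zone])))
    return aware.astimezone(timezone.utc)

print(parse_to_utc('Mon Oct 17 15:57:48 EST 2022'))  # 2022-10-17 20:57:48+00:00
```

EST is UTC-05:00, so 15:57:48 EST becomes 20:57:48 UTC, matching the `dt` column above.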
samkart
  • Thanks for your input! I'm trying to convert to datetime instead of timestamp. Is that possible? When I try to convert, it shows as not supported. – Anos Oct 26 '22 at 17:39
  • @Anos - datetime means timestamp. Do you mean just the date part? Use the `to_date` function instead of `to_timestamp`. – samkart Oct 26 '22 at 20:08
  • Sorry for the confusion. I thought there was a separate datetime datatype conversion. Now I know that datetime effectively means the timestamp datatype. Thanks! – Anos Oct 27 '22 at 07:45
  • When I try to convert Wed Oct 19 00:15:13 EST 2022, the resulting timestamp is 2022-10-19 10:45:13. I don't really get what is happening - it is not adding +5. – Anos Oct 27 '22 at 08:08
  • 1
    @Anos - it must be because your default tz is set to be something other than UTC, and spark is adding the offset based on that tz. see [this](https://stackoverflow.com/q/49644232/8279585) to reset your session's tz – samkart Oct 27 '22 at 08:12
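The offset reported in the last comments is consistent with a session timezone of UTC+05:30; a minimal sketch of the arithmetic (the +05:30 zone is an assumption inferred from the numbers, not stated in the thread):

```python
from datetime import datetime, timedelta, timezone

# Wed Oct 19 00:15:13 EST 2022, with EST = UTC-05:00
src = datetime(2022, 10, 19, 0, 15, 13, tzinfo=timezone(timedelta(hours=-5)))

# assumption: the session timezone was UTC+05:30, which explains 10:45:13
session_tz = timezone(timedelta(hours=5, minutes=30))
print(src.astimezone(session_tz).strftime('%Y-%m-%d %H:%M:%S'))  # 2022-10-19 10:45:13

# In Spark, pinning the session timezone to UTC avoids this shift:
# spark.conf.set('spark.sql.session.timeZone', 'UTC')
```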