7

I am trying to convert datetime strings with timezone to timestamp using to_timestamp.

Sample dataframe:

df = spark.createDataFrame([("a", '2020-09-08 14:00:00.917+02:00'), 
                            ("b", '2020-09-08 14:00:00.900+01:00')], 
                           ["Col1", "date_time"])

My attempt (with timezone specifier Z):

df = df.withColumn("timestamp",f.to_timestamp(df.date_time, "yyyy-MM-dd HH:mm:ss.SSSZ"))
df.select('timestamp').show()

Actual result:

    +---------+
    |timestamp|
    +---------+
    |     null|
    |     null|
    +---------+

Wanted result (where timestamp is of type timestamp):

+-------------------------+
|                timestamp|
+-------------------------+
|2020-09-08 14:00:00+02:00|
|2020-09-08 14:00:00+01:00|
+-------------------------+

I have tried many other versions of format as well, but I cannot seem to find the right one.

Christian Sloper
  • 7,440
  • 3
  • 15
  • 28

1 Answers1

8

As far as I know, it is not possible to parse the timestamp with timezone and retain its original form directly.

The issue is that to_timestamp() & date_format() functions automatically converts them to local machine's timezone.

I can suggest you to parse the timestamps and convert them into UTC as follows,

df.withColumn('local_ts', date_format(df.date_time, "yyyy-MM-dd HH:mm:ss.SSSX")) \
  .withColumn("timestamp_utc",to_utc_timestamp(to_timestamp(df.date_time, "yyyy-MM-dd HH:mm:ss.SSSX"), 'America/New_York')) \
  .show(10, False) 

# America/New_York is machine's timezone

+----+-----------------------------+--------------------------+-----------------------+
|Col1|date_time                    |local_ts                  |timestamp_utc          |
+----+-----------------------------+--------------------------+-----------------------+
|a   |2020-09-08 14:00:00.917+02:00|2020-09-08 08:00:00.917-04|2020-09-08 12:00:00.917|
|b   |2020-09-08 14:00:00.900+01:00|2020-09-08 09:00:00.900-04|2020-09-08 13:00:00.9  |
+----+-----------------------------+--------------------------+-----------------------+

If you still prefer to retain in its original form, then I guess you suppos to write a custom udf for that.

Dharman
  • 30,962
  • 25
  • 85
  • 135
suresiva
  • 3,126
  • 1
  • 17
  • 23