I am trying to convert datetime strings that carry a UTC offset into timestamps using PySpark's to_timestamp.
Sample dataframe:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", '2020-09-08 14:00:00.917+02:00'),
                            ("b", '2020-09-08 14:00:00.900+01:00')],
                           ["Col1", "date_time"])
My attempt (with timezone specifier Z):
from pyspark.sql import functions as f

df = df.withColumn("timestamp", f.to_timestamp(df.date_time, "yyyy-MM-dd HH:mm:ss.SSSZ"))
df.select('timestamp').show()
Actual result:
+---------+
|timestamp|
+---------+
|     null|
|     null|
+---------+
Wanted result (where timestamp is of type timestamp):
+-------------------------+
|                timestamp|
+-------------------------+
|2020-09-08 14:00:00+02:00|
|2020-09-08 14:00:00+01:00|
+-------------------------+
I have tried many other format patterns as well, but I cannot find one that parses these strings.
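A note on the wanted result: a Spark timestamp column holds an instant and is displayed in the session time zone, so it does not keep a separate offset per row. The sketch below uses only Python's standard library (not Spark) to work out which UTC instants the two sample strings denote, as reference values to compare a parsed column against. The names samples, to_utc and utc_instants are made up for this illustration.

```python
from datetime import datetime, timezone

# The two sample strings from the question, keyed by Col1.
samples = {
    "a": "2020-09-08 14:00:00.917+02:00",
    "b": "2020-09-08 14:00:00.900+01:00",
}


def to_utc(text: str) -> datetime:
    """Parse an ISO 8601 string with an offset and normalise it to UTC."""
    parsed = datetime.fromisoformat(text)   # offset-aware datetime (+02:00 / +01:00)
    return parsed.astimezone(timezone.utc)  # same instant, expressed in UTC


# Hypothetical helper output: one UTC instant per sample row.
utc_instants = {key: to_utc(value) for key, value in samples.items()}

for key, instant in utc_instants.items():
    print(key, instant.isoformat(timespec="milliseconds"))
# a 2020-09-08T12:00:00.917+00:00
# b 2020-09-08T13:00:00.900+00:00
```

According to Spark's datetime pattern documentation, the pattern letter Z matches offsets written without a colon (such as +0200), while XXX matches the colon form (+02:00). A pattern like yyyy-MM-dd HH:mm:ss.SSSXXX is therefore a likely candidate for these strings; treat that as a suggestion to verify on the Spark version in use.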