
I am trying the following simple transformation.

from pyspark.sql import functions as F

data = [["06/15/2020 14:04:04"]]
cols = ["date"]

df = spark.createDataFrame(data, cols)

df = df.withColumn("datetime", F.to_timestamp(F.col("date"), 'MM/DD/YYYY HH24:MI:SS'))
df.show()

But this gives me the error: "All week-based patterns are unsupported since Spark 3.0, detected: Y, Please use the SQL function EXTRACT instead".

I want to parse the string in that format and convert it to a timestamp.

  • What is `HH24:MI:` for? – Lamanus Feb 09 '23 at 08:13
  • I want the date to be in 24 hours format. Let me know the correct way if this is wrong way of defining. – Amaravathi Satya Feb 09 '23 at 08:17
  • Does this answer your question? [Better way to convert a string field into timestamp in Spark](https://stackoverflow.com/questions/29844144/better-way-to-convert-a-string-field-into-timestamp-in-spark) – Lamanus Feb 09 '23 at 08:59

2 Answers


You should use this format: MM/dd/yyyy HH:mm:ss

Check the datetime pattern page in Spark's documentation for all format-related details.

from pyspark.sql.functions import to_timestamp, col

df = df.withColumn("datetime", to_timestamp(col("date"), 'MM/dd/yyyy HH:mm:ss'))
df.show()

+-------------------+-------------------+
|               date|           datetime|
+-------------------+-------------------+
|06/15/2020 14:04:04|2020-06-15 14:04:04|
+-------------------+-------------------+
Mohana B C

The different elements of the timestamp pattern are explained in Spark's documentation. Note that since Spark 3.0, timestamps are parsed with patterns modeled on Java's DateTimeFormatter (previously SimpleDateFormat), which uses a somewhat confusing set of format symbols. The symbol for the hour in 24-hour representation is simply H (or HH when zero-padded), with no numeric suffix such as HH24. Minutes are m, not M, which stands for the month. The year is matched by y, not Y, which stands for the week year. Week-based patterns are unsupported since Spark 3.0, hence the message you're getting.

In your case, the proper format should be MM/dd/yyyy HH:mm:ss.

Hristo Iliev