
I have a string column that holds a unix_tstamp (epoch milliseconds) in a PySpark dataframe.

unix_tstamp       utc_stamp

1547741586462     2019-01-17 16:13:06:462
1547741586562     2019-01-17 16:13:06:562
1547741586662     2019-01-17 16:13:06:662
1547741586762     2019-01-17 16:13:06:762
1547741586862     2019-01-17 16:13:06:862

I want to perform the conversion in exactly the above format, but I'm getting null when I try the method below:

data.withColumn("utc_stamp", unix_timestamp('unix_tstamp',"yyyy-MM-dd'T'HH:mm:ss.SSSZ"))

Am I missing something or is there any other way?

  • Your pattern doesn't match. There's no `'T'` in your strings and there shouldn't be a `Z` at the end. Try `"yyyy-MM-dd HH:mm:ss.SSS"` – pault Jan 24 '20 at 16:44
  • It is still showing null in the new column even after trying your format. – Saketh Garuda Jan 24 '20 at 17:03
  • That last period should be a colon. You have to make sure your strings match the pattern — `null` means the pattern is wrong. Do a search for Java SimpleDateFormat. – pault Jan 24 '20 at 17:15

1 Answer


You can specify the format like this:

import pyspark.sql.functions as F

# use lowercase 'yyyy' for the year: uppercase 'YYYY' means week-year
# in Java date patterns and can give wrong dates around year boundaries
df = df.withColumn('utc_stamp', F.from_unixtime('Timestamp', format="yyyy-MM-dd HH:mm:ss.SSS"))

df.show(truncate=False)

+----------+-----------------------+
|Timestamp |utc_stamp              |
+----------+-----------------------+
|1579887004|2020-01-24 18:30:04.000|
|1579887004|2020-01-24 18:30:04.000|
+----------+-----------------------+

Sample Data

# today's datestamp
d = [[1579887004],
     [1579887004],
    ]

df = spark.createDataFrame(d, ['Timestamp'])
YOLO