
I have a dataframe with a timestamp column in the format "yyyy-MM-dd HH:mm:ss.SSSSSSS". I want to trim the milliseconds and nanoseconds from the string and convert it into a datetime type.

I tried using the to_timestamp() method to convert from string to timestamp format. That part works, but I am still getting the milliseconds and nanoseconds at the end.

I tried the following to remove the milliseconds, but none of them worked:

  1. I tried the date truncate method to remove the milliseconds. It worked, but it converts the column to string format.
  2. I tried with:
  to_timestamp($"column_name", "YYYY-mm-dd HH:MM:ss")

but I am getting the default format as output. This method did not recognize my custom datetime format. The default format I got is --> "YYYY-mm-ddTHH:MM:ss.sssss+sss"

.withColumn("datetype_timestamp",
  to_timestamp(col("RunStartTime"), "YYYY-mm-dd HH:MM:ss")
)

Above is my code sample. Can someone suggest what I should do here, please? Thank you for your time :)
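(Editorial note, not part of the original question: Spark 3.x datetime patterns follow java.time.DateTimeFormatter semantics, where letter case matters: yyyy is the year, MM the month, mm the minute, HH the hour. The pattern above swaps several of these. A minimal plain-Scala sketch of the difference, outside Spark:)

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val dt = LocalDateTime.of(2022, 2, 12, 12, 34, 56)

// Correct letters: yyyy = year, MM = month, mm = minute
val good = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").format(dt)
// good == "2022-02-12 12:34:56"

// Swapped letters: YYYY = week-based year, mm = minute, MM = month
val bad = DateTimeFormatter.ofPattern("YYYY-mm-dd HH:MM:ss").format(dt)
// bad == "2022-34-12 12:02:56" -- the minutes land in the month slot
```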

Cluster details: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)


1 Answer


I don't know if this is the best or most elegant approach, but I could use a combination of to_timestamp and date_format to achieve this:

.withColumn(
  "datetype_timestamp",
  to_timestamp(date_format(col("input_timestamp"), "yyyy-mm-dd HH:MM:ss"))
// input_timestamp would be RunStartTime in your case
)

And this was the output:

+---------------------------+-------------------+
|input_timestamp            |datetype_timestamp |
+---------------------------+-------------------+
|2022-02-12 12:12:12.4398715|2022-12-12 12:02:12|
+---------------------------+-------------------+
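(Editorial note, not part of the original answer: the shifted month and minute in the output above come from the mm/MM swap in the pattern. Also, the fractional part can be stripped without a round trip through strings; in Spark, date_trunc("second", col("input_timestamp")) zeroes it out while keeping the TimestampType. The underlying truncation, sketched in plain Scala with java.time:)

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Parse the 7-digit fractional seconds, then truncate to whole seconds.
val parser  = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSS")
val ts      = LocalDateTime.parse("2022-02-12 12:12:12.4398715", parser)
val trimmed = ts.truncatedTo(ChronoUnit.SECONDS)
// trimmed.toString == "2022-02-12T12:12:12"
```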
AminMal
  • Thank you. This is my issue: I tried yours once yesterday before coming to Stack Overflow, but for some reason you are not getting the millisecond precision, while I am getting this as output: 2022-01-25T00:05:00.000+0000. Is it something to do with the library I am using? Not sure what is different with my code. – surya prakash May 31 '22 at 08:15
  • @suryaprakash Hmm, seems weird, what's your Spark version? This could be related to the Spark version, but that would be so weird. – AminMal May 31 '22 at 08:16
  • This is my cluster configuration - 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) – surya prakash May 31 '22 at 08:22
  • If I apply the datetime format up to seconds, the values after seconds are replaced with 0's. I want to completely trim the value; I don't even want the zeros. I tried a few methods and all of them give 0's at the end. – surya prakash May 31 '22 at 08:44