
I am trying to convert and reformat a date column stored as a string using Spark SQL, from something that looks like this...

30/03/20 02:00

to something that is a datetime column and looks like this...

2020-03-30 02:00 ('yyyy-MM-dd HH:mm')

I am not using Python but simply writing SQL in DBeaver directly against a Spark data lake. Any help is greatly appreciated.


1 Answer


Use `to_timestamp` to parse the string date into a timestamp column, then `date_format` to format it to the desired pattern:

select date_format(to_timestamp(string_date, 'dd/MM/yy HH:mm'), 'yyyy-MM-dd HH:mm') as date 
  • That worked, thank you so much. Would you have a solution if I wanted to add the 4-digit year instead of just the 2-digit year? The original source only has '20', so I would imagine I would need to insert '20' into the string before the existing 2-digit year to return a 4-digit year, right? – bbal20 Feb 04 '21 at 23:20
  • Yes, that is correct. Not sure if I need to somehow add '20' to the original string before converting through date_format. Otherwise, the solution is returning a 2-digit year. – bbal20 Feb 04 '21 at 23:31
  • You need just to change the date pattern. yy means 2-digit year and yyyy for 4-digit year. Please see [datetime patterns](https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html) from the docs. – blackbishop Feb 04 '21 at 23:35
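To illustrate the pattern point from the comments, here is a minimal sketch (assuming a literal sample value in place of the table column). `to_timestamp` with `yy` already resolves '20' to the full year 2020 internally; only the output pattern passed to `date_format` decides whether two or four digits are printed, so no string manipulation of the source value is needed:

```sql
-- Parse the same sample string once, then format it two ways.
-- 'yy' in the output pattern prints a 2-digit year; 'yyyy' prints 4 digits.
SELECT
  date_format(to_timestamp('30/03/20 02:00', 'dd/MM/yy HH:mm'), 'yy-MM-dd HH:mm')   AS two_digit_year,
  date_format(to_timestamp('30/03/20 02:00', 'dd/MM/yy HH:mm'), 'yyyy-MM-dd HH:mm') AS four_digit_year
```

Note that in Spark's datetime patterns lowercase `yyyy` is the year-of-era; uppercase `YYYY` means week-based year and can give surprising results near year boundaries.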