Using Spark 3.1, I am trying to convert string values in "MM/dd/yyyy" format into dates in "dd-MM-yyyy" format. My file has a total of 5 date columns that I want to change from "MM/dd/yyyy" to "dd-MM-yyyy". Some values in these columns are already in the target format, e.g. 05-02-2022 ("dd-MM-yyyy"), while others are still like 10/23/2021 ("MM/dd/yyyy"). I want to convert only the values that are in "MM/dd/yyyy" format to "dd-MM-yyyy" and leave the rest untouched. How can I achieve this?
Input:
df = spark.createDataFrame([("10/23/2019", "09/13/2021"), ("06/16/2020", "03/16/2021"), ("09/06/2022", "12/23/2019")], ['A', 'B'])
Expected output: 23-10-2019, 13-09-2021, and so on for the remaining values.
My code:
from pyspark.sql.functions import to_date

df = df.withColumn('date_col', to_date('Date_col', 'dd-MM-yy'))
The code runs without errors, but it returns null in the date column. Also, since I have 5 date columns, is it possible to do this with a for loop?
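To make the rule I'm after concrete, here is a plain-Python sketch of what should happen to each individual value (using only the stdlib; the two format strings are my assumption of the only two shapes present in the data):

```python
from datetime import datetime

def normalize(value: str) -> str:
    """Convert 'MM/dd/yyyy' strings to 'dd-MM-yyyy'; leave 'dd-MM-yyyy' values as-is."""
    try:
        # Values still in the source format, e.g. '10/23/2021'
        return datetime.strptime(value, "%m/%d/%Y").strftime("%d-%m-%Y")
    except ValueError:
        # Values that don't parse as MM/dd/yyyy are assumed already converted,
        # e.g. '05-02-2022', and are returned unchanged
        return value

print(normalize("10/23/2021"))  # 23-10-2021
print(normalize("05-02-2022"))  # 05-02-2022
```

This is just to illustrate the per-value logic; I'd like the equivalent done column-wise in Spark.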