0

I have a dataframe in which some of the dates are indented as below:

1      20/09/2021 

5        11/9/2021

7         8/9/2021

8         7/9/2021

12     26/08/2021 
          ...     
813    28/08/2020 
814    28/08/2020 
815    26/08/2020 
816    26/08/2020 
817    23/08/2020 

while trying to pass pd.to_datetime function, the results comes in different date format as below:

1     2021-09-20>>>>> This is the right format that I want (yyyy-mm-dd)

5     2021-11-09 >>>>>Here the format has changed(yyyy-dd-mm)

7     2021-08-09

8     2021-07-09 >>>> 
12    2021-08-26
         ...    
813   2020-08-28
814   2020-08-28
815   2020-08-26
816   2020-08-26
817   2020-08-23

If someone could help me with this one please?

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • Notice that the dates 5, 7 and 8 don't have trailing zeros. That is most probably messing up the formatting. Can you add the zeros yourself? If so I would try it. – Shinra tensei Sep 30 '21 at 10:15

1 Answers1

0

The issue is due the missing 0 padding for the month (dd/m/yyyy) in the CSV file.

Since you only have 800 columns in your data, you can simply add 0s wherever missing and then call to_datetime

def fix_df(x)):
    return pd.to_datetime("/".join([i.zfill(2) for i in x.split("/")]),format="%d/%m/%Y")

and apply it with

df.applymap(f)
kinshukdua
  • 1,944
  • 1
  • 5
  • 15
  • pd.to_datetime with format="%d/%m/%Y" or dayfirst=True (see dupe I linked) should work perfectly fine (zero-padding not required). No need for an apply here as well. Ex: try `pd.to_datetime(["8/9/2021", "08/09/2021"], dayfirst=True)`. – FObersteiner Sep 30 '21 at 10:31