0

I am using a dataframe which has a 'Date' column. I have used pd.to_datetime() to convert this column format to yyyy-mm-dd. However, this format is getting switched to some other format at intermittent dates in the dataframe (eg: yyyy-dd-mm).

Date 
2021-02-01 <----- this is 2nd Jan, 2021
2021-01-21 <----- this is 21st Jan, 2021

Further, I have alto tried using the df['Date'].dt.strftime('%y-%m-%d'), but this too has not helped.

I request some guidance on the following points:

  1. For any Date column, is it enough to just use pd.to_datetime() and be rest assured that all dates will be in correct format?
  2. Or do I need to manually state the datetime format explicitly alongwith the pd.to_[enter image description here][1]datetime() feature?
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
AA83
  • 3
  • 1
  • 1
    Hello @MrFuppes, this works perfectly !. Thank you so much !. My program's error has got resolved. Really appreciate your help ! – AA83 Sep 27 '21 at 10:25
  • Just one small question: From model stability perspective, in case I have 3 different dataframes', and each have a Date column in them, is it a best practice to use the "dayFirst=True" property? . Is it advisable to use it always from a model stability perspective? – AA83 Sep 27 '21 at 10:28
  • This depends on your input. If you get that from a region of the world where it is common to put the day first in a date, then you can set it to True in general. But be aware there are regions where this is not the case, e.g. the US. If you want to be 100% sure and receive an error for invalid dates, set a specific format, e.g. `'%d-%m-%Y'` for inputs like `"02-01-2021"` (Jan 2nd, 2021). – FObersteiner Sep 27 '21 at 10:41
  • Understood. Thank you very much – AA83 Sep 27 '21 at 10:50

1 Answers1

0

The problem comes from how pandas parses dates. When receiving 2021-02-01 it does not know if it is Feb 1st or Jan 2nd, so it applies its default decision rules: when the date starts with the year, the next field is the month, so resulting in Feb 1st. This is not the case when parsing 2021-01-21, there is only one possible date, Jan 21st.

Take a look at to_datetime documentation, and its parameters day_first or format, to force a given format when there are different possible parsings

Pantoofle
  • 336
  • 1
  • 13
  • Thank you @Pantoofle. Appreciate your response. I will refer to the documentation further. I will get back in case of any queries – AA83 Sep 27 '21 at 09:26
  • my issue got resolve using dayFirst property. thank you – AA83 Sep 27 '21 at 10:30
  • Nice ! You can mark the answer as valid, so that next users can more easily see that this answer worked for you – Pantoofle Sep 29 '21 at 12:03