2

I have this panda dataframe df_scaled.

      Name        Date      Sales 
283    AXP  10-02-2009  1.143791
284    AXP  11-02-2009  1.065359
285    AXP  12-02-2009  1.039869
286    AXP  13-02-2009  1.049020
287    AXP  17-02-2009  0.975817
288    AXP  18-02-2009  0.922222

I run the following code to convert the date format in the Date column from dd-mm-yyyy to yyyy-mm-dd.

df_scaled['Date'] = pd.to_datetime(df_scaled['Date']).dt.strftime('%Y-%m-%d')

The result looks like this;

      Name      Date      Sales 
283    AXP  2009-10-02  1.143791  
284    AXP  2009-11-02  1.065359  
285    AXP  2009-12-02  1.039869  
286    AXP  2009-02-13  1.049020  
287    AXP  2009-02-17  0.975817  
288    AXP  2009-02-18  0.922222 

In the first 3 rows, the converted date format is yyyy-dd-mm. In the last 3 rows, the converted date format is yyyy-mm-dd. What is wrong with the code to produce this inconsistency in the conversion?

I am using python v3.6

user1315789
  • 3,069
  • 6
  • 18
  • 41
  • I've observed similar behavior in pandas. See also https://stackoverflow.com/questions/37538080/pandas-inconsistent-date-time-format – M.Rau Aug 13 '18 at 14:23
  • Unfortunately, there is no way for pandas to determine date format you are converting *from*, unless you specify it manually. Therefore, it appears pandas autoconvert starts by assuming date is in semi-American format (mm-dd-yyyy), then it encounters values which can't be right for mm-dd-yyyy (like 18-02), and tries another format (dd-mm-yyyy) . – Gnudiff Aug 13 '18 at 14:28
  • Still pandas could do better by guessing since the majority should win. But this is off topic in this context. – M.Rau Aug 13 '18 at 16:51

1 Answers1

4

Adding dayfirst

df_scaled['Date'] = pd.to_datetime(df_scaled['Date'], dayfirst=True).dt.strftime('%Y-%m-%d')    
Out[510]: 
283   2009-02-10
284   2009-02-11
285   2009-02-12
286   2009-02-13
287   2009-02-17
288   2009-02-18
Name: Date, dtype: datetime64[ns]
user1315789
  • 3,069
  • 6
  • 18
  • 41
BENY
  • 317,841
  • 20
  • 164
  • 234