27

I wrote a code that reads multiple files, however on some of my files datetime swaps day & month whenever the day is less than 13, and any day that is from day 13 or above i.e. 13/06/11 remains correct (DD/MM/YY). I tried to fix it by doing this,but it doesn't work.

My data frame looks like this: The actual datetime is from 12june2015 to 13june2015 when my I read my datetime column as a string the dates remain correct dd/mm/yyyy

tmp                     p1 p2 
11/06/2015 00:56:55.060  0  1
11/06/2015 04:16:38.060  0  1
12/06/2015 16:13:30.060  0  1
12/06/2015 21:24:03.060  0  1
13/06/2015 02:31:44.060  0  1
13/06/2015 02:37:49.060  0  1

but when I change the type of my column to datetime column it swaps my day and month for each day that is less than 13.

output:

print(df)
tmp                  p1 p2 
06/11/2015 00:56:55  0  1
06/11/2015 04:16:38  0  1
06/12/2015 16:13:30  0  1
06/12/2015 21:24:03  0  1
13/06/2015 02:31:44  0  1
13/06/2015 02:37:49  0  1

Here is my code :

I loop through files :

df = pd.read_csv(PATH+file, header = None,error_bad_lines=False , sep = '\t')

then when my code finish reading all my files I concatenat them, the problem is that my datetime column needs to be in a datetime type so when I change its type by pd_datetime() it swaps the day and month when the day is less than 13.

Post converting my datetime column the dates are correct (string type)

print(tmp) # as a result I get 11.06.2015 12:56:05 (11june2015)

But when I change the column type I get this:

tmp = pd.to_datetime(tmp, unit = "ns")
tmp = temps_absolu.apply(lambda x: x.replace(microsecond=0))
print(tmp) # I get 06-11-2016 12:56:05 (06november2015 its not the right date)

The question is : What command should i use or change in order to stop day and month swapping when the day is less than 13?

UPDATE This command swaps all the days and months of my column

tmp =  pd.to_datetime(tmp, unit='s').dt.strftime('%#m/%#d/%Y %H:%M:%S') 

So in order to swap only the incorrect dates, I wrote a condition:

for t in tmp:
        if (t.day < 13):
            t = datetime(year=t.year, month=t.day, day=t.month, hour=t.hour, minute=t.minute, second = t.second)

But it doesn't work either

Oumab10
  • 696
  • 2
  • 6
  • 14

4 Answers4

60

You can use the dayfirst parameter in pd.to_datetime.

pd.to_datetime(df.tmp, dayfirst=True)

Output:

0   2015-06-11 00:56:55
1   2015-06-11 04:16:38
2   2015-06-12 16:13:30
3   2015-06-12 21:24:03
4   2015-06-13 02:31:44
5   2015-06-13 02:37:49
Name: tmp, dtype: datetime64[ns]
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 5
    Why is this not the accepted answer? It works great, thanks! – Dionysos Da Vinci Sep 04 '20 at 12:37
  • @DionysosDaVinci Thanks. Maybe the original poster had a different issue along with what was documented here. – Scott Boston Sep 04 '20 at 14:14
  • Would persist the error if date format is specified? – Miguel Gonzalez May 28 '21 at 12:49
  • 1
    @MiguelGonzalez No, it shouldn't Miguel, if you used the exact format of %d/%m/%Y. The error of flopping will not occur. – Scott Boston May 28 '21 at 12:52
  • Perfect, thank you very much @Scott Boston – Miguel Gonzalez May 28 '21 at 12:53
  • @Scott Boston Isn't it is a bug in pandas? when we use pd.to_datetime it should do the conversion as it is without flip-flopping our days and month. dayfirst=True is a life savior but don't you think that should not be the case in the first place. When anybody does date conversion, of course, no one would want his date format to change unknowingly. – Amit Tiwari Feb 10 '22 at 14:20
  • 1
    @AmitTiwari It's an American standard we list Feb 1, 2022 as 2/1/2022. In other places of the world the may list Feb 1, 2022 as 1/2/2022. Pandas assumes the first format, you override that with the dayfirst parameter to make pandas read as 1/2/2022 as Feb 1, 2022 insteas of Jan 2, 2022. Hence what see as flipping is only happening when "months" exceed 12. – Scott Boston Feb 10 '22 at 15:54
  • 1
    Thank you so much! @Scott Boston your answer is helpful! I have searched many resources to solve this problem but not worked well until I found your answer, it solve the problem. Thank you! – Yeo Keat Mar 27 '22 at 04:39
  • can someone elaborate why the to_datetime without dayfirst=True messes up days and months? – Pfinnn May 10 '22 at 18:03
  • I think pandas is going to fix this issue with 2.0 https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#datetimes-are-now-parsed-with-a-consistent-format – Scott Boston Feb 28 '23 at 14:18
1

Well I solved my problem but in a memory consuming method, I split my tmp column first to a date and time columns then I re-split my date column to day month and year, that way I could look for the days that are less than 13 and replace them with the correspondent month

df['tmp'] = pd.to_datetime(df['tmp'], unit='ns')
df['tmp'] = df['tmp'].apply(lambda x: x.replace(microsecond=0))
df['date'] = [d.date() for d in df['tmp']]
df['time'] = [d.time() for d in df['tmp']]
df[['year','month','day']] = df['date'].apply(lambda x: pd.Series(x.strftime("%Y-%m-%d").split("-")))

df['day'] = pd.to_numeric(df['day'], errors='coerce')
df['month'] = pd.to_numeric(df['month'], errors='coerce')
df['year'] = pd.to_numeric(df['year'], errors='coerce')


#Loop to look for days less than 13 and then swap the day and month
for index, d in enumerate(df['day']):
        if(d <13): 
 df.loc[index,'day'],df.loc[index,'month']=df.loc[index,'month'],df.loc[index,'day'] 

# convert series to string type in order to merge them

 df['day'] = df['day'].astype(str)
 df['month'] = df['month'].astype(str)
 df['year'] = df['year'].astype(str)
 df['date']=  pd.to_datetime(df[['year', 'month', 'day']])
 df['date'] = df['date'].astype(str)
 df['time'] = df['time'].astype(str)

# merge time and date and place result in our column

df['tmp'] =pd.to_datetime(df['date']+ ' '+df['time'])

# drop the added columns

df.drop(df[['date','year', 'month', 'day','time']], axis=1, inplace = True)
Oumab10
  • 696
  • 2
  • 6
  • 14
1

I ran into the same issue. In my case the dates were the index column (called "Date"). The above mentioned solution using to_datetime() directly on the dataframe with index column "Date" didn't work for me. I had to use read_csv() first without setting the index to "Date", then apply to_datetime() on it and only then set the index to "Date".

df= pd.read_csv(file, parse_dates=True)
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df = df.set_index('Date')
Irina S.
  • 133
  • 8
0

I got the same problem, the day and month were switching from 13 onwards. This works for me, basically I reorder the date through string type with a conditional and use to_datetime.

def calendario(fecha):
    
    if fecha.day < 13:
        dia_real = fecha.month
        mes_real = fecha.day
        
        if dia_real < 10:
            dia_real = '0'+str(dia_real)
        
        nfecha = str(dia_real) + str(mes_real) + str(fecha.year)
        nfecha = pd.to_datetime(nfecha, format='%d%m%Y', errors='ignore')
        
    else:
        nfecha = fecha
    
    return nfecha

df['Nueva_fecha']=df['Fecha'].apply(calendario) 

The output as expected: enter image description here

Pachu MS
  • 23
  • 1
  • 6