
I have the following dataframe.

   Date Returned  Start Date
0     2017-06-02  2017-04-01
1     2017-06-02  2017-04-01
2     2017-06-02  2017-04-01
3     2017-06-02  2017-02-28
4     2017-06-02  2017-02-28
5     2017-06-02  2016-07-20
6     2017-06-02  2016-07-20

Both columns are of type datetime64.

subframe[['Date Returned','Start Date']].dtypes
Out[9]: 
Date Returned    datetime64[ns]
Start Date       datetime64[ns]
dtype: object

Yet when I try to compute the timedelta between the two date columns, I get this error:

subframe['Delta']=subframe['Date Returned'] - subframe['Start Date']

TypeError: data type "datetime" not understood 

Is there a fix for this? I've tried everything I can think of and have pulled out most of my hair at this point. Any help is greatly appreciated. I did find someone else who posted the same problem, but no one really answered it.
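For reference, here is a minimal reproduction of the setup above, with the frame rebuilt from the printed output. It assumes a pandas/NumPy combination on which vectorized subtraction works (the comments below report that it does on other machines); there, the same line produces a timedelta column instead of the TypeError.

```python
import pandas as pd

# Rebuild the question's frame from its printed output.
subframe = pd.DataFrame({
    "Date Returned": pd.to_datetime(["2017-06-02"] * 7),
    "Start Date": pd.to_datetime(
        ["2017-04-01"] * 3 + ["2017-02-28"] * 2 + ["2016-07-20"] * 2
    ),
})

# The question's line: datetime column minus datetime column.
subframe["Delta"] = subframe["Date Returned"] - subframe["Start Date"]

print(subframe["Delta"].dtype)             # a timedelta64 dtype
print(subframe["Delta"].dt.days.tolist())  # [62, 62, 62, 94, 94, 317, 317]
```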

bemery
  • It works fine on my computer. – Bubble Bubble Bubble Gut Jun 16 '17 at 15:44
  • The same code runs without errors for me. What version are you using? Alternatively, you could uninstall the packages and try again. – giser_yugang Jun 16 '17 at 16:02
  • I did try uninstalling and reinstalling pandas. Same result. This is running on Windows 7, but I don't imagine that would make any difference. – bemery Jun 16 '17 at 16:55
  • Possible duplicate of [datetime dtypes in pandas read_csv](https://stackoverflow.com/questions/21269399/datetime-dtypes-in-pandas-read-csv) – Bharath M Shetty Jun 23 '17 at 03:55
  • I don't have enough points to comment, but I get the same error with pandas 0.18.1. The funny thing is that it works if I select a single row. There are no missing values, so it is very strange behavior. – drj Jun 23 '17 at 03:43

2 Answers


I think the problem may have been fixed in more recent versions of pandas (and, possibly relevant, numpy), and it may always have been Windows-specific. However, on the computer I'm working on (pandas 0.18.0, numpy 1.13, Windows 7), it still occurs.

For anyone in the same situation, there is a workaround that is considerably faster than @blacksite's:

subframe['Delta'] = subframe['Date Returned'].values - subframe['Start Date'].values

Silly as it looks, adding ".values" converts each column to a NumPy datetime64[ns] array, and NumPy subtracts those correctly, producing a timedelta64[ns] array. Assigning that array to a pandas DataFrame column converts it back to pandas Timedelta values (a timedelta64[ns] column), again correctly.
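The round trip described above can be checked step by step. This is a small sketch on two rows taken from the question's data; the dtype checks are written loosely because newer pandas versions may use a resolution other than nanoseconds. On pandas 0.24 and later, `Series.to_numpy()` is the recommended equivalent of `.values`.

```python
import numpy as np
import pandas as pd

subframe = pd.DataFrame({
    "Date Returned": pd.to_datetime(["2017-06-02", "2017-06-02"]),
    "Start Date": pd.to_datetime(["2017-04-01", "2016-07-20"]),
})

# .values hands back plain NumPy datetime64 arrays, so the subtraction
# below is done by NumPy rather than by pandas' own datetime code path.
returned = subframe["Date Returned"].values
start = subframe["Start Date"].values
print(type(returned).__name__, returned.dtype)

# NumPy subtracts datetime64 arrays element-wise into a timedelta64 array.
delta = returned - start
print(delta.dtype)

# Assigning the NumPy array back yields a pandas timedelta column whose
# elements are Timedelta (not Timestamp) objects.
subframe["Delta"] = delta
print(type(subframe["Delta"].iloc[0]).__name__)  # Timedelta
print(subframe["Delta"].dt.days.tolist())        # [62, 317]
```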

On my dataframe (around 90k rows), this takes less than 0.01 s (nearly all of it spent creating the new pandas column and converting the NumPy result back to Timedelta values), while the other workaround takes about 1.5 s.
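A rough way to reproduce this kind of comparison on your own machine is sketched below. The 90k-row frame is synthetic (random start dates, a fixed return date) and the helper names `with_values` and `with_zip` are hypothetical; absolute timings will depend on hardware and library versions, so no numbers are claimed here. The final check confirms that both workarounds produce the same day counts.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data, sized like the ~90k-row frame mentioned above.
n = 90_000
rng = np.random.default_rng(0)
start = pd.Timestamp("2016-01-01") + pd.to_timedelta(rng.integers(0, 500, n), unit="D")
df = pd.DataFrame({"Start Date": start, "Date Returned": pd.Timestamp("2017-06-02")})


def with_values(frame):
    # NumPy-level subtraction (the ".values" workaround).
    return frame["Date Returned"].values - frame["Start Date"].values


def with_zip(frame):
    # Pure-Python pairwise subtraction (the list-comprehension workaround).
    return [ret - st for st, ret in zip(frame["Start Date"], frame["Date Returned"])]


for fn in (with_values, with_zip):
    t0 = time.perf_counter()
    fn(df)
    print(f"{fn.__name__}: {time.perf_counter() - t0:.4f} s")

# Both approaches should agree once wrapped back into pandas.
fast = pd.Series(with_values(df))
slow = pd.Series(with_zip(df))
assert (fast.dt.days == slow.dt.days).all()
```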

Marco Spinaci

I received the same error in pandas 0.18.1. Here's a workaround that iterates over the individual start/end pairs:

d['diff'] = [ret - start for start, ret in zip(d['Start'], d['Returned'])]

d is now:

    Returned      Start     diff
0 2017-06-02 2017-04-01  62 days
1 2017-06-02 2017-04-01  62 days
2 2017-06-02 2017-04-01  62 days
3 2017-06-02 2017-02-28  94 days
4 2017-06-02 2017-02-28  94 days
5 2017-06-02 2016-07-20 317 days
6 2017-06-02 2016-07-20 317 days

This workaround is much slower than I would expect the native pandas implementation to be. Sigh.

blacksite