14

If I have a DataFrame like this

Date           value
04 May 2015     1
06 May 2015     1
07 May 2015     1
11 May 2015     1
11 May 2015     1

How do I get the difference between consecutive values of the Date index, i.e. the third column below?

Date           value   Diff
04 May 2015     1      NA
06 May 2015     1       2
07 May 2015     1       1
11 May 2015     1       4
11 May 2015     1       0
ayhan
dram
  • Possible duplicate of [Calculate time difference between Pandas Dataframe indices](https://stackoverflow.com/questions/16777570/calculate-time-difference-between-pandas-dataframe-indices) – jimijazz Feb 08 '18 at 12:58

2 Answers

16

You can use pandas.Series.diff:

>>> df['Diff'] = df.index.to_series().diff()

            value     Diff
Date                    
2015-05-04      1      NaT
2015-05-06      1   2 days
2015-05-07      1   1 days
2015-05-11      1   4 days
2015-05-11      1   0 days
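For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the index is already a DatetimeIndex (the ISO date strings below are just a convenient way to build one); the frame itself mirrors the one in the question.

```python
import pandas as pd

# Rebuild the question's frame with a DatetimeIndex named "Date".
dates = pd.to_datetime(
    ["2015-05-04", "2015-05-06", "2015-05-07", "2015-05-11", "2015-05-11"]
)
df = pd.DataFrame(
    {"value": [1, 1, 1, 1, 1]},
    index=pd.DatetimeIndex(dates, name="Date"),
)

# to_series() turns the index into a Series, so Series.diff is available.
# The first row has no predecessor, so its difference is NaT.
df["Diff"] = df.index.to_series().diff()
print(df)
```

The Diff column holds Timedelta values, which is why the first entry is NaT rather than NaN.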

An elegant way to convert the result to a float number of days is:

df['Diff'] = df.index.to_series().diff().dt.days
>>> df
            value  Diff
Date                   
2015-05-04      1   NaN
2015-05-06      1   2.0
2015-05-07      1   1.0
2015-05-11      1   4.0
2015-05-11      1   0.0
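One caveat worth illustrating: .dt.days returns only the whole-day component of each Timedelta. If the index carries times of day and you need fractional days, dividing .dt.total_seconds() by 86400 is one option. The timestamps below are hypothetical and only there to show the difference.

```python
import pandas as pd

# Hypothetical index with times of day, not the data from the question.
idx = pd.to_datetime(["2015-05-04 00:00", "2015-05-06 12:00", "2015-05-07 00:00"])
gaps = idx.to_series().diff()

whole_days = gaps.dt.days                          # whole-day component only
fractional_days = gaps.dt.total_seconds() / 86400  # keeps the time-of-day part

print(whole_days.tolist())
print(fractional_days.tolist())
```

For date-only indexes like the one in the question, both give the same numbers.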

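A faster way is to cast to whole days (this relies on older pandas; pandas 2.0 and later reject the 'D' unit in astype, so use .dt.days there):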

df.index.to_series().diff().astype('timedelta64[D]')

To convert to integer (pandas version >= 0.24):

df['Diff'] = df.index.to_series().diff().astype('timedelta64[D]').astype('Int64')
>>> df
            value  Diff
Date                   
2015-05-04      1   NaN
2015-05-06      1     2
2015-05-07      1     1
2015-05-11      1     4
2015-05-11      1     0

Note: Int64 is the pandas nullable integer data type (not int64).
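As a sketch of the nullable-integer route that does not depend on the 'timedelta64[D]' cast, you can go through .dt.days first and then cast to 'Int64'. This assumes a reasonably recent pandas (1.0 or later, where missing values in Int64 columns are pd.NA); the dates are the ones from the question.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2015-05-04", "2015-05-06", "2015-05-07", "2015-05-11", "2015-05-11"]
)
df = pd.DataFrame({"value": [1, 1, 1, 1, 1]}, index=pd.DatetimeIndex(dates, name="Date"))

# .dt.days gives floats (because of the leading missing value);
# casting to the nullable 'Int64' dtype keeps the gap as a missing value
# while the remaining entries become true integers.
df["Diff"] = df.index.to_series().diff().dt.days.astype("Int64")
print(df)
print(df["Diff"].dtype)
```

A plain astype('int64') would fail here because the first row is missing, which is the reason for the capital-I dtype.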

Shijith
5

You mean something like:

df["Diff"] = df.index
df["Diff"] = (df['Diff'] - df['Diff'].shift())

print(df)
            value   Diff
Date                    
2015-05-04      1    NaT
2015-05-06      1 2 days
2015-05-07      1 1 days
2015-05-11      1 4 days
2015-05-11      1 0 days
Padraic Cunningham
    That does it. Actually, I also realised that I needed to convert the dates from strings to datetimes before applying the above (i.e. replace the first line with df["Diff"] = pd.to_datetime(df.index, dayfirst=True)) – dram Jul 22 '15 at 06:51
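To illustrate the point in the comment above, here is a minimal sketch that starts from a string index in the question's "04 May 2015" format and parses it with pd.to_datetime(..., dayfirst=True) before taking the difference. Replacing the index in place (rather than copying it into the Diff column first) is a choice made for this sketch, not something the answer prescribes.

```python
import pandas as pd

# The index starts out as plain strings, as in the question.
df = pd.DataFrame(
    {"value": [1, 1, 1, 1, 1]},
    index=pd.Index(
        ["04 May 2015", "06 May 2015", "07 May 2015", "11 May 2015", "11 May 2015"],
        name="Date",
    ),
)

# Parse the strings into datetimes; subtracting raw strings would not work.
df.index = pd.to_datetime(df.index, dayfirst=True)

# Same shift-based subtraction as in the answer, applied to the parsed index.
as_series = df.index.to_series()
df["Diff"] = as_series - as_series.shift()
print(df)
```

The shift-based subtraction and Series.diff() give the same result here; use whichever reads more clearly to you.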