I have a dataframe with some dates,and associated data with each date that I am reading in from a csv file (the file is relatively small, on the magnitude of 10,000s of rows, and ~10 columns):
memid date a b
10000 7/3/2017 221 143
10001 7/4/2017 442 144
10002 7/6/2017 132 145
10003 7/8/2017 742 146
10004 7/10/2017 149 147
I want to add a column, "date_diff", to this dataframe that calculates the amount of days between each date and the previously most recent date (the rows are always sorted by date):
memid date a b date_diff
10000 7/3/2017 221 143 NaN
10001 7/4/2017 442 144 1
10002 7/6/2017 132 145 2
10003 7/8/2017 742 146 2
10004 7/11/2017 149 147 3
I am having trouble figuring out a good way to create this "date_diff" column as iterating row by row tends to be frowned upon when using pandas/numpy. Is there an easy way to create this column in python/pandas/numpy or is this job better done before the csv is read into my script?
Thanks!
EDIT: Thanks to jpp and Tai for their answer. It covers the original question but I have a follow up:
What if my dataset has multiple rows for each date? Is there a way to easily check the difference between each group of dates to produce an output like the example below? Is it easier if there are a set number of rows for each date?
memid date a b date_diff
10000 7/3/2017 221 143 NaN
10001 7/3/2017 442 144 NaN
10002 7/4/2017 132 145 1
10003 7/4/2017 742 146 1
10004 7/6/2017 149 147 2
10005 7/6/2017 457 148 2