I am using the split-apply-combine pattern in pandas to create a new column, which measures the difference between two time stamps.
The following is a simplified example of my problem.
Say, I have this df
df = pd.DataFrame({'ssn_start_utc':pd.date_range('1/1/2011', periods=6, freq='D'), 'fld_id':[100,100,100,101,101,101], 'task_name': ['sowing','fungicide','insecticide','combine',''combine','sowing']})
df
I want to group by fld_id and apply a function which creates a column measuring difference between two timestamps for each row. Such as this
def pasttime(group):
val = group['ssn_start_utc'] - group['ssn_start_utc'][0]
# why group['ssn_start_utc'][0] ?
# Because it measures time difference for each row respective to first row of each group/ particular to *sowing* entry respective to each group. I have moved all *sowing* entries to first row of df for each group
return val
df["PastTime"] =df.groupby('fld_id',group_keys=False).apply(pasttime)
the resultant column df should look like this
df_new = pd.DataFrame({'ssn_start_utc':pd.date_range('1/1/2011', periods=6, freq='D'), 'fld_id':[100,100,100,101,101,101], 'task_name': ['sowing','fungicide','insecticide','combine',''combine','sowing'], 'pasttime' :[ 0 days, 1 days, 2 days, 3 days, -1 days, 0 days] })
df_new
I get a error KeyError: 0
I have also tried using groupby:
df['pasttime'] = df.groupby(['fld_id'])['ssn_start_utc'].transform( df['ssn_start_utc'] - df.loc[df['name']=='sowing','ssn_start_utc'].values[0])
How to apply a custom group function and have the desired df?