This isn't duplicate. I already referred this post_1 and post_2
My question is different and not about agg
function. It is about displaying grouped by column as well during ffill
operation. Though the code works fine, just sharing the full code for you to get an idea. Problem is in the commented line. look out for that line below.
I have a dataframe like as given below
df = pd.DataFrame({
'subject_id':[1,1,1,1,1,1,1,2,2,2,2,2],
'time_1' :['2173-04-03 12:35:00','2173-04-03 12:50:00','2173-04-05 12:59:00','2173-05-04 13:14:00','2173-05-05 13:37:00','2173-07-06 13:39:00','2173-07-08 11:30:00','2173-04-08 16:00:00','2173-04-09 22:00:00','2173-04-11 04:00:00','2173- 04-13 04:30:00','2173-04-14 08:00:00'],
'val' :[5,5,5,5,1,6,5,5,8,3,4,6]})
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day
df['month'] = df['time_1'].dt.month
What this code with the help of Jezrael from forum does is add missing dates
based on threshold value. Only issue is,I don't see the grouped by column during output
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day
df['date'] = df['time_1'].dt.floor('d')
df1 = (df.set_index('date')
.groupby('subject_id')
.resample('d')
.last()
.index
.to_frame(index=False))
df2 = df1.merge(df, how='left')
thresh = 5
mask = df2['day'].notna()
s = mask.cumsum().mask(mask)
df2['count'] = s.map(s.value_counts())
df2 = df2[(df2['count'] < thresh) | (df2['count'].isna())]
df2 = df2.groupby(df2['subject_id']).ffill() # problem is here #here is the problem
dates = df2['time_1'].dt.normalize()
df2['time_1'] += np.where(dates == df2['date'], 0, df2['date'] - dates)
df2['day'] = df2['time_1'].dt.day
df2['val'] = df2['val'].astype(int)
As shown in code above, I tried the below approaches
df2 = df2.groupby(df2['subject_id']).ffill() # doesn't help
df2 = df2.groupby(df2['subject_id']).ffill().reset_index() # doesn't help
df2 = df2.groupby('subject_id',as_index=False).ffill() # doesn't help
Incorrect output without subject_id
I expect my output to have subject_id
column as well