I am trying to reindex groups so that each group's dataframe is extended with rows for the missing values. I want the same behaviour that resample provides for time indexes, but for a plain integer column.
For each group (grouped by proID in my case), the maximum existing integer value should be determined; it marks the end point of the resampling. The group should then be extended with the integer values that are missing up to that maximum (I was trying to do this with reindex).
I have a dataframe with many rows per proID, an integer bin column whose values can range from 0 to 100, and some meaningless columns. Basically, missing bin values should be filled in, just as resample would do for a time index.
import numpy as np
import pandas as pd

def rsmpint(df):
    mx = df.bin.max()  # identify maximal existing bin value in dataframe (group)
    no = (mx * 20 / 100).astype(np.int64) + 1  # calculate number of bin values
    idx = pd.Index(np.linspace(0, mx, no), name='bin')  # define full bin index for df (group)
    df.set_index('bin').reindex(idx).ffill().reset_index(drop=True, inplace=True)
    return df

DF.groupby('proID').apply(rsmpint)
Let's assume that a specific proID currently has 5 bin values [0, 15, 20, 40, 65] (i.e. 5 rows in the original proID group). The output should be an extended proID group with bin values [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65], with the content of the "meaningless" columns filled using ffill().
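To make the expected behaviour concrete, here is a minimal sketch of one way it could be done, not a definitive solution. It assumes a bin spacing of 5 (inferred from the example), that bin values are unique within each proID, and that they all lie on that grid. The names `STEP` and `val` and the sample data are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

STEP = 5  # assumed bin spacing, inferred from the example; not stated elsewhere


def rsmpint(group, step=STEP):
    """Return a copy of one proID group with every bin from 0 to its maximum."""
    mx = int(group["bin"].max())
    # Integer grid 0, step, 2*step, ... up to and including mx (if mx is on the grid).
    idx = pd.Index(np.arange(0, mx + 1, step), name="bin")
    # reindex returns a new frame, so the result has to be returned, not discarded.
    # The intermediate NaN rows upcast integer columns such as proID to float.
    return group.set_index("bin").reindex(idx).ffill().reset_index()


# Hypothetical sample data: proID 1 matches the example, proID 2 is a second group.
DF = pd.DataFrame({
    "proID": [1, 1, 1, 1, 1, 2, 2],
    "bin": [0, 15, 20, 40, 65, 0, 10],
    "val": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Iterating over the groups keeps proID available inside each group.
pieces = [rsmpint(g) for _, g in DF.groupby("proID")]
result = pd.concat(pieces, ignore_index=True)
print(result)
```

The groups are processed in a list comprehension and glued back together with `pd.concat` instead of `groupby(...).apply(...)`. That is only a design choice for this sketch, so that it does not depend on how `apply` treats the grouping column.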