I have a data frame like the below, which has a 'Stats' column containing different statistical values(avg, count, min, max) for every 1-sec timestamp.
Timestamps | Location | Service | Status Code | Stats | Value |
---|---|---|---|---|---|
1656533176875 | A | s1 | 200 | count | 5000 |
1656533176875 | A | s1 | 500 | avg. | 3000 |
.... | .... | .... | .... | ... | ... |
1656533176878 | B | s2 | 504 | max. | 6000 |
Now from this data frame, I have to resample the data for a user-defined time interval, say for a 5-minutes interval, and aggregate the values for that interval. As the values represent different aggregations, I need to apply different aggregate functions for different parts of the data frame. For example, if I want to resample for 5 minutes of data, I need to apply the average if the 'Stats' is 'avg' or apply the sum if the 'Stats' is count. Also, the grouping has to be by centers, service, and status code. I have written this code so far but I am unable to get the desired result.
#sample dataframe
rng = pd.date_range('2022-06-01', periods=10, freq='T')
np.random.seed(10)
l = ['A', 'B', 'C', 'D']
k = ['avg', 'max', 'min', 'count', 'median', 'std']
m = ['s1', 's2', 's3']
j = [200, 400, 404, 500]
df = pd.DataFrame(
{'date_time': rng,
'location': random.choices(l, k=10),
'service': random.choices(m, k=10),
'status': random.choices(j, k=10),
'stat': random.choices(k, k=10),
'value': np.random.randint(100, 900, size=10)
}
)
df.sort_values(by=df.columns.to_list())
print(df)
interval =5
resampled_df = df.assign(
n_count=np.where(df['stat'] == 'count'),
n_median=np.where(df['stat'] == 'median'),
n_max=np.where(df['stat'] == 'max'),
n_min=np.where(df['stat'] == 'min'),
n_avg=np.where(df['stat'] == 'avg'),
n_std=np.where(df['stat'] == 'std'),
).groupby(['date_time', 'location', 'services', 'status_code', 'stat']) \
.resample('{}min'.format(interval), on='date_time') \
.value \
.agg({'n_count': sum, 'n_avg': np.mean, 'n_max': max, 'n_min': min, 'n_std': np.std,
'n_median': np.median})
print(resampled_df)
I get the error ValueError: Length of values (1) does not match the length of index (10). The expected result is resampled data for 5 minutes with aggregated value depending on the 'Stats' column.