0

I have a data frame like the below, which has a 'Stats' column containing different statistical values(avg, count, min, max) for every 1-sec timestamp.

Timestamps Location Service Status Code Stats Value
1656533176875 A s1 200 count 5000
1656533176875 A s1 500 avg. 3000
.... .... .... .... ... ...
1656533176878 B s2 504 max. 6000

Now from this data frame, I have to resample the data for a user-defined time interval, say for a 5-minutes interval, and aggregate the values for that interval. As the values represent different aggregations, I need to apply different aggregate functions for different parts of the data frame. For example, if I want to resample for 5 minutes of data, I need to apply the average if the 'Stats' is 'avg' or apply the sum if the 'Stats' is count. Also, the grouping has to be by centers, service, and status code. I have written this code so far but I am unable to get the desired result.

        #sample dataframe
        rng = pd.date_range('2022-06-01', periods=10, freq='T')
        np.random.seed(10)
        l = ['A', 'B', 'C', 'D']
        k = ['avg', 'max', 'min', 'count', 'median', 'std']
        m = ['s1', 's2', 's3']
        j = [200, 400, 404, 500]

        df = pd.DataFrame(
            {'date_time': rng,
             'location': random.choices(l, k=10),
             'service': random.choices(m, k=10),
             'status': random.choices(j, k=10),
             'stat': random.choices(k, k=10),
             'value': np.random.randint(100, 900, size=10)
             }
        )

        df.sort_values(by=df.columns.to_list())
        print(df)
        interval =5
        resampled_df = df.assign(
            n_count=np.where(df['stat'] == 'count'),
            n_median=np.where(df['stat'] == 'median'),
            n_max=np.where(df['stat'] == 'max'),
            n_min=np.where(df['stat'] == 'min'),
            n_avg=np.where(df['stat'] == 'avg'),
            n_std=np.where(df['stat'] == 'std'),
        ).groupby(['date_time', 'location', 'services', 'status_code', 'stat']) \
            .resample('{}min'.format(interval), on='date_time') \
            .value \
            .agg({'n_count': sum, 'n_avg': np.mean, 'n_max': max, 'n_min': min, 'n_std': np.std,
                  'n_median': np.median})

        print(resampled_df)

I get the error ValueError: Length of values (1) does not match the length of index (10). The expected result is resampled data for 5 minutes with aggregated value depending on the 'Stats' column.

sherin_a27
  • 153
  • 8

1 Answers1

0

I am still not sure exactly what your output should look like, but here is an attempt:

agg_dict = {'avg':'mean',
            'count':'count',
            'max':'max',
            'median':'median',
            'std':'std'}
interval = 2
df.pivot(['date_time', 'location', 'service', 'status'],
          'stat', 
          'value')\
  .reset_index(level=[1,2,3])\
  .rename_axis('date_time')\
  .groupby(['date_time', 'location', 'service', 'status']) \
            .resample(f'{interval}T')\
            .agg(agg_dict)\
            .reset_index(level=0, drop=True)\
            .stack()\
            .reset_index()

Output:

   location service  status           date_time    stat      0
0         B      s1     200 2022-06-01 00:00:00   count    0.0
1         B      s1     200 2022-06-01 00:00:00     max  482.0
2         B      s2     500 2022-06-01 00:00:00     avg  753.0
3         B      s2     500 2022-06-01 00:00:00   count    0.0
4         A      s2     200 2022-06-01 00:02:00   count    0.0
5         A      s2     200 2022-06-01 00:02:00  median  442.0
6         D      s3     200 2022-06-01 00:02:00   count    0.0
7         D      s3     200 2022-06-01 00:02:00     max  898.0
8         A      s1     500 2022-06-01 00:04:00   count    1.0
9         B      s2     500 2022-06-01 00:04:00   count    0.0
10        D      s1     404 2022-06-01 00:06:00   count    0.0
11        A      s3     200 2022-06-01 00:06:00   count    0.0
12        B      s1     500 2022-06-01 00:08:00   count    1.0
13        B      s3     404 2022-06-01 00:08:00   count    0.0
14        B      s3     404 2022-06-01 00:08:00     max  383.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187