Given that your data is in a pd.DataFrame called df, the following should do the trick:
import pandas as pd
import numpy as np

step = 5
df.groupby(pd.cut(df.index,
                  np.arange(start=df.index.min(), stop=df.index.max(),
                            step=step, dtype=float)))\
  .agg({'volume': ['mean', 'std'], 'temperature': ['mean']})
We are using pd.cut to create an IntervalIndex which we can groupby. Finally, we use pd.DataFrame.agg to calculate summary statistics for each group: mean and std for the volume column and just mean for the temperature column.
I haven't tested this, but if you provide a minimal, complete and verifiable example, I can do that.
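For illustration, here is how that snippet behaves on some made-up data (the volume and temperature columns mirror the names assumed above; the index values are invented):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the question's DataFrame (values are made up).
df = pd.DataFrame(
    {"volume": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "temperature": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]},
    index=[0.0, 1.0, 2.0, 5.0, 6.0, 7.0],
)

step = 5
# Bin edges over the index range. Note that np.arange excludes `stop`, and
# pd.cut's intervals are open on the left, so rows sitting exactly on the
# first edge or at/beyond the last edge end up in no bin.
bins = np.arange(start=df.index.min(), stop=df.index.max(),
                 step=step, dtype=float)
result = df.groupby(pd.cut(df.index, bins)).agg(
    {"volume": ["mean", "std"], "temperature": ["mean"]}
)
print(result)
```

With this toy index the edges are [0.0, 5.0], giving a single bin (0.0, 5.0] that collects the rows at 1.0, 2.0 and 5.0; the rows at 0.0, 6.0 and 7.0 fall outside every bin and are dropped by the groupby.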
EDIT
Given the updated data, I have written the code below:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: from io import StringIO
In [4]: s = """T,H,L,C,label
...: 1000.1,23.18,27.272,426,1
...: 1000.2,23.15,27.2675,429.5,1
...: 1000.3,23.15,27.245,426,1
...: 1000.4,23.15,27.2,426,1
...: 1000.5,23.1,27.2,426,1
...: 1000.6,23.1,27.2,419,1
...: 1000.7,23.1,27.2,419,1
...: 1000.8,23.1,27.2,419,1
...: 1000.9,23.1,27.2,419,1
...: 1001,23.075,27.175,419,1
...: 1001.1,23.075,27.15,419,1
...: 1001.2,23.1,27.1,419,1
...: 1001.3,23.1,27.16666667,419,1
...: 1001.4,23.05,27.15,419,1
...: 1001.5,23,27.125,419,1
...: 1001.6,23,27.125,418.5,1
...: 1001.7,23,27.2,0,0
...: 1001.8,22.945,27.29,0,0
...: 1001.9,22.945,27.39,0,0
...: 1002,22.89,27.39,0,0
...: 1002.1,22.89,27.39,0,0
...: 1002.2,22.89,27.39,0,0
...: 1002.3,22.89,27.445,0,0"""
In [5]: df = pd.read_csv(StringIO(s), index_col='T')
Again we use an IntervalIndex and groupby, as well as agg, to calculate summary statistics.
In [6]: step = 0.5
...:
...: grouped = df.groupby(pd.cut(df.index,
...: np.arange(start=df.index.min(), stop=df.index.max(), step=step, dtype=float
...: )))
...:
In [7]: grouped.agg({'H':['mean', 'std'], 'L':['mean', 'std'], 'C':['mean', 'std']})
Out[7]:
H L C
mean std mean std mean std
(1000.1, 1000.6] 23.130 0.027386 27.222500 0.031820 425.3 3.834058
(1000.6, 1001.1] 23.090 0.013693 27.185000 0.022361 419.0 0.000000
(1001.1, 1001.6] 23.050 0.050000 27.133333 0.025685 418.9 0.223607
(1001.6, 1002.1] 22.934 0.046016 27.332000 0.085557 0.0 0.000000
This does not give you the column names you wanted, so let's flatten the column MultiIndex to adjust those.
In [8]: aggregated = grouped.agg({'H':['mean', 'std'], 'L':['mean', 'std'], 'C':['mean', 'std']})
In [9]: ['_'.join(col).strip() for col in aggregated.columns.values]
Out[9]: ['H_mean', 'H_std', 'L_mean', 'L_std', 'C_mean', 'C_std']
In [10]: aggregated.columns = ['_'.join(col).strip() for col in aggregated.columns.values]
In [11]: aggregated
Out[11]:
H_mean H_std L_mean L_std C_mean C_std
(1000.1, 1000.6] 23.130 0.027386 27.222500 0.031820 425.3 3.834058
(1000.6, 1001.1] 23.090 0.013693 27.185000 0.022361 419.0 0.000000
(1001.1, 1001.6] 23.050 0.050000 27.133333 0.025685 418.9 0.223607
(1001.6, 1002.1] 22.934 0.046016 27.332000 0.085557 0.0 0.000000
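One caveat worth noting: np.arange excludes its stop value, and pd.cut's intervals are open on the left, so in the output above the row at 1000.1 (sitting exactly on the first edge) and the rows at 1002.2 and 1002.3 (beyond the last edge) never appear in any group. A sketch of one way to handle this, padding the stop by one step and passing include_lowest to pd.cut (the toy data below is a trimmed-down stand-in, not the full table):

```python
import numpy as np
import pandas as pd

# A few rows from the edges of the index range, to show the boundary behaviour.
index = [1000.1, 1000.2, 1000.4, 1002.2, 1002.3]
df = pd.DataFrame({"H": [23.18, 23.15, 23.15, 22.89, 22.89]}, index=index)

step = 0.5
# Pad `stop` by one step so the maximum index value has a bin to land in;
# include_lowest=True keeps the row sitting exactly on the first edge.
edges = np.arange(df.index.min(), df.index.max() + step, step, dtype=float)
groups = pd.cut(df.index, edges, include_lowest=True)
# observed=True drops the empty bins in the middle of the range.
out = df.groupby(groups, observed=True).agg({"H": ["mean", "std"]})
print(out)
```

Here all five rows are assigned to a bin: the first three land in the (now closed) first interval and the last two in a final interval that would not exist without the padded stop.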
It's not quite clear to me what you mean by applying Z-scores, because a Z-score is not a summary statistic like std or mean, so it doesn't play well with agg. If you just want to apply Z-scores to your entire DataFrame by column, you might want to take a look at this question: Pandas - Compute z-score for all columns