2

I want to calculate the min/max for every n rows of a df, say 10, but using df.rolling(10).max() gives the values for rows 0-9, 1-10, 2-11 etc. I want 0-9, 10-19, 20-29 etc

Is there a neat way of doing this for a large dataset?

Thanks

user12845343
  • 21
  • 1
  • 2

2 Answers2

3

Use np.arange to calculate an array of 0 to len (df) -1 and then calculate the entire divison. We can use this array together with DataFrame.groupby.

n=9
df.groupby(np.arange(len(df))//n).max()

Example

Setup

df=pd.DataFrame({'A':range(30)})
print(df)
     A
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11
12  12
13  13
14  14
15  15
16  16
17  17
18  18
19  19
20  20
21  21
22  22
23  23
24  24
25  25
26  26
27  27
28  28
29  29

Solution

n=9
df_max = df.groupby(np.arange(len(df))//n).max()
print(df_max)
    A
0   8
1  17
2  26
3  29

As you can see, groups listed from 0 to m are created. The resulting DataFrame is the maximum of each group.


You could also select columns to calculate max

df.groupby(np.arange(len(df))//n)[cols].max()

and of course use groupby.agg if you want to use more than one function

Detail

np.arange(len(df))//n
#array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
#       2, 2, 2, 2, 2, 3, 3, 3])

EDIT

first_row = 2
df2 = df[first_row:]
df2.groupby(np.arange(len(df2))//n).max()
Community
  • 1
  • 1
ansev
  • 30,322
  • 5
  • 17
  • 31
3

The key to success is grouping by np.arange(len(df.index)) // 10 (or change 10 (the group size) to whatever other value you need).

Assuming that you have 3 columns to compute min / max (['A', 'B', 'C']), you can run:

df.groupby(np.arange(len(df.index)) // 10)[['A', 'B', 'C']].agg(['min', 'max'])
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • this is my solution by changing `len(df)` to len `(df.index)` please do not copy solution – ansev Feb 05 '20 at 12:39
  • Our solutions are just 3 minutes apart, so nobody can claim that I just copied your solution. On the other hand, I included code to compute **all** required functions, not only *max()*. – Valdi_Bo Feb 05 '20 at 14:21