I am getting to grips with Python pandas.

The toy problem below illustrates an issue I am having in a related exercise.

I have sorted a data-frame so that it presents a column's values (in this case students' test scores) in ascending order:

df_sorted = 
     variable    test_score
     1           52.0
     1           53.0
     4           54.0
     6           64.0
     6           64.0
     6           64.0
     5           71.0
     10          73.0
     15          75.0
     4           77.0

However, I would now like to bin the data-frame by taking the mean of 2 columns (here "variable" and "test_score") over every X entries, from the start to the end of the data-frame. This will also allow me to create bins that contain equal numbers of entries (very useful for plotting in my associated exercise).

The output if I bin every 3 rows would therefore look like:

df_sorted_binned = 
     variable    test_score
     2           53.0
     6           64.0
     10          73.0
     4           77.0

Can anyone see how I can do this easily?

Much obliged!

Sam Gregson

1 Answer

Just group by a dummy variable that goes 0, 0, 0, 1, 1, 1, etc. This can be obtained with floor division of the row positions (d here is the sorted data-frame):

>>> import numpy as np
>>> d.groupby(np.arange(len(d))//3).mean()
   variable  test_score
0         2          53
1         6          64
2        10          73
3         4          77
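
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the toy data from the question by hand; the names bin_size and bin_labels are just illustrative choices of mine, not anything pandas requires.

```python
import numpy as np
import pandas as pd

# The sorted toy data from the question, typed in by hand.
df_sorted = pd.DataFrame({
    "variable": [1, 1, 4, 6, 6, 6, 5, 10, 15, 4],
    "test_score": [52.0, 53.0, 54.0, 64.0, 64.0, 64.0, 71.0, 73.0, 75.0, 77.0],
})

bin_size = 3  # illustrative name for the "X entries" per bin

# Row positions 0..9 floor-divided by 3 give one label per row:
# 0, 0, 0, 1, 1, 1, 2, 2, 2, 3
bin_labels = np.arange(len(df_sorted)) // bin_size

# Group rows by those labels and average each column within a group.
df_sorted_binned = df_sorted.groupby(bin_labels).mean()
print(df_sorted_binned)
```

Because the labels come from row positions rather than from the index, this should still work if sorting has left the index out of order. Note that 10 rows do not divide evenly by 3, so the last bin holds only the single leftover row.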
BrenBarn