0

I would like to bin a dataframe in pandas based on the sum of another column.

I have the following dataframe:

time    variable    frequency
2           7         7
3           12        2
4           13        3
6           15        4
6           18        4
6           3         1
10          21        2
11          4         5
13          6         5
15          17        6
17          5         4

I would like to bin the data so that each group contains a minimum total frequency of 10 and output the average time and the total variable and total frequency.

avg time    total variable  total frequency
3                 32             12
7                 57             11
12                10             10
16                22             10

Any help would be greatly appreciated

WGP
  • 708
  • 2
  • 7
  • 18

1 Answers1

0

A little brute force would get you a long way.

import numpy as np

data = ((2, 7, 7),
        (3, 12, 2),
        (4, 13, 3),
        (6, 15, 4),
        (6, 18, 4),
        (6, 3, 1),
        (10, 21, 2),
        (11, 4, 5),
        (13, 6, 5),
        (15, 17, 6),
        (17, 5, 4))

freq = [data[i][2] for i in range(len(data))]
variable = [data[i][1] for i in range(len(data))]
time = [data[i][0] for i in range(len(data))]

freqcounter = 0
timecounter = 0
variablecounter = 0
counter = 0

freqlist = []
timelist = []
variablelist = []

for k in range(len(data)):
    freqcounter += freq[k]
    timecounter += time[k]
    variablecounter += variable[k]
    counter += 1
    if freqcounter >= 10:
        freqlist.append(freqcounter)
        timelist.append(timecounter/counter)
        variablelist.append(variablecounter)
        freqcounter = 0
        timecounter = 0
        variablecounter = 0
        counter = 0

print(timelist)
print(variablelist)        
print(freqlist)
Aguy
  • 7,851
  • 5
  • 31
  • 58
  • Quick follow up, how would i edit this to work if i import the initial data from a csv file – WGP Jun 28 '16 at 10:45