irregular binning with regards to the sum of a column

Question

I would like to bin a dataframe in pandas based on the sum of another column.

I have the following dataframe:

time    variable    frequency
2           7         7
3           12        2
4           13        3
6           15        4
6           18        4
6           3         1
10          21        2
11          4         5
13          6         5
15          17        6
17          5         4

I would like to bin the data so that each group contains a minimum total frequency of 10 and output the average time and the total variable and total frequency.

avg time    total variable  total frequency
3                 32             12
7                 57             11
12                10             10
16                22             10

Any help would be greatly appreciated

score 0 · Answer 1 · answered Jun 27 '16 at 15:02

A little brute force would get you a long way.

import numpy as np

data = ((2, 7, 7),
        (3, 12, 2),
        (4, 13, 3),
        (6, 15, 4),
        (6, 18, 4),
        (6, 3, 1),
        (10, 21, 2),
        (11, 4, 5),
        (13, 6, 5),
        (15, 17, 6),
        (17, 5, 4))

freq = [data[i][2] for i in range(len(data))]
variable = [data[i][1] for i in range(len(data))]
time = [data[i][0] for i in range(len(data))]

freqcounter = 0
timecounter = 0
variablecounter = 0
counter = 0

freqlist = []
timelist = []
variablelist = []

for k in range(len(data)):
    freqcounter += freq[k]
    timecounter += time[k]
    variablecounter += variable[k]
    counter += 1
    if freqcounter >= 10:
        freqlist.append(freqcounter)
        timelist.append(timecounter/counter)
        variablelist.append(variablecounter)
        freqcounter = 0
        timecounter = 0
        variablecounter = 0
        counter = 0

print(timelist)
print(variablelist)        
print(freqlist)

Quick follow up, how would i edit this to work if i import the initial data from a csv file — WGP, Jun 28 '16 at 10:45

irregular binning with regards to the sum of a column

1 Answers1