Calculate average of list values grouped by second list

Question

I didn't know how to better express myself in the title. Basically what I have is two lists:

a = ['A','B','A','C','D','C','A',...] 
b = [2,4,8,3,5,2,1,...]

a and b have the same length, and each value in b is associated with the letter at the same index in a.

Now I would like to calculate the average value in b for each letter in a. So at the end I would have:

a = ['A','B','C','D',...]
b = [3.67, 4, 2.5, 5,...]

Is there a standard implementation for this in Python?

Possible duplicate: http://stackoverflow.com/questions/21674331/group-by-multiple-keys-and-summarize-average-values-of-a-list-of-dictionaries — moritzg, May 16 '17 at 11:24

score 4 · Accepted Answer · answered May 16 '17 at 11:30

You can first perform a group by. We can do this for instance with a defaultdict:

from collections import defaultdict

col = defaultdict(list)

for ai,bi in zip(a,b):
    col[ai].append(bi)

Now the dictionary col will look like:

>>> col
defaultdict(<class 'list'>, {'C': [3, 2], 'B': [4], 'D': [5], 'A': [2, 8, 1]})

and now we can calculate the average of all elements in the dictionary for instance like:

>>> {key:sum(vals)/len(vals) for key,vals in col.items()}
{'C': 2.5, 'B': 4.0, 'D': 5.0, 'A': 3.6666666666666665}

You can also convert it to two tuples by using zip:

a,b = zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()])

resulting in:

>>> a,b = zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()])
>>> a
('C', 'B', 'D', 'A')
>>> b
(2.5, 4.0, 5.0, 3.6666666666666665)

If you want to generate lists instead, you can convert them to lists:

a,b = map(list,zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()]))

This results in:

>>> a,b = map(list,zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()]))
>>> a
['C', 'B', 'D', 'A']
>>> b
[2.5, 4.0, 5.0, 3.6666666666666665]
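For reference, the steps above can be wrapped into one small function. This is only a sketch: the name group_means is made up for illustration, the sample data is just the visible part of the lists from the question, and the keys are sorted so that the result comes out in the A, B, C, D order shown in the question (sorting is my assumption, the question does not say which order is wanted).

```python
from collections import defaultdict


def group_means(keys, values):
    """Return (unique_keys, averages) with the keys in sorted order.

    Hypothetical helper, not part of any library.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    # Group by: collect every value under its key.
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)

    # Average each group, visiting the keys in sorted order.
    unique_keys = sorted(groups)
    averages = [sum(groups[key]) / len(groups[key]) for key in unique_keys]
    return unique_keys, averages


# Sample data: only the visible part of the lists from the question.
a = ['A', 'B', 'A', 'C', 'D', 'C', 'A']
b = [2, 4, 8, 3, 5, 2, 1]

letters, averages = group_means(a, b)
print(letters)                          # ['A', 'B', 'C', 'D']
print([round(x, 2) for x in averages])  # [3.67, 4.0, 2.5, 5.0]
```

If you would rather keep the keys in order of first appearance, dropping the sorted() call should do that on Python 3.7+, where dictionaries preserve insertion order.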

score 0 · Answer 2 · answered May 16 '17 at 12:43

I believe a cleaner way to do this would be to simply use a pandas groupby:

import pandas as pd
data = pd.DataFrame(b, index=a)
# Group once and reuse the result instead of computing the groupby twice.
means = data.groupby(data.index)[0].mean()
a, b = list(means.index), list(means)

DrTRD


stamaimer · Answer 3 · 2017-05-16T13:32:58.340

You can use numpy as follows:

>>> import numpy as np
>>> array_a = np.array(a)
>>> array_b = np.array(b)
>>> avrg_of_a = np.average(array_b[array_a == 'A'])
>>> avrg_of_a
3.6666666666666665
>>> avrg_of_b = np.average(array_b[array_a == 'B'])
>>> avrg_of_b
4.0

You can generate the averages for all letters with a list comprehension:

[np.average(array_b[array_a == item]) for item in np.unique(array_a)]
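To get both output lists in one go with numpy, something along these lines should work. It is a sketch that swaps the per-letter boolean masks for np.unique with return_inverse plus np.bincount; the names a_out and b_out are made up here, and the sample data is only the visible part of the lists from the question.

```python
import numpy as np

# Sample data: only the visible part of the lists from the question.
a = ['A', 'B', 'A', 'C', 'D', 'C', 'A']
b = [2, 4, 8, 3, 5, 2, 1]

array_a = np.array(a)
array_b = np.array(b, dtype=float)

# unique_keys is sorted; inverse[i] is the position of array_a[i] in unique_keys.
unique_keys, inverse = np.unique(array_a, return_inverse=True)

# Per-group sum divided by per-group count gives the per-group mean.
sums = np.bincount(inverse, weights=array_b)
counts = np.bincount(inverse)
means = sums / counts

a_out = unique_keys.tolist()  # ['A', 'B', 'C', 'D']
b_out = means.tolist()        # [3.6666666666666665, 4.0, 2.5, 5.0]
```

This makes a single pass over the data instead of building one mask per distinct letter, which may matter when there are many distinct keys.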

edited May 16 '17 at 13:32

answered May 16 '17 at 13:22


This is not generalized and doesn't provide results as requested by the OP. – DrTRD May 16 '17 at 13:27
Why this is not generalized? – stamaimer May 16 '17 at 13:28
You've specifically defined variables for each value in your array_a. – DrTRD May 16 '17 at 13:33
You still haven't provided the answer in the format requested by the OP. – DrTRD May 16 '17 at 13:35
I contend that your answer does not answer the question as asked by the OP -- you do not end up with results for a and b as requested. See my answer or the selected answer for those that have answered the question as asked. – DrTRD May 16 '17 at 13:49
