2

I couldn't think of a better way to phrase the title. Basically, I have two lists:

a = ['A','B','A','C','D','C','A',...] 
b = [2,4,8,3,5,2,1,...]

a and b have the same length; each value in b belongs to the letter at the same position in a.

Now I would like to calculate the average of the values in b for each letter in a, so that at the end I would have:

a = ['A','B','C','D',...]
b = [3.67, 4, 2.5, 5,...]

Is there a standard implementation for this in Python?

Lotzki
  • Possible duplicate: http://stackoverflow.com/questions/21674331/group-by-multiple-keys-and-summarize-average-values-of-a-list-of-dictionaries – moritzg May 16 '17 at 11:24

3 Answers

4

You can first group the values by letter, for instance with a defaultdict:

from collections import defaultdict

col = defaultdict(list)

for ai,bi in zip(a,b):
    col[ai].append(bi)

Now the dictionary col will look like:

>>> col
defaultdict(<class 'list'>, {'C': [3, 2], 'B': [4], 'D': [5], 'A': [2, 8, 1]})

Now we can calculate the average of each list in the dictionary, for instance like this:

>>> {key:sum(vals)/len(vals) for key,vals in col.items()}
{'C': 2.5, 'B': 4.0, 'D': 5.0, 'A': 3.6666666666666665}

You can also convert it to two tuples by using zip:

a,b = zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()])

resulting in:

>>> a,b = zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()])
>>> a
('C', 'B', 'D', 'A')
>>> b
(2.5, 4.0, 5.0, 3.6666666666666665)

If you want to generate lists instead, you can convert them to lists:

a,b = map(list,zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()]))

This results in:

>>> a,b = map(list,zip(*[(key,sum(vals)/len(vals)) for key,vals in col.items()]))
>>> a
['C', 'B', 'D', 'A']
>>> b
[2.5, 4.0, 5.0, 3.6666666666666665]
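
The two steps above (group, then average) can also be wrapped in one helper. This is only a minimal sketch, not part of the original answer: the name `average_by_key` is hypothetical, and it assumes Python 3.7 or later, where a plain dict keeps insertion order, so the letters come out in order of first appearance.

```python
def average_by_key(keys, values):
    """Return (unique_keys, averages), with keys in order of first appearance."""
    groups = {}
    # Group step: collect every value under its key.
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    # Average step: one mean per group, in the same order as the keys.
    unique_keys = list(groups)
    averages = [sum(vals) / len(vals) for vals in groups.values()]
    return unique_keys, averages


a = ['A', 'B', 'A', 'C', 'D', 'C', 'A']
b = [2, 4, 8, 3, 5, 2, 1]
letters, averages = average_by_key(a, b)
```

With the sample data from the question, `letters` and `averages` line up position by position, which is the output shape the question asks for.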
Willem Van Onsem
0

I believe a cleaner way to do this would be to simply use a pandas groupby:

import pandas as pd
data = pd.DataFrame(b, index=a)  # values in column 0, letters as the index
means = data.groupby(data.index)[0].mean()  # one mean per letter, computed once
a, b = list(means.index), list(means)
DrTRD
0

You can use numpy as follows:

>>> import numpy as np
>>> array_a = np.array(a)
>>> array_b = np.array(b)
>>> avrg_of_a = np.average(array_b[array_a == 'A'])
>>> avrg_of_a
3.6666666666666665
>>> avrg_of_b = np.average(array_b[array_a == 'B'])
>>> avrg_of_b
4.0

You can generate the list of averages for all letters with a list comprehension: [np.average(array_b[array_a == item]) for item in np.unique(array_a)]
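
To get both output lists without naming a variable per letter, one option is to combine `np.unique` with `np.bincount`. This is a sketch under assumptions rather than the answer's own method: the names `labels`, `inverse`, `a_out`, and `b_out` are made up for illustration, and the letters come back sorted (that is how `np.unique` orders them), not in order of first appearance.

```python
import numpy as np

a = ['A', 'B', 'A', 'C', 'D', 'C', 'A']
b = [2, 4, 8, 3, 5, 2, 1]

# labels: sorted unique letters; inverse: for each element of a,
# the position of its letter in labels.
labels, inverse = np.unique(np.array(a), return_inverse=True)

# Sum of the b values per letter, and number of occurrences per letter.
sums = np.bincount(inverse, weights=np.array(b, dtype=float))
counts = np.bincount(inverse)

a_out = labels.tolist()
b_out = (sums / counts).tolist()
```

Unlike the list comprehension, this avoids building one boolean mask per letter, which may matter when there are many distinct letters.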

stamaimer
  • This is not generalized and doesn't provide results as requested by the OP. – DrTRD May 16 '17 at 13:27
  • Why is this not generalized? – stamaimer May 16 '17 at 13:28
  • You've specifically defined variables for each value in your array_a. – DrTRD May 16 '17 at 13:33
  • You still haven't provided the answer in the format requested by the OP. – DrTRD May 16 '17 at 13:35
  • I contend that your answer does not answer the question as asked by the OP -- you do not end up with results for a and b as requested. See my answer or the selected answer for those that have answered the question as asked. – DrTRD May 16 '17 at 13:49