Python get weighted mean of dict keys based on dict values

Question

I am trying to write code to find the mean of the keys in my dict, but based on the dict values. So, for example, for:

d = {1:2, 2:1, 3:2}

the dict keys would be:

[1,1,2,3,3]

I've written the following code, which works for small data sets such as the above:

def get_median_of_dict_keys(d: dict) -> float:
    nums_list = []
    for k,v in d.items():
        if type(v) != int:
            raise TypeError
        nums_list.extend([k] * v)
    
    median = sum(nums_list) / len(nums_list)
    return median

This gets me the values I want when the data set is small, but if the data set is something like:

d = {1:1_000_000_000_000_000, 2:2_000, 3:1_000_000_000_000_000}

I get an out of memory error which, now that I think about it, makes sense.

So how can I structure the above function in a way that will also handle those larger data sets? Thanks for your time.

the keys of `d = {1:2, 2:1, 3:2}` are `1,2,3` not `1,1,2,3,3` — Ironkey, Nov 14 '20 at 18:00
Can you elaborate on how `{1:2, 2:1, 3:2}` should be `[1,1,2,3,3]`? — Red, Nov 14 '20 at 18:01
Ironkey, I know, but I need to count each key individually, or as its own entry. — J. B., Nov 14 '20 at 18:02
alright so you want `n` keys where the value of that key is `n`? — Ironkey, Nov 14 '20 at 18:03
Ann, so I need to count each key individually. So the "1" key has a value of two, so I need to count it twice. If the key was 5 and the value was 14, I'd need to treat it as though there were fourteen 5's. Almost like a voting tally that might've been done in grade school, i.e. 14 people voted for 5. — J. B., Nov 14 '20 at 18:04
Ironkey, exactly. Again, the code works with small numbers, but once they get large, I run out of memory. — J. B., Nov 14 '20 at 18:05
Edit your question to find the mean, or you actually want the median? — Dani Mesejo, Nov 14 '20 at 18:12
Thanks, folks, I did edit the title. I really appreciate your help! — J. B., Nov 14 '20 at 18:31
Removed tag `median` since you want to find `mean` specifically `weighted-mean` so added both of them — Ch3steR, Nov 14 '20 at 18:39

score 1 · Answer 1 · answered Nov 14 '20 at 18:03

You do not need to create a list, just keep two running variables, one holding the total sum and the other one holding the number of elements:

def get_mean_of_dict_keys(d: dict) -> float:
    total = 0
    count = 0
    for k, v in d.items():
        total += k * v
        count += v

    mean = total / count
    return mean


print(get_mean_of_dict_keys({1: 2, 2: 1, 3: 2}))

Output

2.0
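
For reference, here is a minimal sketch of the same running-total idea applied to the large tally from the question. The function name `weighted_mean_of_keys`, the integer check (borrowed from the question's code), and the empty-tally guard are assumptions added for illustration; they are not part of the answer above.

```python
def weighted_mean_of_keys(tally: dict) -> float:
    """Mean of the keys, each key counted `value` times, without building a list."""
    total = 0   # running sum of key * count
    count = 0   # running number of counted entries
    for key, times in tally.items():
        # Same validation as in the question: counts must be plain integers.
        if type(times) is not int:
            raise TypeError(f"count for key {key!r} must be an int")
        total += key * times
        count += times
    if count == 0:
        # Assumption: an empty tally is treated as an error rather than returning 0.
        raise ValueError("cannot take the mean of an empty tally")
    return total / count


big = {1: 1_000_000_000_000_000, 2: 2_000, 3: 1_000_000_000_000_000}
print(weighted_mean_of_keys(big))  # 2.0
```

Memory use stays constant however large the counts are, because only two integers are kept.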

answered Nov 14 '20 at 18:03

Dani Mesejo

Dani, thank you so much! I actually started thinking about this a bit after I asked the question. I appreciate your help! – J. B. Nov 14 '20 at 18:29

Ironkey · Answer 2 · 2020-11-14T18:34:45.543

If you want the mean

this is perfectly attainable with larger numbers:

import numpy as np
d = {1:2000000000, 2:1000, 3:2000000000}
# weighted mean: sum of key*count divided by the total count
print(np.sum([i*d[i] for i in d]) / np.sum(list(d.values())))

output

2.0

breakdown

[i*d[i] for i in d]

# is equivalent to:

lst = []
for i in d:
    lst.append(i*d[i])
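
As a quick sanity check on the small example from the question, the sketch below contrasts two things one can do with that list of products. A plain mean divides by the number of keys, while the weighted mean divides by the total of the counts. The variable names are illustrative, and `statistics.mean` stands in for any plain mean.

```python
from statistics import mean

d = {1: 2, 2: 1, 3: 2}

# Products of each key and its count, built like the comprehension above.
products = [key * d[key] for key in d]  # [2, 2, 6]

# Plain mean: divides by the number of keys (3), which is not the tally mean.
plain_mean = mean(products)

# Weighted mean: divides by the total count (2 + 1 + 2 = 5).
weighted_mean = sum(products) / sum(d.values())

print(plain_mean, weighted_mean)
```

Only the second value matches the mean of `[1,1,2,3,3]`.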

edited Nov 14 '20 at 18:34

answered Nov 14 '20 at 18:06

Ironkey

glad to help :) – Ironkey Nov 14 '20 at 18:30

Ch3steR · Answer 3 · 2020-11-14T18:49:34.020

What you want to find is the weighted average.

Formula:

$$\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

Where,

X_1..n are the keys in your dictionary.
W_1..n are the values in your dictionary.
X̅ is the weighted average.
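
As a worked instance: the weighted average is the sum of each key times its value, divided by the sum of the values. Substituting the example dictionary `d = {1:2, 2:1, 3:2}` from the question gives:

```latex
\bar{X} = \frac{2 \cdot 1 + 1 \cdot 2 + 2 \cdot 3}{2 + 1 + 2} = \frac{10}{5} = 2
```

This is the same value as the plain mean of `[1,1,2,3,3]`.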

Pure Python approach.

Using itertools.starmap with operator.mul

from itertools import starmap
from operator import mul
d = {1:2, 2:1, 3:2}
sum(starmap(mul, d.items()))/sum(d.values())
# 2.0
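
If exact results matter for very large counts, one option (not part of the original answer) is to keep the same `starmap` sum but divide with `fractions.Fraction` from the standard library, which avoids floating-point rounding. The helper name `exact_weighted_mean` is hypothetical, and the sketch assumes integer keys and counts.

```python
from fractions import Fraction
from itertools import starmap
from operator import mul


def exact_weighted_mean(tally: dict) -> Fraction:
    """Weighted mean of the keys as an exact rational number."""
    weighted_sum = sum(starmap(mul, tally.items()))  # sum of key * count
    total_count = sum(tally.values())                # sum of counts
    return Fraction(weighted_sum, total_count)


d = {1: 1_000_000_000_000_000, 2: 2_000, 3: 1_000_000_000_000_001}
result = exact_weighted_mean(d)
print(result)         # exact ratio of two integers
print(float(result))  # nearest float, for display
```

Converting to `float` only at the end keeps the intermediate arithmetic exact.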

If you want to use `NumPy`

You can use np.average here.

import numpy as np
np.average([*d.keys()], weights=[*d.values()])
# 2.0
