4

I am trying to write code to find the mean of the keys in my dict, but based on the dict values. So, for example, for:

d = {1:2, 2:1, 3:2}

the dict keys would be:

[1,1,2,3,3]

I've written the following code, which works for small data sets such as the above:

def get_median_of_dict_keys(d: dict) -> float:
    nums_list = []
    for k,v in d.items():
        if type(v) != int:
            raise TypeError
        nums_list.extend([k] * v)
    
    median = sum(nums_list) / len(nums_list)
    return median

This gets me the values I want when the data set is small, but if the data set is something like:

d = {1:1_000_000_000_000_000, 2:2_000, 3:1_000_000_000_000_000}

I get an out of memory error which, now that I think about it, makes sense.

So how can I structure the above function in a way that will also handle those larger data sets? Thanks for your time.

Ch3steR
  • 20,090
  • 4
  • 28
  • 58
J. B.
  • 155
  • 1
  • 8

3 Answers3

1

You do not need to create a list, just keep two running variables, one holding the total sum and the other one holding the number of elements:

def get_mean_of_dict_keys(d: dict) -> float:
    total = 0
    count = 0
    for k, v in d.items():
        total += k * v
        count += v

    mean = total / count
    return mean


print(get_mean_of_dict_keys({1: 2, 2: 1, 3: 2}))

Output

2.0
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • Dani, thank you so much! I actually started thinking about this a bit after I asked the question. I appreciate your help! – J. B. Nov 14 '20 at 18:29
1

If you want the mean

this is perfectly attainable with larger numbers:

import numpy as np
d = {1:2000000000, 2:1000, 3:2000000000}
print(np.mean([i*d[i] for i in d]))

output

2666667333.3333335

breakdown

[i*d[i] for i in d]

# is equivalent to:

lst = []
for i in d:
    lst.append(i*d[i])
Ironkey
  • 2,568
  • 1
  • 8
  • 30
1

What you want to find is weighted average.

Formula:

enter image description here

Where,

  • X1..n are keys in your dictionary.
  • W1..n are values in your dictionary.
  • is weighted average.

Pure Python approach.

Using itertools.starmap with operator.mul

from itertools import starmap
from operator import mul
d = {1:2, 2:1, 3:2}
sum(starmap(mul, d.items()))/sum(d.values())
# 2.0

If you want to use NumPy

You can use np.average here.

np.average([*d.keys()], weights=[*d.values()])
# 2.0
Ch3steR
  • 20,090
  • 4
  • 28
  • 58