Let's say I have a list of things with their frequencies (sorted by frequency), plus the total number of items. I use a dict here for clarity, but they are actually objects with a frequency property:
items = {"bananas":12, "oranges":12, "apples":11, "pears":2}
Now I want to pick out 10 items (max_results) out of my 37 (total_frequency) items, in proportion to their frequency, with a maximum of, say, 3 of any one item (max_proportion). In this example I'd end up with 3 each of bananas, oranges, and apples, and 1 pear.
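The arithmetic behind that example can be checked with a minimal sketch. It works on the plain dict rather than the real objects, and `capped_share` is a hypothetical helper name used only for illustration:

```python
import math

items = {"bananas": 12, "oranges": 12, "apples": 11, "pears": 2}
max_results = 10
max_proportion = 3
total_frequency = sum(items.values())  # 12 + 12 + 11 + 2 = 37


def capped_share(freq):
    # Proportional share of max_results, rounded up, then capped
    # at max_proportion. (Hypothetical helper, not from the original code.)
    return min(math.ceil(freq / total_frequency * max_results), max_proportion)


shares = {name: capped_share(freq) for name, freq in items.items()}
print(shares)  # {'bananas': 3, 'oranges': 3, 'apples': 3, 'pears': 1}
```

Bananas and oranges would each get ceil(3.24) = 4 without the cap, so the cap of 3 is what brings the total down to exactly 10 here.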
import itertools as it
import math
from random import shuffle

def get_relative_quantities(total_frequency, items, max_results, max_proportion):
    results = {}
    num_added = 0
    # items must already be sorted by frequency for groupby to group them correctly
    for freq, the_group in it.groupby(items, lambda x: x.frequency):
        if num_added == max_results:
            break
        # Shuffle items that share a frequency so ties are broken randomly
        the_group_list = list(the_group)
        shuffle(the_group_list)
        for item in the_group_list:
            if num_added == max_results:
                break
            # Proportional share, rounded up and capped at max_proportion
            rel_freq = min(math.ceil((freq/total_frequency)*max_results), max_proportion)
            results[item] = rel_freq
            num_added += rel_freq
    return results
One thing I'm worried about is that with this approach, if there is only 1 item, I won't get enough results: I'll just get 3 (assuming a max_proportion of 3 out of 10). How can I approach that problem?
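To make the concern concrete, here is a minimal sketch that measures how far the capped shares fall short of max_results. It takes a plain list of frequencies instead of the real objects and ignores the early-exit checks in the function, and `shortfall` is a hypothetical name introduced for illustration:

```python
import math


def shortfall(frequencies, max_results, max_proportion):
    # Sum the capped proportional shares and report how many results
    # are still missing. (Hypothetical helper; assumes the same
    # ceil-then-cap rule as the function in the question.)
    total = sum(frequencies)
    picked = sum(
        min(math.ceil(f / total * max_results), max_proportion)
        for f in frequencies
    )
    return max(max_results - picked, 0)


print(shortfall([37], 10, 3))             # 7: a single item is capped at 3
print(shortfall([12, 12, 11, 2], 10, 3))  # 0: the four-item example fills all 10
```

More generally, under this rule the result can never exceed the number of items times max_proportion, so any input with fewer than four items falls short when the cap is 3 and 10 results are wanted.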