Add probabilities and get id with the highest probability

Question

I have the following structure:

('2', 0.30334973335266113)
('4', 0.43178531527519226)
('3', 0.3627113997936249)
('9', 0.5691161155700684)
('2', 0.4603477120399475)
('2', 0.7340329885482788)
('10', 0.4691111445426941)
('13', 0.20860238373279572)
('3', 0.4541565775871277)
('2', 0.4479588568210602)
('2', 0.6090611815452576)
('16', 0.5154575705528259)
('11', 0.4370063543319702)
('12', 0.38097500801086426)
('14', 0.23826521635055542)
('3', 0.39956724643707275)
('12', 0.291579008102417)
('11', 0.4514589309692383)

I want to sum the probabilities for each id and return the id with the highest total.

For instance, for 3 and 10:

(3, 1.2)
(10, 0.46)

The return should be: (3, 1.2)

summed = {}
lis = [('2', 0.30334973335266113),
('4', 0.43178531527519226),
('3', 0.3627113997936249),
('9', 0.5691161155700684),
('2', 0.4603477120399475),
('2', 0.7340329885482788),
('10', 0.4691111445426941),
('13', 0.20860238373279572),
('3', 0.4541565775871277),
('2', 0.4479588568210602),
('2', 0.6090611815452576),
('16', 0.5154575705528259),
('11', 0.4370063543319702),
('12', 0.38097500801086426),
('14', 0.23826521635055542),
('3', 0.39956724643707275),
('12', 0.291579008102417),
('11', 0.4514589309692383)]

for i in lis:
    summed[str(i[0])] = i[1]

This overwrites the value on every iteration, though, so only the last value seen for each key gets stored. When a key already exists, I don't want its value to be overwritten; I want the new value to be added to the existing one.

What have you attempted so far? What is your data stored in? — Mad Physicist, Dec 06 '18 at 13:45
see https://stackoverflow.com/questions/13145368/find-the-maximum-value-in-a-list-of-tuples-in-python — umeli, Dec 06 '18 at 13:46
It's currently computed dynamically, line by line. Not sure what's the most efficient way to handle this type of data: sets, lists, or key-value maps — bytebiscuit, Dec 06 '18 at 13:46
The most efficient way would be to show what you have and ask specific questions step by step when you run into problems. Given your last comment, it appears that you don't have a data structure in mind, much less being at the point of selecting algorithms for it. I would recommend researching your options for data structure (as you've already begun to do), and maybe ask a question about that first. — Mad Physicist, Dec 06 '18 at 13:53

tobias_k · Accepted Answer · 2018-12-06T14:15:29.850

You can collect the values in a collections.Counter and then get the most_common(1):

>>> import collections
>>> lst = [('2', 0.30334973335266113),..., ('11', 0.4514589309692383)]
>>> c = collections.Counter()
>>> for x, y in lst:
...     c[x] += y
...
>>> c.most_common(1)
[('2', 2.554750472307205)]
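
For reference, here is the same approach as a self-contained script, using the full list from the question. The variable names (pairs, totals, best_id, best_total) are chosen for illustration only; treat it as a sketch rather than the one right way to do it.

```python
from collections import Counter

# The (id, probability) pairs exactly as posted in the question.
pairs = [
    ('2', 0.30334973335266113), ('4', 0.43178531527519226),
    ('3', 0.3627113997936249), ('9', 0.5691161155700684),
    ('2', 0.4603477120399475), ('2', 0.7340329885482788),
    ('10', 0.4691111445426941), ('13', 0.20860238373279572),
    ('3', 0.4541565775871277), ('2', 0.4479588568210602),
    ('2', 0.6090611815452576), ('16', 0.5154575705528259),
    ('11', 0.4370063543319702), ('12', 0.38097500801086426),
    ('14', 0.23826521635055542), ('3', 0.39956724643707275),
    ('12', 0.291579008102417), ('11', 0.4514589309692383),
]

totals = Counter()
for key, prob in pairs:
    totals[key] += prob  # a missing key counts as 0, so += accumulates

# most_common(1) returns a one-element list of (key, total) pairs.
best_id, best_total = totals.most_common(1)[0]
print(best_id, round(best_total, 2))  # prints: 2 2.55
```

Note that most_common(1) returns a list, so the [0] is needed to get the single (id, total) tuple.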

About your existing code: you are not actually summing; = just overwrites the previous value, if any. Instead, use += and initialize the value to 0 if the key does not exist yet (collections.Counter does this automatically). Then use max to get the entry with the largest total.

for x, y in lis:
    if x not in summed:
        summed[x] = 0
    summed[x] += y
print(max(summed.items(), key=lambda t: t[1]))
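
If you would rather not write the "if x not in summed" check by hand, one possible variant is collections.defaultdict(float), which creates missing keys with the value 0.0. The function name best_by_total and the sample data below are made up for illustration, and returning None for empty input is just one way to handle that case.

```python
from collections import defaultdict

def best_by_total(pairs):
    """Return the (key, total) pair with the largest summed value, or None if pairs is empty."""
    totals = defaultdict(float)  # missing keys start at 0.0
    for key, value in pairs:
        totals[key] += value
    if not totals:
        return None  # max() would raise ValueError on an empty sequence
    return max(totals.items(), key=lambda item: item[1])

# Hypothetical sample data, not the list from the question.
sample = [('3', 0.25), ('10', 0.5), ('3', 0.5)]
print(best_by_total(sample))  # prints: ('3', 0.75)
```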
