De-duplicating list of tuples, preferring certain ones

Question

I have a list of three-item tuples. The first two items (GPS co-ordinates) are often duplicated, while the last item is a score (signal strength):

[(62.45807, -114.41026, 8),
(62.45807, -114.41026, 11),
(62.45807, -114.41026, 18),
(62.45807, -114.41026, 16),
(62.45807, -114.41026, 9),
(62.45785, -114.41003, 23),
(62.45785, -114.41003, 19),
(62.45785, -114.41003, 11),
(62.45785, -114.41003, 17),
(62.45785, -114.41003, 14),
(62.45785, -114.41003, 11),
(62.45785, -114.41003, 15),
(62.45765, -114.40978, 28),
(62.45765, -114.40978, 16),
(62.45765, -114.40978, 10),
(62.45765, -114.40978, 15),
(62.45765, -114.40978, 25)]

I would like to know how to remove the duplicate GPS co-ordinates while preferring the highest score to end up with this:

[(62.45807, -114.41026, 18),
(62.45785, -114.41003, 23),
(62.45765, -114.40978, 28)]

And how to do the same but average the scores, to end up with something like this:

[(62.45807, -114.41026, 12),
(62.45785, -114.41003, 16),
(62.45765, -114.40978, 19)]

pandas has the functions you want. There is a similar question here: http://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest — Vicky Liau, Sep 04 '14 at 13:22
How is the answer 'too broad', please? I provided sample input, expected output and described the conditions to get from one to the other. I also got a prompt answer. I would like to understand how this question could be made better for future reference. Thanks. — user3481267, Sep 04 '14 at 16:12

score 2 · Accepted Answer · answered Sep 04 '14 at 13:25

Sounds like a job for itertools.groupby:

>>> from itertools import groupby

Max:

>>> [max(g, key=lambda x:x[-1]) for k, g in groupby(data, key= lambda x:x[:2])]
[(62.45807, -114.41026, 18),
 (62.45785, -114.41003, 23),
 (62.45765, -114.40978, 28)]

Average:

>>> [a + (round(sum(c for _, _, c in b)/float(len(b))),) 
                        for a, b in ((k, list(g)) for k, g in 
                                           groupby(data, key= lambda x:x[:2]))]
[(62.45807, -114.41026, 12.0),
 (62.45785, -114.41003, 16.0),
 (62.45765, -114.40978, 19.0)]
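
One caveat worth noting: groupby only merges adjacent items, so the one-liners above rely on the list already being ordered by co-ordinate, as the sample data is. If that cannot be guaranteed, a minimal sketch would sort by the same key first (dedupe_max is a hypothetical helper name, not part of the original answer):

```python
from itertools import groupby


def dedupe_max(points):
    """Keep the highest-scoring tuple for each (lat, lon) pair."""
    def coords(point):
        # Grouping key: the first two items of the tuple.
        return point[:2]

    # groupby only groups consecutive items, so sort by the key first.
    ordered = sorted(points, key=coords)
    return [max(group, key=lambda point: point[-1])
            for _, group in groupby(ordered, key=coords)]
```

Sorting means the result comes back in ascending co-ordinate order rather than in the order of first appearance.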

Thank you! This is concise and does the trick. — user3481267, Sep 04 '14 at 16:09

score 0 · Answer 2 · answered Sep 04 '14 at 13:29

You could write a function that maps the list into a dictionary whose keys are the GPS co-ordinates and whose values are lists of scores:

def create_gps_score_dict(gps_score_list):
    gps_score_dict = {}
    for gps_score in gps_score_list:
        if (gps_score[0], gps_score[1]) in gps_score_dict:
            gps_score_dict[(gps_score[0], gps_score[1])].append(gps_score[2])
        else:
            gps_score_dict[(gps_score[0], gps_score[1])] = [gps_score[2]]
    return gps_score_dict

Now you can generate results by looking at this simple dictionary.

def max_gps_scores(gps_score_dict):
    gps_score_list = []
    for gps, score in gps_score_dict.items():
        # score is the list of all scores recorded at this co-ordinate
        gps_score_list.append((gps[0], gps[1], max(score)))
    return gps_score_list

Example

>>> gps_score_list=[(62.45807, -114.41026, 8),
    (62.45807, -114.41026, 11),
    (62.45807, -114.41026, 18),
    (62.45807, -114.41026, 16),
    (62.45807, -114.41026, 9),
    (62.45785, -114.41003, 23),
    (62.45785, -114.41003, 19),
    (62.45785, -114.41003, 11),
    (62.45785, -114.41003, 17),
    (62.45785, -114.41003, 14),
    (62.45785, -114.41003, 11),
    (62.45785, -114.41003, 15),
    (62.45765, -114.40978, 28),
    (62.45765, -114.40978, 16),
    (62.45765, -114.40978, 10),
    (62.45765, -114.40978, 15),
    (62.45765, -114.40978, 25)]

>>> max_gps_scores(create_gps_score_dict(gps_score_list))
[(62.45807, -114.41026, 18), (62.45765, -114.40978, 28), (62.45785, -114.41003, 23)]

I'll leave average up to you!
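
For anyone who wants a starting point, the average could be sketched along the same lines. This is only an illustration: average_gps_scores is a hypothetical name, and it builds the dictionary itself with collections.defaultdict instead of reusing the functions above.

```python
from collections import defaultdict


def average_gps_scores(gps_score_list):
    """Return one (lat, lon, rounded mean score) tuple per co-ordinate."""
    scores_by_point = defaultdict(list)
    for lat, lon, score in gps_score_list:
        scores_by_point[(lat, lon)].append(score)

    # float() keeps the division exact on Python 2 as well as Python 3.
    return [(lat, lon, round(sum(scores) / float(len(scores))))
            for (lat, lon), scores in scores_by_point.items()]
```

Note that round returns a float on Python 2 and an int on Python 3, so the last item may print as 12.0 or 12 depending on the interpreter.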
