
I have a list of three-item tuples. The first two items (GPS co-ordinates) are often duplicated, while the last item is a score (signal strength):

[(62.45807, -114.41026, 8),
(62.45807, -114.41026, 11),
(62.45807, -114.41026, 18),
(62.45807, -114.41026, 16),
(62.45807, -114.41026, 9),
(62.45785, -114.41003, 23),
(62.45785, -114.41003, 19),
(62.45785, -114.41003, 11),
(62.45785, -114.41003, 17),
(62.45785, -114.41003, 14),
(62.45785, -114.41003, 11),
(62.45785, -114.41003, 15),
(62.45765, -114.40978, 28),
(62.45765, -114.40978, 16),
(62.45765, -114.40978, 10),
(62.45765, -114.40978, 15),
(62.45765, -114.40978, 25)]

I would like to know how to remove the duplicate GPS co-ordinates, keeping only the highest score for each pair, to end up with this:

[(62.45807, -114.41026, 18),
(62.45785, -114.41003, 23),
(62.45765, -114.40978, 28)]

I would also like to know how to do the same but average the scores instead, to end up with something like this:

[(62.45807, -114.41026, 12),
(62.45785, -114.41003, 16),
(62.45765, -114.40978, 19)]
Cœur
  • How have you tried to solve this problem? – APerson Sep 04 '14 at 13:15
  • pandas has functions you want. The similar question here: http://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest – Vicky Liau Sep 04 '14 at 13:22
  • How is the answer 'too broad', please? I provided sample input, expected output and described the conditions to get from one to the other. I also got a prompt answer. I would like to understand how this question could be made better for future reference. Thanks. – user3481267 Sep 04 '14 at 16:12

2 Answers


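Sounds like a job for itertools.groupby. Note that groupby only groups consecutive items that share a key, so this assumes tuples with the same co-ordinates are adjacent, as they are in your sample (sort the list first otherwise):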

>>> from itertools import groupby

Max:

>>> [max(g, key=lambda x:x[-1]) for k, g in groupby(data, key= lambda x:x[:2])]
[(62.45807, -114.41026, 18),
 (62.45785, -114.41003, 23),
 (62.45765, -114.40978, 28)]

Average:

>>> [a + (round(sum(c for _, _, c in b)/float(len(b))),) 
                        for a, b in ((k, list(g)) for k, g in 
                                           groupby(data, key= lambda x:x[:2]))]
[(62.45807, -114.41026, 12.0),
 (62.45785, -114.41003, 16.0),
 (62.45765, -114.40978, 19.0)]
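
If the readings are not already grouped by co-ordinate, the same idea can be sketched with an explicit sort in front of groupby. This is only an illustration, assuming Python 3; the helper names coords, dedupe_max and dedupe_mean are made up for this example and are not part of itertools. Sorting also means the result comes back ordered by co-ordinate rather than in the original order.

```python
from itertools import groupby
from operator import itemgetter


def coords(record):
    """Grouping key: the (latitude, longitude) part of a record."""
    return record[:2]


def dedupe_max(records):
    """Keep the highest-scoring record for each co-ordinate pair."""
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(records, key=coords)
    return [max(group, key=itemgetter(2))
            for _, group in groupby(ordered, key=coords)]


def dedupe_mean(records):
    """Replace each co-ordinate pair's scores with their rounded mean."""
    ordered = sorted(records, key=coords)
    result = []
    for key, group in groupby(ordered, key=coords):
        scores = [score for _, _, score in group]
        # In Python 3, "/" is true division and round() returns an int here.
        result.append(key + (round(sum(scores) / len(scores)),))
    return result
```

Calling dedupe_max(data) or dedupe_mean(data) on the list from the question should give the same three co-ordinate pairs as above, just in sorted order.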
Ashwini Chaudhary

You could write a function that collects the values into a dictionary, where each key is a pair of GPS co-ordinates and each value is the list of scores recorded there.

def create_gps_score_dict(gps_score_list):
    gps_score_dict = {}
    for gps_score in gps_score_list:
        if (gps_score[0], gps_score[1]) in gps_score_dict:
            gps_score_dict[(gps_score[0], gps_score[1])].append(gps_score[2])
        else:
            gps_score_dict[(gps_score[0], gps_score[1])] = [gps_score[2]]
    return gps_score_dict

Now you can generate results looking at this simple dictionary.

def max_gps_scores(gps_score_dict):
    gps_score_list = []
    for gps, scores in gps_score_dict.items():
        # keep only the highest score recorded at each co-ordinate pair
        gps_score_list.append((gps[0], gps[1], max(scores)))
    return gps_score_list

Example

>>> gps_score_list=[(62.45807, -114.41026, 8),
    (62.45807, -114.41026, 11),
    (62.45807, -114.41026, 18),
    (62.45807, -114.41026, 16),
    (62.45807, -114.41026, 9),
    (62.45785, -114.41003, 23),
    (62.45785, -114.41003, 19),
    (62.45785, -114.41003, 11),
    (62.45785, -114.41003, 17),
    (62.45785, -114.41003, 14),
    (62.45785, -114.41003, 11),
    (62.45785, -114.41003, 15),
    (62.45765, -114.40978, 28),
    (62.45765, -114.40978, 16),
    (62.45765, -114.40978, 10),
    (62.45765, -114.40978, 15),
    (62.45765, -114.40978, 25)]

>>> max_gps_scores(create_gps_score_dict(gps_score_list))
[(62.45807, -114.41026, 18), (62.45765, -114.40978, 28), (62.45785, -114.41003, 23)]

I'll leave average up to you!
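
For anyone who wants a starting point, here is one possible sketch of the averaging step using the same dictionary idea. It assumes Python 3 (true division, and dictionaries that keep insertion order from 3.7 on); the name average_gps_scores is hypothetical, and the block rebuilds the dictionary itself with collections.defaultdict so that it runs on its own.

```python
from collections import defaultdict


def average_gps_scores(gps_score_list):
    """Return one (lat, lon, mean score) tuple per co-ordinate pair."""
    scores_by_gps = defaultdict(list)
    for lat, lon, score in gps_score_list:
        # Missing keys start out as an empty list, so no membership test is needed.
        scores_by_gps[(lat, lon)].append(score)
    return [(lat, lon, sum(scores) / len(scores))
            for (lat, lon), scores in scores_by_gps.items()]
```

The means are left as unrounded floats; wrap the division in round() if you want whole numbers like those in the question.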

flakes