Take the mean of values in a list if a duplicate is found

Question

I have two lists whose elements are associated by position. For example, here 'John' is associated with 1, 'Bob' with 4, and so on:

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

My problem is with the duplicate John. Instead of adding the duplicate John, I want to take the mean of the values associated with the Johns, i.e., 1 and 3, which is (3 + 1)/2 = 2. Therefore, I would like the lists to actually be:

l1 = ['John', 'Bob', 'Stew']
l2 = [2, 4, 7]

I have experimented with some solutions including for-loops and the "contains" function, but can't seem to piece it together. I'm not very experienced with Python, but linked lists sound like they could be used for this.

Thank you

@bla yes I did try a dict, but the problem is that since keys can only be unique it doesn't give me a chance to take the mean of the associated values in l2, because it automatically rejects repeated values. — Rushat Rai, Dec 05 '17 at 14:42
@MythicCocoa you could try making a list of values associated with `'John'` and then take the mean as you need. Take a look at this answer for adding multiple values to a same key in a dict: https://stackoverflow.com/a/47620204/3044673 — bla, Dec 05 '17 at 14:45
Also you can use `statistics.mean` (https://docs.python.org/3/library/statistics.html#statistics.mean) so that you don't need to actually implement a `mean` function. — bla, Dec 05 '17 at 14:48
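
The approach bla describes in these comments (collect the values for each name in a list, then apply `statistics.mean`) could look roughly like the sketch below. It is not code from the thread: the function name `mean_by_name` is made up for illustration, and it assumes Python 3.7+, where dicts keep insertion order.

```python
from collections import defaultdict
from statistics import mean


def mean_by_name(names, values):
    """Hypothetical helper: average the values that share a name."""
    grouped = defaultdict(list)
    # Collect every value under its name; duplicates land in the same list.
    for name, value in zip(names, values):
        grouped[name].append(value)
    # Dicts preserve insertion order (Python 3.7+), so the names stay in
    # order of first appearance.
    unique_names = list(grouped)
    means = [mean(grouped[name]) for name in unique_names]
    return unique_names, means


l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
names, means = mean_by_name(l1, l2)
print(names)
print(means)
```

Because a dict key maps to a list here, a repeated name no longer overwrites anything, which addresses the concern about unique keys raised in the first comment.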

IMCoins · Accepted Answer · 2017-12-07T14:30:48.067

3

I believe you should use a dict. :)

import numpy as np

def mean_duplicate(l1, l2):
    ret = {}
    #   Iterating through both lists...
    for name, value in zip(l1, l2):
        if name not in ret:
            #   If the key doesn't exist, create it.
            ret[name] = value
        else:
            #   If it already does exist, update it.
            ret[name] += value

    #   Then for the average you're looking for...
    #   (.items() is the Python 3 spelling; Python 2 used .iteritems().)
    for key, value in ret.items():
        ret[key] = value / l1.count(key)

    return ret

def median_between_listsElements(l1, l2):
    ret = {}

    for name, value in zip(l1, l2):
        #   Creating key + list if doesn't exist.
        if name not in ret:
            ret[name] = []
        ret[name].append(value)

    for key, value in ret.items():
        #   float() turns the NumPy scalar into a plain Python float.
        ret[key] = float(np.median(value))

    return ret

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

print(mean_duplicate(l1, l2))
print(median_between_listsElements(l1, l2))
# {'John': 2.0, 'Bob': 4.0, 'Stew': 7.0}
# {'John': 2.0, 'Bob': 4.0, 'Stew': 7.0}

edited Dec 07 '17 at 14:30

answered Dec 05 '17 at 14:53

IMCoins


AttributeError: 'dict' object has no attribute 'iteritems'? – Rushat Rai Dec 06 '17 at 13:17
@MythicCocoa You might be running Python 3.x. The Python 3 equivalent is `.items()`. – IMCoins Dec 06 '17 at 13:28
Good to know :). Elegant solution, worked perfectly, thank you! – Rushat Rai Dec 06 '17 at 13:34
Hi, curious to know if taking the median instead of the mean will make this code more complex – Rushat Rai Dec 07 '17 at 12:23
Why would it be? It would be a little (little) bit more complex, though. What have you tried? – IMCoins Dec 07 '17 at 12:53
I added this at the end, instead of the average (sorry about the bad comment formatting): `for key, value in ret.items(): if l1.count(key) > 1: key_values = [] key_values.append(value) ret[key] = np.median(key_values)` – Rushat Rai Dec 07 '17 at 12:57
I think perhaps instead of ret[name] += value, the code should be adding the value to a list of previous values associated with the key (as suggested by bla in the comments), but I'm not very sure. – Rushat Rai Dec 07 '17 at 13:00
I've found a solution for implementing the median, it's all good :). Thanks once again – Rushat Rai Dec 07 '17 at 13:43
As you wanted to use np.median to make your median, I updated my answer to give you what I would have done. – IMCoins Dec 07 '17 at 14:29
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/160723/discussion-between-imcoins-and-mythic-cocoa). – IMCoins Dec 07 '17 at 14:31
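
For the median variant discussed in these comments, the standard library's `statistics.median` would avoid the NumPy dependency. The following is a sketch under that assumption; `median_by_name` is an illustrative name, not something from the answer, and the sample data adds a third John so that the median and the mean differ.

```python
from statistics import median


def median_by_name(names, values):
    """Hypothetical helper: median of the values that share a name."""
    grouped = {}
    for name, value in zip(names, values):
        # Keep every value: unlike a mean, a median cannot be derived
        # from a running sum and a count.
        grouped.setdefault(name, []).append(value)
    return {name: median(vals) for name, vals in grouped.items()}


result = median_by_name(
    ['John', 'Bob', 'Stew', 'John', 'John'],
    [1, 4, 7, 3, 10],
)
print(result)  # {'John': 3, 'Bob': 4, 'Stew': 7}
```

This is why the summing approach (`ret[name] += value`) has to be replaced by a list per key once the median is wanted, as Rushat Rai suspected.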

score 1 · Answer 2 · answered Dec 05 '17 at 14:48

The following might give you an idea. It uses an OrderedDict assuming that you want the items in the order of appearance from the original list:

from collections import OrderedDict

d = OrderedDict()
for x, y in zip(l1, l2):
    # setdefault returns the list stored under x (creating it if needed).
    d.setdefault(x, []).append(y)
# OrderedDict([('John', [1, 3]), ('Bob', [4]), ('Stew', [7])])


names, values = zip(*((k, sum(v)/len(v)) for k, v in d.items()))
# ('John', 'Bob', 'Stew')
# (2.0, 4.0, 7.0)

score 0 · Answer 3 · answered Dec 05 '17 at 14:56

Here is a shorter version using a dict:

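final_dict = {}
counts = {}
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

for i in range(len(l1)):
    if final_dict.get(l1[i]) is None:
        final_dict[l1[i]] = l2[i]
        counts[l1[i]] = 1
    else:
        # Running mean: stays correct for three or more duplicates,
        # whereas averaging only the stored value and the new one does not.
        counts[l1[i]] += 1
        final_dict[l1[i]] += (l2[i] - final_dict[l1[i]]) / counts[l1[i]]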


print(final_dict)

score 0 · Answer 4 · answered Dec 05 '17 at 15:07

Something like this:

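#!/usr/bin/env python3
l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]
d = {}
for i in range(len(l1)):
    key = l1[i]
    # dict.has_key() was removed in Python 3; use the "in" operator.
    if key in d:
        d[key].append(l2[i])
    else:
        d[key] = [l2[i]]
r = []
# Iterate over items so each mean is paired with its own key.
for key, values in d.items():
    r.append((key, sum(values) / len(values)))
print(r)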

score 0 · Answer 5 · answered Dec 05 '17 at 15:10

Hope the following code helps:

l1 = ['John', 'Bob', 'Stew', 'John']
l2 = [1, 4, 7, 3]

def remove_repeating_names(names_list, numbers_list):
    new_names_list = []
    new_numbers_list = []
    for first_index, first_name in enumerate(names_list):
        amount_of_occurencies = 1
        number = numbers_list[first_index]
        for second_index, second_name in enumerate(names_list):
            # Check if names match and
            # if this name wasn't read in earlier cycles or is not same element.
            if (second_name == first_name):
                if (first_index < second_index):
                    number += numbers_list[second_index]
                    amount_of_occurencies += 1
                # Break the loop if this name was read earlier.
                elif (first_index > second_index):
                    amount_of_occurencies = -1
                    break
        if amount_of_occurencies != -1:
            new_names_list.append(first_name)
            new_numbers_list.append(number/amount_of_occurencies)
    return [new_names_list, new_numbers_list]

# Unmodified arrays
print(l1)
print(l2)

l1, l2 = remove_repeating_names(l1, l2)

# If you want numbers list to be integer, not float, uncomment following line:
# l2 = [int(number) for number in l2]

# Modified arrays
print(l1)
print(l2)
