Using NumPy to Find Median of Second Element of List of Tuples

Question

Let's say I have a list of tuples, as follows:

list = [('a',1), ('b',3), ('c',5)]

My goal is to find the median of the tuples' second elements and return the first element of the tuple that holds it. In the above case the median is 3, so I would want the output b. I tried using NumPy with the following code, to no avail:

import numpy as np

list = [('a',1), ('b',3), ('c',5)]
np.median(list, key=lambda x: x[1])  # fails with TypeError: np.median() has no `key` argument

As a side note, I'd strongly advise you not to name your variable `list`, since this will shadow Python's built-in `list` type — ali_m, Aug 05 '15 at 22:05
@Cleb: Sorry! I actually ended up using your method, and it worked like a charm. Thanks! — Wally, Aug 17 '15 at 19:00

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

You could calculate the median like this:

np.median(dict(list).values())                  # Python 2.7
np.median(list(dict(list_of_tuples).values()))  # Python 3.x: dict.values() is a view, so wrap it in list()

That converts your list to a dictionary first and then calculates the median of its values.
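One caveat of the dictionary route: `dict()` keeps only the last value for each repeated first element, so if the first elements are not unique, some second elements silently drop out of the median. A small sketch with hypothetical data illustrates the difference:

```python
import numpy as np

pairs = [('a', 1), ('a', 9), ('b', 3)]  # hypothetical data with a repeated first element

via_dict = np.median(list(dict(pairs).values()))  # dict keeps only ('a', 9) and ('b', 3)
via_all = np.median([v for _, v in pairs])         # uses all three second elements

print(via_dict, via_all)  # 6.0 3.0
```

If the first elements are guaranteed unique, both give the same result.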

When you want to get the actual key, you can do it like this:

dl = dict(list_of_tuples)  # {'a': 1, 'b': 3, 'c': 5}

keys, vals = list(dl), list(dl.values())  # dict views cannot be indexed in Python 3
keys[vals.index(np.median(vals))]

which returns 'b'. This assumes the median actually occurs in the list; if it does not, `index` raises a `ValueError`. You can therefore use a try/except, here with the example from @Anand S Kumar's answer:

import numpy as np

l = [('a',1), ('b',3), ('c',5), ('d',22), ('e',11), ('f',3)]

# l = [('a',1), ('b',3), ('c',5)]

dl = dict(l)
keys, vals = list(dl), list(dl.values())  # dict views cannot be indexed in Python 3
med = np.median(vals)
try:
    print(keys[vals.index(med)])
except ValueError:
    print('The median is not in this list. Its value is', med)
    # min() over the keys returns the key whose value is closest to med
    # (on ties, the first such key in dict order)
    print('The closest key is', min(dl, key=lambda k: abs(dl[k] - med)))

For the first list you will then obtain (Python 3.7+, where dicts keep insertion order):

The median is not in this list. Its value is 4.0

The closest key is b

For your example it just prints:

b
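Several keys can tie for closest: in the six-item list, 'b', 'c' and 'f' are all 1 away from 4.0, and `min` simply returns the first one. A minimal sketch that keeps every tied key, working on the list directly (so repeated first elements are not merged by `dict`):

```python
import numpy as np

l = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]

med = np.median([v for _, v in l])       # 4.0: mean of the two middle values 3 and 5
best = min(abs(v - med) for _, v in l)   # smallest distance to the median
closest = [k for k, v in l if abs(v - med) == best]

print(med, closest)  # 4.0 ['b', 'c', 'f']
```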

For Python 3.x, you have to use `np.median(list(dict(list_of_tuples).values()))` — Anand S Kumar, Aug 05 '15 at 15:36

score 4 · Answer 2 · answered Aug 05 '15 at 15:32


`np.median` does not accept an argument called `key`. Instead, you can use a list comprehension to take just the second element of each tuple. Example:

In [3]: l = [('a',1), ('b',3), ('c',5)]

In [4]: np.median([x[1] for x in l])
Out[4]: 3.0

In [5]: l = [('a',1), ('b',3), ('c',5), ('d',22),('e',11),('f',3)]

In [6]: np.median([x[1] for x in l])
Out[6]: 4.0
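The comprehension gives the median value but not the tuple it belongs to. A minimal sketch that recovers the first element as well, by picking the second element closest to the median with `np.argmin`; the helper name `median_label` is hypothetical, and on ties it returns the first match:

```python
import numpy as np

def median_label(pairs):
    """Hypothetical helper: first element of the tuple whose second element
    is closest to the median of all second elements (first match on ties)."""
    vals = np.array([v for _, v in pairs])
    i = np.argmin(np.abs(vals - np.median(vals)))  # argmin returns the first minimum
    return pairs[i][0]

print(median_label([('a', 1), ('b', 3), ('c', 5)]))  # b
print(median_label([('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]))  # b
```

When the median is an actual data point the distance is 0, so this returns that tuple exactly.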

Also, unless it is just for the example, do not use `list` as a variable name; it shadows the built-in `list`.
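To see what the shadowing breaks: once `list` names your data, the Python 3 idiom `list(d.values())` from the other answer stops working. A short demonstration at module level:

```python
list = [('a', 1), ('b', 3), ('c', 5)]  # rebinding the name hides the built-in list type

try:
    list({'a': 1}.values())  # calls the list *object*, not the type
    message = None
except TypeError as exc:
    message = str(exc)
print(message)  # 'list' object is not callable

del list  # delete the module-level name; the built-in list is visible again
print(list({'a': 1}.values()))  # [1]
```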

Anand S Kumar

Thank you for your speedy reply! Unfortunately, the output that I desire is the first element of the median of the second elements. – Wally Aug 05 '15 at 15:35
What if the median is not in the list, as in the second example I gave? – Anand S Kumar Aug 05 '15 at 15:36
That's exactly the problem that I am running into. The hard part for me is extracting the first element of the tuple whose second element has the median value. – Wally Aug 05 '15 at 15:39
I am sorry, what exactly do you want? Do you want the middle element of the list after it is sorted on index `1`? – Anand S Kumar Aug 05 '15 at 15:40
@Wally My question is basically, what do you want if the median is not in the list? – Anand S Kumar Aug 05 '15 at 15:45
I would want to round up to the nearest entry. – Wally Aug 05 '15 at 15:56

That seems like a bad design: what if several elements are equally near? In the above case, there are 2 elements with value `3` and one element with value `5`. Each differs from the median by `1` (and all are the nearest). – Anand S Kumar Aug 05 '15 at 15:57
In that case, would it be possible to take only the nearest neighbors? – Wally Aug 05 '15 at 16:01

Can you explain what you are exactly trying to solve with this? – Anand S Kumar Aug 05 '15 at 16:11

score 2 · Answer 3 · hpaulj · 2015-08-05T20:39:09.970

`np.median` does not accept any sort of 'key' argument, and it does not return the index of what it finds. Also, when there is an even number of items (along the axis), it returns the mean of the 2 center items.

But `np.partition`, which `median` uses to find the center items, accepts an `order` argument naming structured-array field(s). So if we turn the list of tuples into a structured array, we can easily select the middle item(s).
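As an aside on the averaging described above: if you need a median that actually occurs in the data, the standard library's `statistics.median_low` and `statistics.median_high` (a swapped-in alternative, not part of NumPy) return the lower or upper of the two middle values instead of their mean. A minimal sketch with the six-item example from the other answers:

```python
import statistics

l = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]
vals = [v for _, v in l]

lo = statistics.median_low(vals)    # 3: the smaller of the two middle values
hi = statistics.median_high(vals)   # 5: the larger of the two middle values
print(statistics.median(vals), lo, hi)  # 4.0 3 5

# Both are actual data points, so at least one matching tuple always exists.
print([k for k, v in l if v == lo], [k for k, v in l if v == hi])  # ['b', 'f'] ['c']
```

For an odd number of items, all three functions return the same middle value.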

The list:

In [1001]: ll
Out[1001]: [('a', 1), ('b', 3), ('c', 5)]

as structured array:

In [1002]: la1 = np.array(ll,dtype='a1,i')
In [1003]: la1
Out[1003]: 
array([(b'a', 1), (b'b', 3), (b'c', 5)], 
     dtype=[('f0', 'S1'), ('f1', '<i4')])

we can get the middle item (index 1 for size 3) with:

In [1115]: np.partition(la1, (1), order='f1')[[1]]
Out[1115]: 
array([(b'b', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

To allow for an even number of items (with code cribbed from `np.median`):

def mymedian1(arr, field):
    # return the middle items of arr, selected by field
    sz = arr.shape[0]  # 1d for now
    if sz % 2 == 0:
        ind = ((sz // 2)-1, sz // 2)
    else:
        ind = ((sz - 1) // 2,)
    return np.partition(arr, ind, order=field)[list(ind)]

for the 3 item array:

In [1123]: mymedian1(la1,'f1')
Out[1123]: 
array([(b'b', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

for a 6 item array:

In [1124]: la2
Out[1124]: 
array([(b'a', 1), (b'b', 3), (b'c', 5), (b'd', 22), (b'e', 11), (b'f', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

In [1125]: mymedian1(la2,'f1')
Out[1125]: 
array([(b'f', 3), (b'c', 5)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

See my edit history for an earlier version using np.argpartition.

It even works for the 1st field (the characters):

In [1132]: mymedian1(la2,'f0')
Out[1132]: 
array([(b'c', 5), (b'd', 22)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])
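The `np.argpartition` route mentioned above can return the middle tuple(s) of the original list directly, without building a structured array. A minimal sketch; the helper name `middle_tuples` is hypothetical, and when several tuples share a middle value, which one comes back is not specified:

```python
import numpy as np

def middle_tuples(pairs):
    """Hypothetical helper: the tuple(s) at the centre of `pairs` when ordered
    by the second element (one for odd length, two for even length)."""
    vals = np.array([v for _, v in pairs])
    n = len(vals)
    kth = [n // 2 - 1, n // 2] if n % 2 == 0 else [n // 2]
    order = np.argpartition(vals, kth)     # positions in kth now hold those order statistics
    return [pairs[i] for i in order[kth]]  # map back to the original (str, int) tuples

print(middle_tuples([('a', 1), ('b', 3), ('c', 5)]))  # [('b', 3)]
```

For the six-item list it returns one of the two tuples with value 3, followed by `('c', 5)`, mirroring `mymedian1(la2, 'f1')`.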

edited Aug 05 '15 at 20:39

answered Aug 05 '15 at 16:04

hpaulj


Interesting idea. What will then be returned for the second example Anand S Kumar shows, i.e. when the actual median is not in the list? – Cleb Aug 05 '15 at 16:08
`np.median`, in the case of an even-length list, returns the `mean` of the two middle values, hence the value `4.0` when the 2 values are 3 and 5. So what is the desired `median` in this case? – hpaulj Aug 05 '15 at 16:39
The `argpartition` route could return the 2 middle tuples, rather than trying to average them. – hpaulj Aug 05 '15 at 16:41

With 3 and 5, the desired median would still be 4. But since 4 is not in the list, the appropriate letter cannot be returned. Returning the two middle tuples would be an option (and it seems Wally is also considering it, if I understand his comments above correctly), but it would then require one more check of whether the median is in the list or not. Wally needs to clarify that... BTW: nice to see np.argpartition at work; I had not seen it before. – Cleb Aug 05 '15 at 16:51
Yes, indeed I like the option of returning the two middle tuples. – Wally Aug 05 '15 at 17:37
@Wally: I edited my answer so that it now returns the key of the value closest to the median. Let me know whether that suits you. – Cleb Aug 05 '15 at 18:18
`np.partition` accepts a field name through its `order` parameter. So I can use that function directly to find the center item(s) based on a field 'key'. – hpaulj Aug 05 '15 at 20:35
