np.median does not accept a 'key' argument, and does not return the index of what it finds. Also, when there is an even number of items (along the axis), it returns the mean of the 2 center items.
But np.partition, which median uses to find the center items, does take structured array field name(s). So if we turn the list of tuples into a structured array, we can easily select the middle item(s).
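To make the first point concrete, here is a small sketch of what np.median gives back on plain numbers. The values are made up for illustration; they are not the data used below.

```python
import numpy as np

odd = np.array([1, 3, 5])        # illustrative values, odd count
even = np.array([1, 3, 5, 22])   # illustrative values, even count

med_odd = np.median(odd)    # the center value itself, as a float; no index
med_even = np.median(even)  # mean of the two center values, 3 and 5

print(med_odd, med_even)
```

Note that the even-count result need not be an element of the input at all, so it cannot be used to look up the matching tuple.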
The list:
In [1001]: ll
Out[1001]: [('a', 1), ('b', 3), ('c', 5)]
As a structured array ('a1' is an older alias for 'S1', a 1-byte string; 'i' is an integer):
In [1002]: la1 = np.array(ll,dtype='a1,i')
In [1003]: la1
Out[1003]:
array([(b'a', 1), (b'b', 3), (b'c', 5)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
We can get the middle item (index 1 for size 3) with:
In [1115]: np.partition(la1, (1), order='f1')[[1]]
Out[1115]:
array([(b'b', 3)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
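As a rough, self-contained sketch of what that call relies on: np.partition only guarantees that the element at position k is the one a full sort would put there, with nothing larger before it and nothing smaller after it. The records are the same as in la1 but deliberately shuffled, and the names arr, k, part and middle are just for this example.

```python
import numpy as np

# Same records as la1, shuffled to show the input need not be sorted
arr = np.array([('c', 5), ('a', 1), ('b', 3)],
               dtype=[('f0', 'S1'), ('f1', 'i4')])

k = arr.shape[0] // 2                     # middle position for an odd size
part = np.partition(arr, k, order='f1')   # compare records by field 'f1'
middle = part[k]                          # the record whose 'f1' is the median

print(middle)
```

Everything before position k in part has an 'f1' value no larger than the middle one, but the two sides are not otherwise sorted.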
And allowing for an even number of items (with code cribbed from np.median):
def mymedian1(arr, field):
    # return the middle items of arr, selected by field
    sz = arr.shape[0]  # 1d for now
    if sz % 2 == 0:
        # even size: the two center positions
        ind = ((sz // 2) - 1, sz // 2)
    else:
        # odd size: the single center position
        ind = ((sz - 1) // 2,)
    return np.partition(arr, ind, order=field)[list(ind)]
For the 3 item array:
In [1123]: mymedian1(la1,'f1')
Out[1123]:
array([(b'b', 3)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
For a 6 item array:
In [1124]: la2
Out[1124]:
array([(b'a', 1), (b'b', 3), (b'c', 5), (b'd', 22), (b'e', 11), (b'f', 3)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
In [1125]: mymedian1(la2,'f1')
Out[1125]:
array([(b'f', 3), (b'c', 5)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
See my edit history for an earlier version using np.argpartition.
It even works for the 1st field (the characters):
In [1132]: mymedian1(la2,'f0')
Out[1132]:
array([(b'c', 5), (b'd', 22)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
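If the positions in the original array are wanted rather than the records themselves, one possible variant uses np.argpartition on the field column. This is only a sketch, not necessarily the earlier version from the edit history, and mymedian_idx is a hypothetical name.

```python
import numpy as np

def mymedian_idx(arr, field):
    # hypothetical helper: positions in arr of the middle item(s) by field
    sz = arr.shape[0]  # 1d only
    if sz % 2 == 0:
        ind = (sz // 2 - 1, sz // 2)
    else:
        ind = ((sz - 1) // 2,)
    # argpartition on the plain field column yields positions, not records
    return np.argpartition(arr[field], ind)[list(ind)]

la2 = np.array([('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)],
               dtype=[('f0', 'S1'), ('f1', 'i4')])

idx = mymedian_idx(la2, 'f1')   # positions of the two center items
middle = la2[idx]               # index back into the original array
print(idx, middle)
```

Because only the 'f1' column is compared here, which of the two records with value 3 is picked is not guaranteed, unlike mymedian1, where the remaining field breaks the tie.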