Numpy "inner join" with repeating key

Question

I am trying to look up values in one array using "keys" from another array. Unfortunately, because the "space" of keys is too large (but sparse), I cannot turn this into an index trick (using the key array directly as an index).

I found the "undocumented" function np.lib.recfunctions.join_by, which more or less allows me to select by "key" instead of by index. Here is an example:

import numpy as np
from numpy.lib import recfunctions  # necessary!

>>> a = np.array([100,200,500,700,200,500,100,700,200], dtype=[('key','i')])
array([(100,), (200,), (500,), (700,), (200,), (500,), (100,), (700,),
       (200,)], dtype=[('key', '<i4')])

>>> b = np.array([(100,10),(200,20),(500,50),(700,70)], dtype=[('key','i'),('value','i')])
array([(100, 10), (200, 20), (500, 50), (700, 70)],
      dtype=[('key', '<i4'), ('value', '<i4')])

>>> np.lib.recfunctions.join_by('key', a, b, usemask=False)
array([(100,     10), (200,     20), (200, 999999), (500,     50),
       (500, 999999), (500, 999999), (700,     70), (700, 999999),
       (700, 999999)], dtype=[('key', '<i4'), ('value', '<i4')])

It turns out this function does not support repeated "keys". I'm wondering how I can get a result like

array([(100,10),(200,20),(500,50),(700,70),(200,20),(500,50),(100,10)...])

which exactly follows the order of a and has an extra column holding the lookup result. Any help is appreciated!

ZisIsNotZis · Answer 1 · 2019-02-22T02:32:18.960

I found that I can re-map all possible keys to an index. For the a and b above, assuming b is sorted by key and its keys are unique, I can use

indexOfAInB = np.unique(np.r_[a['key'],b['key']], return_inverse=True)[1][:len(a)]

to get the mapping. I can then use index trick:

B['value'][indexOfAInB]

to get the "look-up result".
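
Putting the two steps together, here is a minimal sketch using the a and b from the question. It assumes b is sorted by key, b's keys are unique, and every key in a also appears in b; the name result is only for illustration.

```python
import numpy as np

# Same data as in the question, written with explicit 1-tuples for a.
a = np.array([(100,), (200,), (500,), (700,), (200,), (500,), (100,), (700,), (200,)],
             dtype=[('key', 'i4')])
b = np.array([(100, 10), (200, 20), (500, 50), (700, 70)],
             dtype=[('key', 'i4'), ('value', 'i4')])

# Map every key to its rank among all distinct keys. Because b is sorted,
# has unique keys, and covers every key in a, that rank is also the row in b.
indexOfAInB = np.unique(np.r_[a['key'], b['key']], return_inverse=True)[1][:len(a)]

# Build the joined array in the order of a (illustrative name: result).
result = np.empty(len(a), dtype=b.dtype)
result['key'] = a['key']
result['value'] = b['value'][indexOfAInB]
```

Here result keeps the order of a and carries the matching value from b in its second column.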

Since np.unique sorts internally, I believe this takes O(n log n) time. Still, any suggestions for a better way are welcome!

BTW, this method does not support keys in a that are not in b: the extra keys shift the indices, so they no longer line up with the rows of b.
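
One possible way around that limitation is np.searchsorted, which is a different technique from the np.unique approach above. The sketch below is only an illustration: the function name lookup_values and the fill parameter are made up here, and it assumes b is non-empty with unique keys (b does not need to be sorted).

```python
import numpy as np

def lookup_values(a_keys, b_keys, b_values, fill=-1):
    """Hypothetical helper: look up b_values by key, in the order of a_keys.

    Assumes b_keys is non-empty and has no duplicates.
    Returns (values, found); rows with found == False hold `fill`.
    """
    # Sort b by key once so that binary search can be used.
    order = np.argsort(b_keys, kind='stable')
    sorted_keys = b_keys[order]
    sorted_values = b_values[order]

    # Insertion position of each key of a within the sorted keys of b.
    pos = np.searchsorted(sorted_keys, a_keys)
    # Keys larger than every key in b give pos == len(b); clip to stay in range.
    pos = np.minimum(pos, len(sorted_keys) - 1)

    # A key is present only if the element at its position is equal to it.
    found = sorted_keys[pos] == a_keys

    values = np.full(len(a_keys), fill, dtype=b_values.dtype)
    values[found] = sorted_values[pos[found]]
    return values, found

a_keys = np.array([100, 200, 999, 700, 200, 50])
b_keys = np.array([700, 100, 500, 200])
b_values = np.array([70, 10, 50, 20])
values, found = lookup_values(a_keys, b_keys, b_values)
```

The found mask lets the caller decide what to do with keys that are missing from b, instead of silently returning a wrong row.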
