Elementwise comparison of numpy arrays of different lengths

Question

This seems like a simple thing to do, but I could not figure it out...

first = np.array([0,1,2,0,4,2])
second = np.array([1,2])

I'd like to do element-wise comparison so that the answer will be

array([False, True, True, False, False, True], dtype=bool)

Basically, I want it to say True for each element in first that is also in second. So if first has 100 elements, the output should have 100 elements, too. But I can't figure out how. I've tried using np.equal, np.any, and first==np.any(second), all to no avail. Of course, I could write a loop, but I'm sure there must be a way to do this relatively simple task without one!
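For comparison, the loop-based version the question alludes to might look like the minimal sketch below. It is not the asker's code; the set-based lookup is an assumption chosen so that each membership test is cheap. It is handy as a reference to check vectorized answers against.

```python
import numpy as np

first = np.array([0, 1, 2, 0, 4, 2])
second = np.array([1, 2])

# Build a set once so each membership test is O(1) on average.
lookup = set(second.tolist())

# One output value per element of `first`: True if it appears in `second`.
result = np.array([x in lookup for x in first.tolist()], dtype=bool)
print(result.tolist())  # [False, True, True, False, False, True]
```

This runs a Python-level loop over first, so it is much slower than a NumPy-native approach on large arrays.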

What's the rule that's supposed to give those results? Do you want to repeat the shorter one in a loop or something? And what comparison are you doing between the elements? I can't think of what would make `0 ??? 1` be False but `1 ??? 2` be True. — abarnert, Apr 03 '18 at 22:55
Basically, I want it to say True for each element in first that is also in second. So if first has 100 elements, the output should have 100 elements, too. I've updated the question to clarify this. Any ideas? — puifais, Apr 03 '18 at 22:57

score 5 · Accepted Answer · answered Apr 03 '18 at 23:06


What you are asking for is what np.isin does:

>>> import numpy as np
>>> first = np.array([0,1,2,0,4,2])
>>> second = np.array([1,2])
>>> np.isin(first, second)
array([False,  True,  True, False, False,  True])
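A few more things np.isin can do, shown as a small sketch (the variable names are just for illustration): invert=True returns the complement, the result always has the shape of the first argument, and the mask can be used to index first directly.

```python
import numpy as np

first = np.array([0, 1, 2, 0, 4, 2])
second = np.array([1, 2])

# invert=True returns the complement: True where an element is NOT in `second`.
missing = np.isin(first, second, invert=True)
print(missing.tolist())   # [True, False, False, True, True, False]

# The result has the same shape as the first argument, so 2-D input works too.
grid = first.reshape(2, 3)
mask2d = np.isin(grid, second)
print(mask2d.tolist())    # [[False, True, True], [False, False, True]]

# The boolean mask can index `first` directly to pull out the matching values.
matches = first[np.isin(first, second)]
print(matches.tolist())   # [1, 2, 2]
```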


Paul Panzer


YES! I did not know there was an `np.isin`. I've only used Python's built-in `in` operator. Thank you so much! – puifais Apr 03 '18 at 23:16
@puifais it's fairly new, added in version `1.13.0`. – Paul Panzer Apr 03 '18 at 23:18

score 0 · Answer 2 · answered Apr 03 '18 at 23:05

It sounds like what you're trying to do is a Cartesian-product comparison: compare every element of first to every element of second. You can do this by lifting second to a 2D column array, so broadcasting gives you back a 2D result:

>>> first == second.reshape(-1, 1)
array([[False,  True, False, False, False, False],
       [False, False,  True, False, False,  True]])
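An equivalent way to add the extra axis is indexing with np.newaxis, which works for any length of second. This small sketch prints the shape so you can see which axis is which: rows correspond to elements of second, columns to elements of first.

```python
import numpy as np

first = np.array([0, 1, 2, 0, 4, 2])   # shape (6,)
second = np.array([1, 2])              # shape (2,)

# second[:, np.newaxis] has shape (2, 1); broadcasting it against shape (6,)
# produces a (2, 6) table: row i compares second[i] with every element of first.
table = first == second[:, np.newaxis]

print(table.shape)     # (2, 6)
print(table.tolist())  # [[False, True, False, False, False, False], [False, False, True, False, False, True]]
```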

And then you apparently want to collapse that into a single row by checking whether any entry in each column is True, which is what np.any does. You can do that by passing an axis argument:

>>> (first == second.reshape(-1, 1)).any(axis=0)
array([False,  True,  True, False, False,  True])
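If you want to reuse the broadcast-and-reduce approach, it can be wrapped in a function. `isin_broadcast` is a hypothetical helper name, not a NumPy function; it is a minimal sketch that flattens the first argument, compares it against every element of the second, and restores the original shape afterwards.

```python
import numpy as np

def isin_broadcast(a, b):
    """Hypothetical helper: True where an element of `a` appears in `b`.

    Builds a (b.size, a.size) boolean table, so memory grows as O(N*M).
    """
    a = np.asarray(a)
    b = np.asarray(b).ravel()
    # Compare every element of b (rows) against every element of a (columns),
    # then collapse the rows: a column is True if any row matched.
    hits = (a.ravel() == b[:, np.newaxis]).any(axis=0)
    # Restore the original shape of `a`.
    return hits.reshape(a.shape)

first = np.array([0, 1, 2, 0, 4, 2])
second = np.array([1, 2])
result = isin_broadcast(first, second)
print(result.tolist())  # [False, True, True, False, False, True]
```

Its output should agree with np.isin; the difference is only in how much intermediate memory it uses.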

That matches your desired output, so I think it's what you were asking for?

But of course this takes O(N*M) space (2x6 in your example). That's generally the way of numpy: if you want to do things as fast as possible, you need to build an array big enough to hold all the intermediate results at each step. In this case, because you're doing the whole Cartesian product in a single step, that's a 2x6 array.
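To make the O(N*M) cost concrete, here is a small sketch with made-up sizes (N = 10,000 and M = 2,000 are arbitrary choices for illustration). NumPy stores each bool in one byte, so the intermediate table is far larger than the final mask.

```python
import numpy as np

first = np.arange(10_000)        # N = 10,000
second = np.arange(2_000) * 3    # M = 2,000

table = first == second[:, np.newaxis]   # shape (M, N)
mask = table.any(axis=0)                 # shape (N,)

print(table.shape, table.nbytes)  # (2000, 10000) 20000000
print(mask.shape, mask.nbytes)    # (10000,) 10000
```

The intermediate table takes about 20 MB just to produce a 10 KB answer, and it grows with the product of the two lengths.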

But quite often, that isn't actually what you want. If you just want an `in`-style membership test of each value in first against the values in second, just use the isin function, which gives the same result without ever building a 2x6 array. And it's simpler than what you were trying to write, too.
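To illustrate how a membership test can avoid the N*M table, here is a sort-based sketch using np.unique and np.searchsorted. `isin_sorted` is a hypothetical name, and this is not necessarily how np.isin is implemented internally; it only shows one technique that keeps memory at O(N + M).

```python
import numpy as np

def isin_sorted(a, b):
    """Hypothetical helper: membership test via sorting, O((N + M) log M) time.

    Memory stays O(N + M): no len(b) x len(a) table is built.
    """
    a = np.asarray(a)
    b = np.unique(b)           # sorted, duplicates removed, flattened
    if b.size == 0:
        return np.zeros(a.shape, dtype=bool)
    # For each element of a, find where it would be inserted into sorted b.
    idx = np.searchsorted(b, a)
    # Clip so elements larger than every value in b don't index past the end.
    idx = np.clip(idx, 0, b.size - 1)
    # An element is present exactly when the value at that position equals it.
    return b[idx] == a

first = np.array([0, 1, 2, 0, 4, 2])
second = np.array([1, 2])
print(isin_sorted(first, second).tolist())  # [False, True, True, False, False, True]
```

In practice np.isin is the simpler choice; this is only meant to show why no 2x6 array is needed.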
