How does __contains__ work for ndarrays?

Question

>>> import numpy
>>> x = numpy.array([[1, 2],
...                  [3, 4],
...                  [5, 6]])
>>> [1, 7] in x
True
>>> [1, 2] in x
True
>>> [1, 6] in x
True
>>> [2, 6] in x
True
>>> [3, 6] in x
True
>>> [2, 3] in x
False
>>> [2, 1] in x
False
>>> [1, 2, 3] in x
False
>>> [1, 3, 5] in x
False

I have no idea how __contains__ works for ndarrays. I couldn't find the relevant documentation when I looked for it. How does it work? And is it documented anywhere?

@Marcin: The source is buried somewhere in a pile of C that I don't understand the structure to. A big part of it is even autogenerated, and a lot of it is duplicated to handle different dtypes and other differences. I'm not going to dig through all that if I don't have to. — user2357112, Aug 19 '13 at 18:41
http://www.mail-archive.com/numpy-discussion@scipy.org/msg31578.html seems to have the answer. — Alok Singhal, Aug 19 '13 at 18:47
@AlokSinghal: Further experimentation seems to agree with that post. `[1, object()] in x` and `[object(), 4] in x` report `True`, but `[2, object()] in x` and `[object(), 5] in x` report `False`, and iterating over `itertools.product(xrange(1, 7), repeat=2)` and checking containment for all pairs gives the expected results. I was really hoping for something better than a mailing list archive, but if that's all there is, I'll take it. — user2357112, Aug 19 '13 at 18:59
@user2357112 I just posted this as an answer since that's the correct answer and hopefully it will help other people who discover the same issue. — Alok Singhal, Aug 19 '13 at 19:02

user2357112 · Accepted Answer · 2020-02-20T10:38:02.413

I found the source for ndarray.__contains__, in numpy/core/src/multiarray/sequence.c. As a comment in the source states,

thing in x

is equivalent to

(x == thing).any()

for an ndarray x, regardless of the dimensions of x and thing. This only makes sense when thing is a scalar; the results of broadcasting when thing isn't a scalar cause the weird results I observed, as well as oddities like array([1, 2, 3]) in array(1) that I didn't think to try. The exact source is

static int
array_contains(PyArrayObject *self, PyObject *el)
{
    /* equivalent to (self == el).any() */

    int ret;
    PyObject *res, *any;

    res = PyArray_EnsureAnyArray(PyObject_RichCompare((PyObject *)self,
                                                      el, Py_EQ));
    if (res == NULL) {
        return -1;
    }
    any = PyArray_Any((PyArrayObject *)res, NPY_MAXDIMS, NULL);
    Py_DECREF(res);
    ret = PyObject_IsTrue(any);
    Py_DECREF(any);
    return ret;
}
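To make the broadcasting concrete, here is a minimal sketch that reproduces the results from the question with plain NumPy operations. The helper name `contains_like_numpy` is hypothetical; it simply restates the `(self == el).any()` comment from the C source and assumes `thing` has a shape that broadcasts against the array.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Broadcasting compares thing[0] against column 0 and thing[1] against column 1.
mask = (x == [1, 7])
print(mask.tolist())  # [[True, False], [False, False], [False, False]]
print(mask.any())     # True, which is why `[1, 7] in x` is True


def contains_like_numpy(array, thing):
    """Hypothetical helper restating `(array == thing).any()`.

    Assumes `thing` can be broadcast against `array`.
    """
    return bool((np.asarray(array) == thing).any())


for thing in ([1, 7], [2, 6], [2, 3], [2, 1]):
    print(thing, contains_like_numpy(x, thing), thing in x)
```

A single matching element in the right column is enough, so `[2, 6]` is "contained" (the 6 matches in column 1) while `[2, 3]` is not (2 never appears in column 0, and 3 never appears in column 1).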

score 6 · Answer 2 · answered Aug 19 '13 at 19:01


Seems like numpy's __contains__ is doing something like this for a 2-d case:

def __contains__(self, item):
    for row in self:
        if any(item_value == row_value for item_value, row_value in zip(item, row)):
            return True
    return False

[1,7] works because the 0th element of the first row matches the 0th element of [1,7]. Same with [1,2] etc. With [2,6], the 6 matches the 6 in the last row. With [2,3], none of the elements match a row at the same index. [1, 2, 3] is trivial since the shapes don't match.
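As a rough check of the sketch above, the following compares it against the real `in` operator for every pair of small integers, in the spirit of the experiment described in the comments on the question. The function name `contains_2d_sketch` is made up for this example, and the check only covers 2-element items against this particular 3x2 array.

```python
import itertools

import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])


def contains_2d_sketch(array, item):
    """Pure-Python restatement of the 2-d sketch (hypothetical name)."""
    for row in array:
        # A match in any single position of any row is enough.
        if any(item_value == row_value
               for item_value, row_value in zip(item, row)):
            return True
    return False


# Collect every pair where the sketch and NumPy's `in` disagree.
mismatches = [
    pair
    for pair in itertools.product(range(1, 8), repeat=2)
    if contains_2d_sketch(x, pair) != (list(pair) in x)
]
print(mismatches)  # [] -> the sketch agrees with `in` on all 49 pairs
```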

See this for more, and also this ticket.

answered Aug 19 '13 at 19:01

Alok Singhal


It seems to me that `all` would be more practically useful than `any`, I wonder why `numpy` developers chose this implementation of `__contains__`. – Akavall Aug 20 '13 at 03:13

@Akavall Seems to be for compatibility with Numeric. In Numeric, an array's boolean value was assumed to be `True` if it contained at least one non-zero element. Numpy raises exceptions when one tries to use an array as a boolean, saying that one should use `any()` or `all()`. In this case though, `__contains__()` API is forcing Numpy to interpret an array in a boolean context, and for this, they decided to go with what Numeric did. But I agree, it's really confusing and I don't know if *anyone* depends on this behavior of `__contains__()`. – Alok Singhal Aug 20 '13 at 04:03

Markus Dutschke · Answer 3 · 2021-03-16T13:42:20.210

How to check if a 1-dimensional `np.ndarray` is equal to a row in a 2-dimensional `np.ndarray`

As pointed out already,

[1, 2] in x is equivalent to ([1, 2] == x).any().

[1, 2, 3] in x nowadays emits a DeprecationWarning, because the list has 3 elements while x.shape[1] is only 2, so the two cannot be broadcast against each other.
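What happens on a shape mismatch depends on the installed NumPy version: older releases return False together with a warning, and to my knowledge newer releases raise a ValueError from the failed comparison instead. The sketch below reports whichever happens rather than assuming one. The helper name `describe_membership` is hypothetical.

```python
import warnings

import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])


def describe_membership(item, array):
    """Report what `item in array` does (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        try:
            result = item in array
        except ValueError as exc:
            # Assumption: newer NumPy raises when broadcasting fails.
            return f"raised ValueError: {exc}"
    if caught:
        return f"returned {result} with {caught[0].category.__name__}"
    return f"returned {result}"


print(describe_membership([1, 2], x))     # shapes broadcast
print(describe_membership([1, 2, 3], x))  # shapes do not broadcast
```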

If you want to find out whether a 1-dimensional np.ndarray is contained in another np.ndarray as a complete row (the way a human would read "in"), use this:

>>> import numpy as np
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> any(np.array_equal([1, 7], row) for row in x)
False
>>> any(np.array_equal([1, 2], row) for row in x)
True
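A vectorized alternative, offered as a sketch rather than as part of the approach above: require all elements of a row to match, then ask whether any row does. The helper name `has_row` and its shape check are assumptions made for this example; it only handles a 2-d array and a 1-d candidate row.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])


def has_row(array, row):
    """Return True if `row` equals some complete row of a 2-d `array`.

    Hypothetical helper; returns False when the shapes cannot match.
    """
    row = np.asarray(row)
    if array.ndim != 2 or row.shape != (array.shape[1],):
        return False
    # all(axis=1): every column matches in a row; any(): at least one such row.
    return bool((array == row).all(axis=1).any())


print(has_row(x, [1, 2]))     # True
print(has_row(x, [1, 7]))     # False
print(has_row(x, [1, 2, 3]))  # False
```

This avoids the Python-level loop over rows, which may matter for large arrays.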
