Theano: finding the indices of a tensor's elements in a second tensor

Question

I can't seem to find a solution for this. Given two Theano tensors a and b, I want to find the indices in a of the elements of b. For example, with a = [1, 5, 10, 17, 23, 39] and b = [1, 10, 39], the result should be the indices of the b values in tensor a, i.e. [0, 2, 5].

After spending some time on it, I thought the best way would be to use scan; here is my attempt at a minimal example.

def getIndices(b_i, b_v, ar):
    pI_subtensor = pI[b_i]
    return T.set_subtensor(pI_subtensor, np.where(ar == b_v)[0])

ar = T.ivector()
b = T.ivector()
pI = T.zeros_like(b)

result, updates = theano.scan(fn=getIndices,
                              outputs_info=None,
                              sequences=[T.arange(b.shape[0], dtype='int32'), b],
                              non_sequences=ar)

get_proposal_indices = theano.function([b, ar], outputs=result)

d = get_proposal_indices( np.asarray([1, 10, 39], dtype=np.int32), np.asarray([1, 5, 10, 17, 23, 39], dtype=np.int32) )

I am getting the error:

TypeError: Trying to increment a 0-dimensional subtensor with a 1-dimensional value.

on the line with the return statement. Furthermore, the output needs to be a single tensor with the same shape as b, and I am not sure this approach would produce the desired result. Any suggestions would be helpful.

What is the motivation for using Theano to do this instead of something like numpy? How many entries might `a` and `b` have in practice? Will every element of `b` always appear somewhere in `a`? If not, how should missing entries be handled? Are `a` and `b` always in sorted order? — Daniel Renshaw, Aug 26 '15 at 06:14
Tensor `a` could be moderately big (15-20k elements), but `b` is usually small, around a hundred elements (think of it as minibatch items). `b` is strictly a subset of `a`, so there will not be any missing entries. Neither is sorted currently, but I can easily keep `a` sorted (though not `b`). As for the motivation, there is no straightforward function for this in numpy (as far as I could find), and I would need to loop there anyway. — baskaran, Aug 26 '15 at 11:04

score 1 · Accepted Answer · answered Aug 26 '15 at 10:58

It all depends on how big your arrays will be. As long as the full comparison matrix of shape (len(b), len(a)) fits in memory, you can proceed as follows:

import numpy as np
import theano
import theano.tensor as T

aa = T.ivector()
bb = T.ivector()

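# Broadcast bb as a column against aa: equality[i, j] is 1 where bb[i] == aa[j]
equality = T.eq(aa, bb[:, np.newaxis])
# nonzero() returns (row indices, column indices); the columns are positions in aa
indices = equality.nonzero()[1]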

f = theano.function([aa, bb], indices)

a = np.array([1, 5, 10, 17, 23, 39], dtype=np.int32)
b = np.array([1, 10, 39], dtype=np.int32)

f(a, b)

# outputs [0, 2, 5]
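
For comparison, here is a minimal plain-NumPy sketch of the same broadcasting idea, together with an alternative that avoids building the full comparison matrix by sorting `a` first. The function names `indices_by_broadcast` and `indices_by_searchsorted` are made up for illustration. Both functions assume that every element of `b` occurs in `a` (as stated in the comments on the question), and the second additionally assumes that `a` contains no duplicate values.

```python
import numpy as np

def indices_by_broadcast(a, b):
    # Compare every element of b (rows) with every element of a (columns).
    equality = a == b[:, np.newaxis]          # shape (len(b), len(a))
    # Column indices of the True entries, listed row by row (i.e. in b's order).
    return equality.nonzero()[1]

def indices_by_searchsorted(a, b):
    # Assumption: every element of b is present in a, and a has no duplicates.
    order = np.argsort(a)                     # permutation that sorts a
    pos = np.searchsorted(a[order], b)        # positions of b in the sorted copy
    return order[pos]                         # map back to positions in the original a

a = np.array([1, 5, 10, 17, 23, 39], dtype=np.int32)
b = np.array([1, 10, 39], dtype=np.int32)

print(indices_by_broadcast(a, b))      # [0 2 5]
print(indices_by_searchsorted(a, b))   # [0 2 5]
```

The broadcast version needs memory proportional to len(a) * len(b), while the sorted version needs only the sorted copy of `a` and its permutation, which may matter if either array grows much larger.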
