I can't seem to find a solution for this. Given two Theano tensors a and b, I want to find the indices in a of the elements of b. For example, with a = [1, 5, 10, 17, 23, 39] and b = [1, 10, 39], the result should be the indices of the b values in a, i.e. [0, 2, 5].

After spending some time, I thought the best way would be to use scan; here is my shot at the minimal example.

def getIndices(b_i, b_v, ar):
    pI_subtensor = pI[b_i]
    return T.set_subtensor(pI_subtensor, np.where(ar == b_v)[0])

ar = T.ivector()
b = T.ivector()
pI = T.zeros_like(b)

result, updates = theano.scan(fn=getIndices,
                              outputs_info=None,
                              sequences=[T.arange(b.shape[0], dtype='int32'), b],
                              non_sequences=ar)

get_proposal_indices = theano.function([b, ar], outputs=result)

d = get_proposal_indices( np.asarray([1, 10, 39], dtype=np.int32), np.asarray([1, 5, 10, 17, 23, 39], dtype=np.int32) )

I am getting the error:

TypeError: Trying to increment a 0-dimensional subtensor with a 1-dimensional value.

on the line with the return statement. Furthermore, the output needs to be a single tensor with the same shape as b, and I am not sure whether this approach would produce that. Any suggestions would be helpful.

baskaran
  • What is the motivation for using Theano to do this instead of something like numpy? How many entries might `a` and `b` have in practice? Will every element of `b` always appear somewhere in `a`? If not, how should missing entries be handled? Are `a` and `b` always in sorted order? – Daniel Renshaw Aug 26 '15 at 06:14
  • Tensor `a` could be moderately big (15-20k entries), but `b` is usually small, around a hundred entries (think of it as minibatch items). `b` is strictly a subset of `a`, so there will not be any missing entries. Neither is currently sorted, but I can easily keep `a` sorted (though not `b`). As for the motivation, there is no straightforward function for this in numpy (as far as I could find), so I would need to loop there anyway. – baskaran Aug 26 '15 at 11:04

1 Answer

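It depends on how large your arrays are. The approach below compares every element of b against every element of a through broadcasting, so it builds an intermediate matrix of shape (len(b), len(a)). As long as that matrix fits in memory, you can proceed as follows: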

import numpy as np
import theano
import theano.tensor as T

aa = T.ivector()
bb = T.ivector()

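# Broadcast comparison: equality[i, j] is nonzero where bb[i] == aa[j],
# giving a matrix of shape (len(bb), len(aa)).
equality = T.eq(aa, bb[:, np.newaxis])
# nonzero() returns (row indices, column indices); the column indices are the
# positions in aa, listed in the order of the entries of bb.
indices = equality.nonzero()[1]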

f = theano.function([aa, bb], indices)

a = np.array([1, 5, 10, 17, 23, 39], dtype=np.int32)
b = np.array([1, 10, 39], dtype=np.int32)

f(a, b)

# outputs [0, 2, 5]
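
If the (len(b), len(a)) comparison matrix ever becomes too large, one possible alternative is a sorted lookup. The block below is only a minimal NumPy sketch of that idea, not a Theano implementation: the helper name `find_indices_sorted` is hypothetical, and it assumes every value of b occurs in a and that a has no duplicate values, which matches the setup described in the comments on the question.

```python
import numpy as np

def find_indices_sorted(a, b):
    """Hypothetical helper: for each value in b, return its index in a.

    Assumes every value of b occurs in a and that a has no duplicates.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Indices that would sort a; a itself does not need to be sorted.
    order = np.argsort(a, kind="stable")
    # Position of each b value within the sorted view of a (binary search).
    pos = np.searchsorted(a, b, sorter=order)
    # Map positions in the sorted view back to positions in the original a.
    return order[pos]

a = np.array([1, 5, 10, 17, 23, 39], dtype=np.int32)
b = np.array([1, 10, 39], dtype=np.int32)
print(find_indices_sorted(a, b))  # prints [0 2 5]
```

This uses memory proportional to len(a) + len(b) rather than their product, and the result follows the order of b. A value of b that is missing from a would give a wrong index or an IndexError, so the subset assumption matters.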
eickenberg