
I can index my numpy array / pytorch tensor with a boolean array/tensor of the same shape or an array/tensor containing integer indexes of the elements I'm after. Which is faster?


2 Answers


The following pytorch tests indicate that indexing with an integer index array is generally 3x to 20x faster than indexing with a boolean mask, on both CPU and GPU:

In [1]: import torch
a = torch.arange(int(1e5))
idxs = torch.randint(len(a), (int(1e4),))
ind = torch.zeros_like(a, dtype=torch.uint8)
ind[idxs] = 1
ac, idxsc, indc = a.cuda(), idxs.cuda(), ind.cuda()

In [2]: %timeit a[idxs]
73.4 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit a[ind]
622 µs ± 8.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: %timeit ac[idxsc]
9.51 µs ± 475 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit ac[indc]
59.6 µs ± 313 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: idxs = torch.arange(len(a)-1, dtype=torch.long)
ind = torch.zeros_like(a, dtype=torch.uint8)
ind[idxs] = 1
ac, idxsc, indc = a.cuda(), idxs.cuda(), ind.cuda()

In [7]: %timeit a[idxs]
146 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit a[ind]
4.59 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [9]: %timeit ac[idxsc]
33 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: %timeit ac[indc]
85.9 µs ± 56.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • It would also be interesting to consider the values of such runs on a purely CPU-based implementation. Do you expect similar results there? – dennlinger Sep 04 '19 at 09:34
  • 1
    A `numpy` test would use dtype `bool`. My experience has been that the boolean indexing is slightly slower, consistent with first converting the boolean with `np.nonzero`. – hpaulj Sep 04 '19 at 15:05
  • 1
    Correction, I get 10x speed difference for a similar size problem. It is consistent with applying `nonzero` to the boolean. – hpaulj Sep 04 '19 at 22:20
  • 2
    @dennlinger You mean pytorch running only on CPU? I'd expect it to be more or less identical to numpy, since pytorch stores the data in numpy format and presumably uses numpy binaries for it's calculations also (no reason to re-invent that already well greased wheel :). – drevicko Sep 08 '19 at 10:50

As the prior answer shows, integer-based indexing is faster. This is expected: the output tensor's shape equals the index tensor's shape, so it is known before the operation runs, which makes memory allocation easy. With a boolean mask, the number of selected elements depends on the mask's contents and must be counted first.
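The shape argument can be illustrated with a small numpy sketch (the same reasoning applies to pytorch tensors):

```python
import numpy as np

a = np.arange(10)
idx = np.array([2, 5, 7])
mask = a % 2 == 0  # True at 0, 2, 4, 6, 8

# Integer indexing: output shape equals the index array's shape,
# known before the operation runs.
print(a[idx].shape)   # (3,)

# Boolean indexing: output shape depends on the mask's contents,
# so the True values must effectively be counted first.
print(a[mask].shape)  # (5,)
print(np.count_nonzero(mask))  # 5
```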