
Given tensors x and y, each with shape (num_batches, d), how can I use PyTorch to compute the sum of every combination of x and y within a batch?

This is similar to an outer product, except that we don't want to multiply but sum. (This implies that I could solve this by exponentiating, taking the outer product, and taking the log, but of course that has numerical and performance disadvantages.)
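For instance, the exp/log route would look something like this sketch (overflow-prone for large values, so not a real option):

import torch

x = torch.rand(2, 4)
y = torch.rand(2, 5)

# exp(x[b, i]) * exp(y[b, j]) == exp(x[b, i] + y[b, j]), so taking the
# log of the batched outer product of the exponentials recovers the sums:
osum = torch.log(torch.einsum('bi,bj->bij', x.exp(), y.exp()))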

It could also be done via a cartesian product and then summing each of the combinations.

Essentially, I'd like osum[b, i, j] == x[b, i] + y[b, j]. Can PyTorch do this with tensor operations, without loops?
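For reference, here is the semantics I'm after, spelled out with explicit loops (exactly what I want to avoid):

def osum_loop(x, y):
    # osum[b, i, j] == x[b, i] + y[b, j]
    b, n = x.shape
    _, m = y.shape
    out = torch.empty(b, n, m, dtype=x.dtype)
    for bi in range(b):
        for i in range(n):
            for j in range(m):
                out[bi, i, j] = x[bi, i] + y[bi, j]
    return out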

– SRobertJames
  • Related to [tag:numpy-einsum], you would be able to use einsum if the operation were of the form `osum[b, i, j] = x[b, i] * y[b, j]`... Ein-summation accumulates (*i.e.* sums) element-wise products, not sums. – Ivan Jul 13 '22 at 06:26
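(To make that comment concrete: the multiplicative version that einsum can express directly, using the x and y from the question, would be

oprod = torch.einsum('bi,bj->bij', x, y)  # oprod[b, i, j] == x[b, i] * y[b, j]

but there is no einsum spelling for the sum itself.)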

2 Answers


You can perform such an operation using broadcasting:

>>> x = torch.randint(0, 10, (2, 4))
>>> x
tensor([[0, 6, 5, 8],
        [3, 0, 7, 5]])
>>> y = torch.randint(0, 10, (2, 5))
>>> y
tensor([[6, 9, 9, 8, 7],
        [0, 4, 6, 2, 5]])

>>> x[:, :, None].shape
torch.Size([2, 4, 1])

>>> y[:, None].shape
torch.Size([2, 1, 5])

Inserting a singleton dimension where the two shapes differ ensures the addition broadcasts, i.e. the 'outer' operation is performed.

>>> osum = x[:, :, None] + y[:, None]
>>> osum
tensor([[[ 6,  9,  9,  8,  7],
         [12, 15, 15, 14, 13],
         [11, 14, 14, 13, 12],
         [14, 17, 17, 16, 15]],

        [[ 3,  7,  9,  5,  8],
         [ 0,  4,  6,  2,  5],
         [ 7, 11, 13,  9, 12],
         [ 5,  9, 11,  7, 10]]])
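The same singleton insertion can be spelled with unsqueeze, which some find more explicit (equivalent to the indexing above):

>>> osum = x.unsqueeze(2) + y.unsqueeze(1)  # (2, 4, 1) + (2, 1, 5) -> (2, 4, 5)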
– Ivan

This can easily be done by introducing singleton dimensions into x and y and broadcasting along these singleton dimensions:

osum = x[..., None] + y[:, None, :]

For example:

x = torch.arange(6).view(2,3)
y = x * 10
osum = x[..., None] + y[:, None, :]

This results in:

tensor([[[ 0, 10, 20],
         [ 1, 11, 21],
         [ 2, 12, 22]],

        [[33, 43, 53],
         [34, 44, 54],
         [35, 45, 55]]])
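A quick sanity check against the definition, picking an arbitrary index:

assert osum[1, 2, 0] == x[1, 2] + y[1, 0]  # 35 == 5 + 30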

Update (July 14th): How does it work?

You have two tensors, x and y, each of shape (b, n), and you want to compute:

osum[b,i,j] = x[b, i] + y[b, j]

We can, conceptually, create new variables xx and yy by repeating each element of x and y along a third dimension, such that:

xx[b, i, j] == x[b, i]  # for all j
yy[b, i, j] == y[b, j]  # for all i

With these new variables, it is easy to see that:

osum = xx + yy

since, by definition,

osum[b, i, j] == xx[b, i, j] + yy[b, i, j] == x[b, i] + y[b, j]

Now, you could use commands such as Tensor.expand or Tensor.repeat to explicitly create xx and yy - but why bother? Since their elements are just trivial repetitions of the elements along specific dimensions, broadcasting does this implicitly for you.
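For instance, an explicit version using Tensor.expand (a sketch; it produces the same osum as the broadcast one-liner):

b, n = x.shape
_, m = y.shape
xx = x[:, :, None].expand(b, n, m)  # xx[b, i, j] == x[b, i]
yy = y[:, None, :].expand(b, n, m)  # yy[b, i, j] == y[b, j]
osum = xx + yy

Note that expand does not allocate new memory; it only manipulates strides, just like broadcasting.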

– Shai
  • Thanks. I find this type of broadcasting opaque. 1. Can you explain how it works? 2. Is there a way to do something similar, so that it's explicit? Or, at the least, can I do the broadcast explicitly, so that the code makes it clear exactly what is happening? – SRobertJames Jul 13 '22 at 21:23
  • @SRobertJames please see my update – Shai Jul 14 '22 at 05:23