
Suppose I have an input matrix of shape (batch_size, channels, h, w),

in this case (1, 2, 3, 3):

[[[[ 0.,  1.,  2.],
   [ 3.,  4.,  5.],
   [ 6.,  7.,  8.]],

  [[ 9., 10., 11.],
   [12., 13., 14.],
   [15., 16., 17.]]]]

To do a convolution with it, I unroll it to the shape (batch_size, channels * kernel_size * kernel_size, out_h * out_w), which is:

[[[ 0.,  1.,  3.,  4.],
  [ 1.,  2.,  4.,  5.],
  [ 3.,  4.,  6.,  7.],
  [ 4.,  5.,  7.,  8.],
  [ 9., 10., 12., 13.],
  [10., 11., 13., 14.],
  [12., 13., 15., 16.],
  [13., 14., 16., 17.]]]
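For reference, a minimal numpy sketch of how such an unroll step can be done, assuming stride 1 and no padding (numpy.lib.stride_tricks.sliding_window_view needs numpy >= 1.20):

import numpy as np

# input of shape (batch_size, channels, h, w), as above
x = np.arange(18, dtype=np.float32).reshape(1, 2, 3, 3)
k = 2  # kernel_size

# windows has shape (batch, channels, out_h, out_w, k, k)
windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(2, 3))
out_h, out_w = windows.shape[2], windows.shape[3]

# reorder to (batch, channels, k, k, out_h, out_w), then flatten to
# (batch, channels * k * k, out_h * out_w)
unrolled = windows.transpose(0, 1, 4, 5, 2, 3).reshape(1, 2 * k * k, out_h * out_w)
print(unrolled)  # matches the unrolled matrix above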

Now I want to get the unrolled matrix back to its original form, which looks like this:

# For demonstration, only the first and second columns of the unrolled matrix are shown
# The output has the same shape as the initial matrix -> initialized to zeros
# current column -> [ 0.,  1.,  3.,  4.,  9., 10., 12., 13.]

[[[[0+0, 0+1, 0],
   [0+3, 0+4, 0],
   [0  , 0  , 0]],

  [[0+9 , 0+10, 0],
   [0+12, 0+13, 0],
   [0   , 0   , 0]]]]

# For the next column it would be
# current column -> [ 1.,  2.,  4.,  5., 10., 11., 13., 14.]

[[[[0 , 1+1, 0+2],
   [3 , 4+4, 0+5],
   [0 , 0  , 0  ]],

  [[9  , 10+10, 0+11],
   [12 , 13+13, 0+14],
   [0  , 0    , 0   ]]]]

You basically put the unrolled elements back in their original places and sum the overlapping parts together.

But now to my question:

How could one implement this as fast as possible using numpy, with as few loops as possible? I have already looped through it kernel position by kernel position, but this approach isn't feasible with larger inputs. I think this could be parallelized quite a bit, but my numpy indexing and overall knowledge isn't good enough to figure out a good solution by myself.
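For reference, a minimal loop-based version of the fold I mean, assuming stride 1 and no padding (fold_loop is just an illustrative name; it loops once per kernel position, which is exactly what becomes too slow for larger inputs):

import numpy as np

def fold_loop(unrolled, channels, k, h, w):
    # unrolled: (batch, channels * k * k, out_h * out_w)
    batch = unrolled.shape[0]
    out_h, out_w = h - k + 1, w - k + 1
    out = np.zeros((batch, channels, h, w), dtype=unrolled.dtype)
    patches = unrolled.reshape(batch, channels, k, k, out_h, out_w)
    for i in range(out_h):      # one iteration per kernel position
        for j in range(out_w):
            out[:, :, i:i + k, j:j + k] += patches[:, :, :, :, i, j]
    return out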

Thanks for reading and have a nice day :)

  • `to do a convolution with it i unroll it to the shape of (batch_size ,channels * kernel_size * kernel_size ,out_h * out_w)`: How did you do this? Maybe a bit of your code can help us suggest the reverse of this operation. – swag2198 Jun 21 '21 at 06:25
  • @swag2198 Here's a little article about it (only the first part): https://www.telesens.co/2018/04/09/initializing-weights-for-the-convolutional-and-fully-connected-layers/ – user15770670 Jun 21 '21 at 11:55

1 Answer


With numpy, I expect this can be done using numpy.lib.stride_tricks.as_strided. However, I'd suggest that you look at pytorch, which interoperates easily with numpy and has quite efficient primitives for this operation. In your case, the code would look like:

import torch

kernel_size = 2
x = torch.arange(18).reshape(1, 2, 3, 3).to(torch.float32)
unfold = torch.nn.Unfold(kernel_size=kernel_size)
fold = torch.nn.Fold(kernel_size=kernel_size, output_size=(3, 3))
unfolded = unfold(x)  # shape (1, channels * kernel_size**2, out_h * out_w)
# iterate over the columns of the unfolded matrix; there are out_h * out_w
# of them (which here happens to equal kernel_size ** 2)
n_cols = unfolded.shape[-1]
cols = torch.arange(n_cols)
for col in range(n_cols):
    # keep only the current column, zero out the rest, then fold it back
    unfolded_masked = torch.where(col == cols, unfolded, torch.tensor(0.0, dtype=torch.float32))
    refolded = fold(unfolded_masked)
    print(refolded)
tensor([[[[ 0.,  1.,  0.],
          [ 3.,  4.,  0.],
          [ 0.,  0.,  0.]],
         [[ 9., 10.,  0.],
          [12., 13.,  0.],
          [ 0.,  0.,  0.]]]])
tensor([[[[ 0.,  1.,  2.],
          [ 0.,  4.,  5.],
          [ 0.,  0.,  0.]],
         [[ 0., 10., 11.],
          [ 0., 13., 14.],
          [ 0.,  0.,  0.]]]])
tensor([[[[ 0.,  0.,  0.],
          [ 3.,  4.,  0.],
          [ 6.,  7.,  0.]],
         [[ 0.,  0.,  0.],
          [12., 13.,  0.],
          [15., 16.,  0.]]]])
tensor([[[[ 0.,  0.,  0.],
          [ 0.,  4.,  5.],
          [ 0.,  7.,  8.]],
         [[ 0.,  0.,  0.],
          [ 0., 13., 14.],
          [ 0., 16., 17.]]]])
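For the pure numpy route, np.add.at (an unbuffered scatter-add) handles the overlapping writes directly, which a plain as_strided assignment cannot do safely. A sketch under the same stride-1, no-padding assumptions (fold_numpy is just an illustrative name):

import numpy as np

def fold_numpy(unrolled, channels, k, h, w):
    # unrolled: (batch, channels * k * k, out_h * out_w)
    batch = unrolled.shape[0]
    out_h, out_w = h - k + 1, w - k + 1

    # row/column index in the original image for every unrolled element
    kh, kw = np.meshgrid(np.arange(k), np.arange(k), indexing='ij')
    oh, ow = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing='ij')
    rows = kh.reshape(-1, 1) + oh.reshape(1, -1)  # (k * k, out_h * out_w)
    cols = kw.reshape(-1, 1) + ow.reshape(1, -1)

    out = np.zeros((batch, channels, h, w), dtype=unrolled.dtype)
    patches = unrolled.reshape(batch, channels, k * k, out_h * out_w)
    # scatter-add every element back to its original position; overlaps sum up
    np.add.at(out, (slice(None), slice(None), rows, cols), patches)
    return out

With the example above, fold_numpy(unrolled, channels=2, k=2, h=3, w=3) should reproduce the summed-overlap result from the question.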