
I have:

        print('\ninp', inp.min(), inp.mean(), inp.max())
        print(inp)
        out = self.conv1(inp)
        print('\nout1', out.min(), out.mean(), out.max())
        print(out)
        quit()

My min, mean, and max for inp are: inp tensor(9.0060e-05) tensor(0.1357) tensor(2.4454)

For my output, I have: out1 tensor(4.8751, grad_fn=<MinBackward1>) tensor(21.8416, grad_fn=<MeanBackward0>) tensor(54.9332, grad_fn=<MaxBackward1>)

My self.conv1 is:

        self.conv1 = torch.nn.Conv1d(
            in_channels=161,
            out_channels=161,
            kernel_size=11,
            stride=1,
            padding=5)
        self.conv1.weight.data = torch.zeros(self.conv1.weight.data.size())
        self.conv1.weight.data[:, :, 5] = 1.0
        self.conv1.bias.data = torch.zeros(self.conv1.bias.data.size())

So my weights look like: tensor([0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])

So if I understand how convolution works, this should produce the same output. But it doesn't.

What am I doing wrong?

Shamoon

2 Answers


Always try to provide a Minimal, Reproducible Example.


It shouldn't. You are probably forgetting the summation over the input channels. As stated in the docs:

In the simplest case, the output value of the layer with input size (N, C_in, L) and output (N, C_out, L_out) can be precisely described as:

out(N_i, C_out_j) = bias(C_out_j) + Σ_{k=0}^{C_in − 1} weight(C_out_j, k) ⋆ input(N_i, k)

where ⋆ is the valid cross-correlation operator, N is a batch size, C denotes a number of channels, L is a length of signal sequence.

Notice that, in your example, the mean after the conv (i.e., 21.8416) is approximately 161 times the mean before (161 * 0.1357 ≈ 21.85), and this is not a coincidence: because your kernel has a 1 at the center for every (output, input) channel pair, each output value is the sum of all 161 input channels at that position, so the mean scales by C_in. The same thing happens in the code below:

import torch
torch.manual_seed(0)

# define the fake input data
x = torch.rand(1, 3, 5)
# >>> x
# tensor([[[0.4963, 0.7682, 0.0885, 0.1320, 0.3074],
#          [0.6341, 0.4901, 0.8964, 0.4556, 0.6323],
#          [0.3489, 0.4017, 0.0223, 0.1689, 0.2939]]])

# define the conv
conv1 = torch.nn.Conv1d(3, 3, kernel_size=5, stride=1, padding=2)
conv1.weight.data = torch.zeros(conv1.weight.data.size())
conv1.weight.data[:, :, 2] = 1.0
conv1.bias.data = torch.zeros(conv1.bias.data.size())

# print mean before
print(x.mean())
# tensor(0.4091)

# print mean after
print(conv1(x).mean())
# tensor(1.2273, grad_fn=<MeanBackward0>)

See? After the conv, the mean is 3 times the original one.
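
To make the summation explicit, here is a quick sanity check (a sketch reusing x and conv1 from above) that every output channel is just the sum over the input channels:

# each output channel should equal the sum over all input channels,
# since the centered-1 kernel copies every input channel and the conv sums them
summed = x.sum(dim=1, keepdim=True).expand_as(x)
print(torch.allclose(conv1(x), summed))
# True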

As @jodag said, if you want an identity, you can do it like this:

import torch
torch.manual_seed(0)

# define the fake input data
x = torch.rand(1, 3, 5)
# >>> x
# tensor([[[0.4963, 0.7682, 0.0885, 0.1320, 0.3074],
#          [0.6341, 0.4901, 0.8964, 0.4556, 0.6323],
#          [0.3489, 0.4017, 0.0223, 0.1689, 0.2939]]])

# define the conv
conv1 = torch.nn.Conv1d(3, 3, kernel_size=5, stride=1, padding=2)
torch.nn.init.zeros_(conv1.weight)
torch.nn.init.zeros_(conv1.bias)
# set identity kernel
conv1.weight.data[:, :, 2] = torch.eye(3, 3)

# print mean before
print(x.mean())
# tensor(0.4091)

# print mean after
print(conv1(x).mean())
# tensor(0.4091, grad_fn=<MeanBackward0>)
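
And you can verify that the conv now acts as an identity on the whole tensor, not just on the mean:

# the output should match the input (up to float tolerance)
print(torch.allclose(conv1(x), x))
# True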
Berriel
  • So there is no way for me to come up with an `identity kernel`? – Shamoon Mar 21 '20 at 01:52
  • @Shamoon You would need to have a single 1 for each output channel. For identity you would need to have all zeros except elements `for i in range(161): conv1.weight.data[i,i,5] = 1` – jodag Mar 21 '20 at 06:59
  • @Shamoon exactly. If you want to remove the for loop, you can simply do `conv1.weight.data[:, :, 5] = torch.eye(161, 161)`. I added an example for reference. – Berriel Mar 21 '20 at 12:38
  • Thanks! It's getting me close, but not quite. My inp mean is: `tensor(0.1003, device='cuda:0')` Output is `tensor(0.1054, device='cuda:0', grad_fn=)` – Shamoon Mar 21 '20 at 14:19
  • @Shamoon That is awkward. This op should be very stable numerically. If you could setup a Colab with a minimal reproducible example, I'd take a look. – Berriel Mar 21 '20 at 14:32

Convolving a signal with a Dirac delta always reproduces the original signal, and this holds for convolutions in any number of dimensions.

Note that the Dirac delta is not the same thing as an identity matrix, but it plays a similar role here.

Here's an example with pytorch:

import torch as pt

num_chan_in = 3
num_chan_out = 3
image_size = 10
batch_size = 5
k = 3

# cnn with same number of input and output channels
cnn = pt.nn.Conv2d(num_chan_in, num_chan_out, kernel_size=k, padding=(k-1)//2)

assert num_chan_in == num_chan_out
with pt.no_grad():
    cnn.weight[:, :, :, :] = 0.0
    cnn.bias[:] = 0.0
    for i in range(num_chan_in):
        cnn.weight[i, i, k // 2, k // 2] = 1.0  # the discrete equivalent of a Dirac delta

input = pt.randn([batch_size, num_chan_in, image_size, image_size])
output = cnn(input)

print((input == output).all())

prints "True".

Jacob