The currently accepted answer is incorrect, so I am writing this one.
In the example the asker gives, the two convolutions are the same, up to random initialization of parameters.
This is because both use the same underlying implementation and just pass different parameters, such as the kernel size. nn.Conv1d, nn.Conv2d, and nn.Conv3d interpret their arguments differently, e.g. kernel_size=3 becomes (3, 3) for nn.Conv2d but (3,) for nn.Conv1d.
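To see the expansion directly, here is a minimal sketch (the printed shapes follow from the documented weight layout (out_channels, in_channels, *kernel_size)):

import torch
from torch import nn

# The scalar kernel_size is expanded to an n-tuple by each module.
print(nn.Conv1d(1, 1, 3).weight.shape)  # torch.Size([1, 1, 3])
print(nn.Conv2d(1, 1, 3).weight.shape)  # torch.Size([1, 1, 3, 3])
print(nn.Conv3d(1, 1, 3).weight.shape)  # torch.Size([1, 1, 3, 3, 3])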
However, you can force these parameters to have the correct shape.
Note that stride and dilation need to be specified explicitly in some of the instances below; otherwise the constructor expands the scalar defaults to tuples of the wrong length (e.g. the default stride=1 would become (1, 1) in nn.Conv2d, which does not match a one-dimensional kernel):
import torch
from torch import nn

# A 1-D input: Conv2d and Conv3d are forced into 1-D behavior via 1-tuples.
conv1d = nn.Conv1d(1, 1, 3, padding='same', bias=False)
conv2d = nn.Conv2d(1, 1, (3,), stride=(1,), dilation=(1,), padding='same', bias=False)
conv3d = nn.Conv3d(1, 1, (3,), stride=(1,), dilation=(1,), padding='same', bias=False)

# Identical constant weights, so equal outputs imply identical computation.
conv1d.weight.data.fill_(1)
conv2d.weight.data.fill_(1)
conv3d.weight.data.fill_(1)

x = torch.rand(1, 1, 100)
assert (conv1d(x) == conv2d(x)).all() and (conv1d(x) == conv3d(x)).all()
# A 2-D input: now Conv1d and Conv3d are forced into 2-D behavior.
conv1d = nn.Conv1d(1, 1, (3, 3), padding='same', bias=False)
conv2d = nn.Conv2d(1, 1, 3, padding='same', bias=False)
conv3d = nn.Conv3d(1, 1, (3, 3), stride=(1, 1), dilation=(1, 1), padding='same', bias=False)

conv1d.weight.data.fill_(1)
conv2d.weight.data.fill_(1)
conv3d.weight.data.fill_(1)

x = torch.rand(1, 1, 100, 100)
assert (conv1d(x) == conv2d(x)).all() and (conv1d(x) == conv3d(x)).all()
# A 3-D input: Conv1d and Conv2d are forced into 3-D behavior.
conv1d = nn.Conv1d(1, 1, (3, 3, 3), stride=(1, 1, 1), dilation=(1, 1, 1), padding='same', bias=False)
conv2d = nn.Conv2d(1, 1, (3, 3, 3), stride=(1, 1, 1), dilation=(1, 1, 1), padding='same', bias=False)
conv3d = nn.Conv3d(1, 1, 3, padding='same', bias=False)

conv1d.weight.data.fill_(1)
conv2d.weight.data.fill_(1)
conv3d.weight.data.fill_(1)

x = torch.rand(1, 1, 100, 100, 100)
assert (conv1d(x) == conv2d(x)).all() and (conv1d(x) == conv3d(x)).all()
This equality would not hold if, as the currently accepted answer states, nn.Conv1d could "move along one direction only": both spatial dimensions are much larger than the kernel size, so nn.Conv1d could not have produced the full 100x100 output if it were locked to a single direction.
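If you want to verify that yourself, here is a minimal sketch (it reuses the forced 2-D conv1d from the snippet above, so it assumes that trick works on your PyTorch version):

conv1d = nn.Conv1d(1, 1, (3, 3), padding='same', bias=False)
x = torch.rand(1, 1, 100, 100)
print(conv1d(x).shape)  # torch.Size([1, 1, 100, 100]) -- the full 2-D output, not a single row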
You can read more at https://discuss.pytorch.org/t/conv1d-kernel-size-explained/84323/4, as pointed out by @trialNerror in a comment on the question.