I built a toy CNN model to fit a pair of random tensors (input_tensor and truth).

import torch
import torch.nn as nn
import torch.optim as optim

batch_size = 1
channel = 3
input_size = 128
input_tensor = torch.rand((batch_size, channel, input_size, input_size))
truth = torch.rand((batch_size, channel, input_size, input_size))
device = torch.device("cuda")


class ConvModel(nn.Module):
    def __init__(self):
        super(ConvModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 57344, (3, 3), (1, 1), padding=1)
        self.conv2 = nn.Conv2d(57344, 3, (3, 3), (1, 1), padding=1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, input_):
        x = self.conv1(input_)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.sigmoid(x)
        return x

model = ConvModel().to(device)
loss_func = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-5)

for epoch in range(100):
    output = model(input_tensor.to(device))
    loss = loss_func(output, truth.to(device))

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if (1 + epoch) % 10 == 0:
        print(loss.detach().item())

I used the code above to generate one input/output pair and train the model, and got the following loss values (printed every 10 epochs):

    0.08877705037593842
    0.08524381369352341
    0.08396070450544357
    0.0834180936217308
    0.08318136632442474
    0.08298520743846893
    0.08282201737165451
    0.08265350759029388
    0.08248833566904068
    0.08231770992279053

I'm confused that my model can barely fit ONE pair of data in 100 EPOCHS. Is there a problem?

Thanks for any feedback.

ojipadeson
  • What is confusing you, exactly? If we are only going to look at the loss values, a well-fitted model should have low loss values. – Iran Ribeiro Jul 06 '22 at 10:52

1 Answer

Note that a convolution kernel is shared across spatial positions. Your network is effectively trying to map a random 5*5 patch to a random value (5 is the size of the receptive field of the output layer, because each of the two stacked 3*3, stride-1 convolutions adds 2 pixels), and you have 128*128 such pairs, even though you have only one pair of tensors. That is why your network fails to overfit your dataset. Reducing input_size may help you reduce the loss.
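
The receptive-field argument can be checked with a few lines of plain Python. This is only a sketch: `receptive_field` is a hypothetical helper (not a PyTorch function) that applies the usual recurrence for stacked convolutions, and the `(kernel_size, stride)` pairs are read off the `ConvModel` definition in the question.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a single output pixel.

    `layers` is a list of (kernel_size, stride) tuples, first layer first.
    Hypothetical helper for illustration; not part of PyTorch.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        # Each layer widens the field by (kernel_size - 1) steps of the
        # current spacing between neighbouring feature-map pixels.
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# conv1 and conv2 from the question: both 3x3 kernels with stride 1.
conv_layers = [(3, 1), (3, 1)]
rf = receptive_field(conv_layers)

input_size = 128
# With padding=1 the output keeps the input resolution, so every output
# pixel contributes one (input patch, target value) pair.
num_pairs = input_size * input_size

print(f"receptive field: {rf}x{rf}")
print(f"patch/target pairs in one image: {num_pairs}")
```

Because the same kernels are applied at every position, the single 128*128 image behaves like a dataset of `num_pairs` small random patches, each with its own random target.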

hellohawaii
  • Could I build a deeper network to increase the receptive field to 256*256? – ojipadeson Jul 06 '22 at 14:14
  • @ojipadeson I believe that with a deeper network, you can manage to overfit ONE pair of training data. You want an output of the same size as the input tensor, which is similar to image-to-image translation and image segmentation tasks. You may use the network structures designed for these tasks as a reference, for example, [UNet](https://amaarora.github.io/2020/09/13/unet.html). However, I think it is impossible to fit a large number of random pairs of this kind; see the [Universal approximation theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem) – hellohawaii Jul 06 '22 at 14:39
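
To get a rough feel for how deep "deeper" would have to be, the sketch below counts how many 3*3 convolutions are needed before one output pixel sees a 256-pixel-wide region. Both function names are hypothetical helpers, and the stride-2 variant is only a stand-in for the downsampling that UNet-style encoders perform, not a description of UNet itself. The calculation covers geometry only and says nothing about whether such a network would actually fit the data.

```python
import math


def plain_layers_needed(target_rf, kernel_size=3):
    """Stacked stride-1 convs needed for the receptive field to reach target_rf."""
    # Every stride-1 layer adds (kernel_size - 1) pixels to the receptive field.
    return math.ceil((target_rf - 1) / (kernel_size - 1))


def strided_layers_needed(target_rf, kernel_size=3, stride=2):
    """Same count when every conv also downsamples by `stride` (an assumption)."""
    rf, jump, depth = 1, 1, 0
    while rf < target_rf:
        rf += (kernel_size - 1) * jump
        jump *= stride  # downsampling widens the spacing between feature pixels
        depth += 1
    return depth


target = 256
print("stride-1 layers needed:", plain_layers_needed(target))
print("stride-2 layers needed:", strided_layers_needed(target))
```

The gap between the two counts is one reason encoder-decoder designs downsample rather than stacking full-resolution layers.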