Good morning everyone
Below is my implementation of a PyTorch Siamese network. I am using a batch size of 32, MSE loss, and SGD with 0.9 momentum as the optimizer.
import torch
import torch.nn as nn

class SiameseCNN(nn.Module):
    def __init__(self):
        super(SiameseCNN, self).__init__()                                        # 1, 40, 50
        self.convnet = nn.Sequential(nn.Conv2d(1, 8, 7), nn.ReLU(),               # 8, 34, 44
                                     nn.Conv2d(8, 16, 5), nn.ReLU(),              # 16, 30, 40
                                     nn.MaxPool2d(2, 2),                          # 16, 15, 20
                                     nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # 32, 15, 20
                                     nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())  # 64, 15, 20
        self.linear1 = nn.Sequential(nn.Linear(64 * 15 * 20, 100), nn.ReLU())
        self.linear2 = nn.Sequential(nn.Linear(100, 2), nn.ReLU())

    def forward(self, data):
        res = []
        for j in range(2):  # pass each image of the pair through the shared convnet
            x = self.convnet(data[:, j, :, :])
            x = x.view(-1, 64 * 15 * 20)
            res.append(self.linear1(x))
        fres = abs(res[1] - res[0])  # element-wise absolute difference of the two embeddings
        return self.linear2(fres)
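As a quick sanity check of the shape annotations, the model runs on a dummy batch like this (a minimal sketch, assuming each sample is a pair of 1x40x50 greyscale images, i.e. input shape (N, 2, 1, 40, 50)):

# Dummy forward pass to verify the annotated feature-map sizes (input shape is an assumption)
model = SiameseCNN()
dummy = torch.randn(4, 2, 1, 40, 50)  # batch of 4 image pairs, each image 1 x 40 x 50
out = model(dummy)
print(out.shape)  # torch.Size([4, 2])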
Each batch contains alternating pairs, i.e. [pos, pos], [pos, neg], [pos, pos], etc. However, the network doesn't converge, and the problem seems to be that fres in the network is the same for each pair (regardless of whether it is a positive or negative pair), and the output of self.linear2(fres) is always approximately equal to [0.0531, 0.0770]. This is in contrast with what I am expecting, which is that the first value of [0.0531, 0.0770] would get closer to 1 for a positive pair as the network learns, and the second value would get closer to 1 for a negative pair. These two values also need to sum up to 1.
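For clarity, this is the kind of target I have in mind for each pair (a minimal sketch; the exact target tensors in my pipeline may differ):

# Intended targets, shown for illustration (assumption):
# a positive pair should map to [1, 0], a negative pair to [0, 1]
target_positive = torch.tensor([1.0, 0.0])  # [pos, pos]
target_negative = torch.tensor([0.0, 1.0])  # [pos, neg]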
I have tested exactly the same setup and same input images for a 2-channel network architecture, where, instead of feeding in [pos, pos], you would stack those 2 images in a depth-wise fashion, for example numpy.stack([pos, pos], -1). The first layer nn.Conv2d(1, 8, 7) also changes to nn.Conv2d(2, 8, 7) in this setup. This works perfectly fine.
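To make the difference between the two layouts concrete, here is a rough sketch (assuming 40x50 greyscale arrays; the exact axis handling in my data pipeline may differ):

import numpy as np

img_a = np.random.rand(40, 50).astype(np.float32)  # placeholder greyscale images
img_b = np.random.rand(40, 50).astype(np.float32)

# Siamese layout: two separate 1-channel images per sample, shape (2, 1, 40, 50),
# sliced as data[:, j, :, :] inside forward()
siamese_pair = np.stack([img_a[None, ...], img_b[None, ...]], axis=0)

# 2-channel layout: the pair stacked depth-wise into one 2-channel image,
# fed to a network whose first layer is nn.Conv2d(2, 8, 7)
two_channel = np.stack([img_a, img_b], axis=-1)      # (40, 50, 2), channels-last
two_channel = np.transpose(two_channel, (2, 0, 1))   # (2, 40, 50) for PyTorch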
I have also tested exactly the same setup and input images for a traditional CNN approach, where I just pass single positive and negative greyscale images into the network, instead of stacking them (as with the 2-channel approach) or passing them in as image pairs (as with the Siamese approach). This also works perfectly, but the results are not as good as with the 2-channel approach.
EDIT (Solutions I've tried):
- I have tried a number of different loss functions, including HingeEmbeddingLoss and CrossEntropyLoss, all resulting in more or less the same problem. So I think it is safe to say that the problem is not caused by the employed loss function (MSELoss).
- Different batch sizes also seem to have no effect on the issue.
- I tried increasing the number of trainable parameters as suggested in Keras Model for Siamese Network not Learning and always predicting the same ouput. Also doesn't work.
- Tried to change the network architecture as implemented here: https://github.com/benmyara/pytorch-examples/blob/master/notebooks/1_NeuralNetworks/9_siamese_nn.ipynb. Specifically, I changed the forward pass to the following code, changed the loss to CrossEntropy, and the optimizer to Adam. Still no luck:
def forward(self, data):
    res = []
    for j in range(2):
        x = self.convnet(data[:, j, :, :])
        x = x.view(-1, 64 * 15 * 20)
        res.append(x)
    fres = self.linear2(self.linear1(abs(res[1] - res[0])))
    return fres
- I also tried to change the whole network from a CNN to a linear network as implemented here: https://github.com/benmyara/pytorch-examples/blob/master/notebooks/1_NeuralNetworks/9_siamese_nn.ipynb. Still doesn't work.
- Tried to use a lot more data as suggested here: Keras Model for Siamese Network not Learning and always predicting the same ouput. No luck...
- Tried to use torch.nn.PairwiseDistance between the outputs of convnet. Made some sort of improvement; the network starts to converge for the first few epochs, and then hits the same plateau every time:
def forward(self, data):
    res = []
    for j in range(2):
        x = self.convnet(data[:, j, :, :])
        res.append(x)
    pdist = nn.PairwiseDistance(p=2)
    diff = pdist(res[1], res[0])
    diff = diff.view(-1, 64 * 15 * 10)
    fres = self.linear2(self.linear1(diff))
    return fres
Another thing to note perhaps is that, within the context of my research, a Siamese network is trained for each object. So the first class is associated with the images containing the object in question, and the second class is associated with images containing other objects. Don't know if this might be the cause of the problem. It is however not a problem within the context of the Traditional CNN and 2-Channel CNN approaches.
As per request, here is my training code:
model = SiameseCNN().cuda()
ls_fn = torch.nn.BCELoss()
optim = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
epochs = np.arange(100)
eloss = []
for epoch in epochs:
    model.train()
    train_loss = []
    for x_batch, y_batch in dp.train_set:
        x_var, y_var = Variable(x_batch.cuda()), Variable(y_batch.cuda())
        y_pred = model(x_var)
        loss = ls_fn(y_pred, y_var)
        train_loss.append(abs(loss.item()))
        optim.zero_grad()
        loss.backward()
        optim.step()
    eloss.append(np.mean(train_loss))
    print(epoch, np.mean(train_loss))
Note that dp in dp.train_set is a class with attributes train_set, valid_set, test_set, where each set is created as follows:
DataLoader(TensorDataset(torch.Tensor(x), torch.Tensor(y)), batch_size=bs)
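For completeness, a self-contained sketch of how such a set could be built (the shapes below are assumptions for illustration, not my exact data):

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

bs = 32
x = np.zeros((1000, 2, 1, 40, 50), dtype=np.float32)  # hypothetical pair data
y = np.zeros((1000,), dtype=np.float32)               # hypothetical labels
train_set = DataLoader(TensorDataset(torch.Tensor(x), torch.Tensor(y)), batch_size=bs)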
As per request, here is an example of the predicted probabilities vs true label, where you can see the model doesn't seem to be learning:
Predicted: 0.5030623078346252 Label: 1.0
Predicted: 0.5030624270439148 Label: 0.0
Predicted: 0.5030624270439148 Label: 1.0
Predicted: 0.5030625462532043 Label: 0.0
Predicted: 0.5030625462532043 Label: 1.0
Predicted: 0.5030626654624939 Label: 0.0
Predicted: 0.5030626058578491 Label: 1.0
Predicted: 0.5030627250671387 Label: 0.0
Predicted: 0.5030626654624939 Label: 1.0
Predicted: 0.5030627846717834 Label: 0.0
Predicted: 0.5030627250671387 Label: 1.0
Predicted: 0.5030627846717834 Label: 0.0
Predicted: 0.5030627250671387 Label: 1.0
Predicted: 0.5030628442764282 Label: 0.0
Predicted: 0.5030627846717834 Label: 1.0
Predicted: 0.5030628442764282 Label: 0.0