I'm still working on implementing the mini-batch gradient update for my siamese neural network. Previously I had an implementation problem, which was correctly solved here.
Now I've realized that there was also a mistake in the architecture of my neural network, which comes from my incomplete understanding of the correct implementation.
So far, I've always used a non-mini-batch gradient descent approach, in which I passed the training elements one by one to the gradient update. Now I want to implement a mini-batch gradient update, starting, say, with mini-batches of N=2 elements.
My question is: how should I change the architecture of my siamese neural network to make it able to handle a mini-batch of N=2 elements instead of a single element?
This is the (simplified) architecture of my siamese neural network:
nn.Sequential {
[input -> (1) -> (2) -> output]
(1): nn.ParallelTable {
input
|`-> (1): nn.Sequential {
| [input -> (1) -> (2) -> output]
| (1): nn.Linear(6 -> 3)
| (2): nn.Linear(3 -> 2)
| }
|`-> (2): nn.Sequential {
| [input -> (1) -> (2) -> output]
| (1): nn.Linear(6 -> 3)
| (2): nn.Linear(3 -> 2)
| }
... -> output
}
(2): nn.CosineDistance
}
I have:
- 2 identical siamese neural networks (upper and lower)
- 6 input units
- 3 hidden units
- 2 output units
- cosine distance function that compares the output of the two parallel neural networks
Here's my code:
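require 'nn'
-- Layer sizes matching the architecture printed above
input_number = 6
hiddenUnits = 3
output_number = 2
-- Upper branch: 6 -> 3 -> 2
perceptronUpper = nn.Sequential()
perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
perceptronUpper:add(nn.Linear(hiddenUnits, output_number))
-- Lower branch: a clone that shares parameters and gradients with the upper one
perceptronLower = perceptronUpper:clone('weight', 'bias', 'gradWeight', 'gradBias')
-- Run the two branches side by side on a table of two inputs
parallel_table = nn.ParallelTable()
parallel_table:add(perceptronUpper)
parallel_table:add(perceptronLower)
-- Full model: both branches followed by the cosine distance of their outputs
perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())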
This architecture works very well when my gradient update function takes 1 element at a time; how should I modify it so that it can handle a mini-batch?
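To make the question concrete, here is a minimal sketch of the two input shapes I have in mind. It assumes that Torch7's nn.Linear and nn.CosineDistance treat a 2D tensor as a batch with one sample per row, which is how I read their documentation. The names inputNumber, outputNumber, batchSize, single and batch are only illustrative, and the inputs are random placeholders rather than my real data.

```lua
require 'nn'

-- Sizes from the architecture above; batchSize = 2 is the N=2 case.
local inputNumber, hiddenUnits, outputNumber, batchSize = 6, 3, 2, 2

-- Same siamese network as in the question.
local upper = nn.Sequential()
upper:add(nn.Linear(inputNumber, hiddenUnits))
upper:add(nn.Linear(hiddenUnits, outputNumber))
local lower = upper:clone('weight', 'bias', 'gradWeight', 'gradBias')

local branches = nn.ParallelTable()
branches:add(upper)
branches:add(lower)

local siamese = nn.Sequential()
siamese:add(branches)
siamese:add(nn.CosineDistance())

-- Single pair: a table of two 1D tensors with 6 values each.
-- forward() returns the module's reusable output buffer, so clone it
-- before the next forward call overwrites it.
local single = siamese:forward({
   torch.rand(inputNumber), torch.rand(inputNumber)
}):clone()

-- Mini-batch (assumption: rows are samples): a table of two
-- batchSize x 6 tensors, row i of each forming the i-th pair.
local batch = siamese:forward({
   torch.rand(batchSize, inputNumber), torch.rand(batchSize, inputNumber)
}):clone()

print(single:nElement(), batch:nElement())
```

If that assumption holds, the same network would accept both kinds of input unchanged, and the batched call would return one cosine value per pair.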
EDIT: I should probably use the nn.Sequencer() class, changing the last two lines of my code to:
perceptron:add(nn.Sequencer(parallel_table))
perceptron:add(nn.Sequencer(nn.CosineDistance()))
What do you guys think?