
I'm still working on implementing the mini-batch gradient update for my siamese neural network. Previously I had an implementation problem, which was correctly solved here.

Now I have realized that there was also a mistake in the architecture of my neural network, related to my incomplete understanding of the correct implementation.

So far, I've always used a non-mini-batch gradient descent approach, passing the training elements one by one to the gradient update. Now I want to implement a mini-batch gradient update, starting, say, with mini-batches of N=2 elements.

My question is: how should I change the architecture of my siamese neural network to make it able to handle a mini-batch of N=2 elements instead of a single element?

This is the (simplified) architecture of my siamese neural network:

nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.ParallelTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.Linear(6 -> 3)
      |      (2): nn.Linear(3 -> 2)
      |    }
      |`-> (2): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.Linear(6 -> 3)
      |      (2): nn.Linear(3 -> 2)
      |    }
       ... -> output
  }
  (2): nn.CosineDistance
}

I have:

  • 2 identical siamese neural networks (upper and lower)
  • 6 input units
  • 3 hidden units
  • 2 output units
  • cosine distance function that compares the output of the two parallel neural networks

Here's my code:

require 'nn'

input_number = 6   -- input units
hiddenUnits = 3    -- hidden units
output_number = 2  -- output units

perceptronUpper = nn.Sequential()
perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
perceptronUpper:add(nn.Linear(hiddenUnits, output_number))
-- share the parameters and their gradients so the two branches stay identical
perceptronLower = perceptronUpper:clone('weight', 'bias', 'gradWeight', 'gradBias')

parallel_table = nn.ParallelTable()
parallel_table:add(perceptronUpper)
parallel_table:add(perceptronLower)

perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())

This architecture works very well when my gradient update function takes one element at a time; how should I modify it so that it can handle a mini-batch?
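For reference, my current one-element update looks roughly like this (a simplified sketch; the random inputs, the squared-error gradient, and the learning rate are only placeholders for my own momentum-based update function):

-- one training pair and its target (1 = similar, 0 = dissimilar)
input_pair = {torch.rand(input_number), torch.rand(input_number)}
target = torch.Tensor{1}

prediction = perceptron:forward(input_pair)   -- 1D tensor of size 1 (cosine distance)
perceptron:zeroGradParameters()
-- gradient of a squared-error loss w.r.t. the output: d/dp 0.5*(p - t)^2 = p - t
perceptron:backward(input_pair, prediction - target)
perceptron:updateParameters(0.01)             -- plain SGD step, placeholder learning rate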

EDIT: I should probably use the nn.Sequencer() class, by modifying the last two lines of my code to:

perceptron:add(nn.Sequencer(parallel_table))
perceptron:add(nn.Sequencer(nn.CosineDistance()))

What do you guys think?


1 Answer


Every nn module can work with minibatches. Some work only with minibatches, e.g. (Spatial)BatchNormalization. A module knows how many dimensions its input must contain (let's say D), and if the module receives a (D+1)-dimensional tensor, it assumes the first dimension to be the batch dimension. For example, take a look at the nn.Linear module documentation:

The input tensor given in forward(input) must be either a vector (1D tensor) or matrix (2D tensor). If the input is a matrix, then each row is assumed to be an input sample of given batch.

require 'nn'

-- Stack a table of equally-sized tensors into a single tensor whose first
-- dimension is the batch dimension:
function table_of_tensors_to_batch(tbl)
    local batch = torch.Tensor(#tbl, unpack(tbl[1]:size():totable()))
    for i = 1, #tbl do
        batch[i] = tbl[i]
    end
    return batch
end

inputs = {
    torch.Tensor(5):fill(1),
    torch.Tensor(5):fill(2),
    torch.Tensor(5):fill(3),
}
input_batch = table_of_tensors_to_batch(inputs)
linear = nn.Linear(5, 2)
output_batch = linear:forward(input_batch)

print(input_batch)
 1  1  1  1  1
 2  2  2  2  2
 3  3  3  3  3
[torch.DoubleTensor of size 3x5]

print(output_batch)
 0.3128 -1.1384
 0.7382 -2.1815
 1.1637 -3.2247
[torch.DoubleTensor of size 3x2]

Ok, but what about containers (nn.Sequential, nn.Parallel, nn.ParallelTable and others)? A container does not deal with the input itself; it just sends the input (or the corresponding part of it) to the module(s) it contains. ParallelTable, for example, simply applies the i-th member module to the i-th input table element. Thus, if you want it to handle a batch, each input[i] (input is a table) must be a tensor with the batch dimension as described above.

input_number = 5
output_number = 2

inputs1 = {
    torch.Tensor(5):fill(1),
    torch.Tensor(5):fill(2),
    torch.Tensor(5):fill(3),
}
inputs2 = {
    torch.Tensor(5):fill(4),
    torch.Tensor(5):fill(5),
    torch.Tensor(5):fill(6),
}
input1_batch = table_of_tensors_to_batch(inputs1)
input2_batch = table_of_tensors_to_batch(inputs2)

input_batch = {input1_batch, input2_batch}
-- 'perceptron' is the siamese net built as in the question, but with
-- input_number = 5 so that it matches the 5-element tensors above
output_batch = perceptron:forward(input_batch)

print(input_batch)
{
  1 : DoubleTensor - size: 3x5
  2 : DoubleTensor - size: 3x5
}
print(output_batch)
 0.6490
 0.9757
 0.9947
[torch.DoubleTensor of size 3]


-- one scalar target per pair in the batch (so target_batch has size 3 here)
target_batch = torch.Tensor({1, 0, 1})
criterion = nn.MSECriterion()
err = criterion:forward(output_batch, target_batch)
gradCriterion = criterion:backward(output_batch, target_batch)
perceptron:zeroGradParameters()
perceptron:backward(input_batch, gradCriterion)
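To actually apply the accumulated mini-batch gradients, one more step is needed; here is a minimal sketch assuming a plain SGD update (the learning rate value is only a placeholder, and a custom update such as your momentum-based one can be used instead):

learningRate = 0.01
-- params <- params - learningRate * gradParams, using the gradients
-- accumulated by the backward call above
perceptron:updateParameters(learningRate)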

Why is there nn.Sequencer then? Can one use it instead? Yes, but it is strongly discouraged. Sequencer takes a sequence table and applies the module to each element of the table independently, providing no speedup. Besides, it has to make copies of that module, so such a "batch mode" is considerably less efficient than online (non-batch) training. Sequencer was designed to be a part of recurrent nets; there is no point in using it in your case.
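Just for illustration (assuming the rnn package, which provides nn.Sequencer, is installed), the Sequencer way of feeding several samples is a table of individual inputs, each processed separately:

require 'rnn'   -- nn.Sequencer lives in the rnn package

seq = nn.Sequencer(nn.Linear(5, 2))
-- three independent samples, processed one by one: no batching speedup
outputs = seq:forward({ torch.randn(5), torch.randn(5), torch.randn(5) })
print(outputs)  -- a table of three 1D tensors of size 2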

  • Hi @Alexander, thanks for replying. I'm trying to implement your solution, but I get stuck in the gradient update `perceptron:backward(input_batch, targets)` instruction. `targets` should contain the targets of my training, for example `0,1`. If `input_batch` is a list of 2 DoubleTensors whose size is 3x5, what should the correct dimensions of `target` be? Thanks – DavideChicco.it Feb 29 '16 at 16:34
  • @DavideChicco.it, the aim is to minimize the distance between pairs of inputs, right? What are your targets then? I would assume that they are zeros. Where does `0,1` come from? – Alexander Lutsenko Feb 29 '16 at 16:42
  • I'm comparing pairs of vectors. Each vector is made of 6 real values. Each pair can be true (target=1) or false (target=0). During training, I call `perceptron:forward(input_batch)` and then `perceptron:zeroGradParameters()` and `perceptron:backward(input_batch, targets)`. I'm having troubles with the dimensions of `targets`, that I have to adapt to the new settings. A vector of #input_batch DoubleTensors of size 1 does not work, what should I use? Thanks – DavideChicco.it Feb 29 '16 at 16:50
  • [torch.DoubleTensor of size BatchSize] must work. I'm using MSECriterion and it works. What's your criterion? – Alexander Lutsenko Feb 29 '16 at 16:59
  • I'm using a gradient update function developed by me. Could you please take a look at my short script code? You just have to download it and run `th cosine_similarity_minibatch3_momentum_withoutSequencer.lua` The trouble is in line #104: `perceptron:backward()` Thanks! http://bit.ly/1LQ5BPf – DavideChicco.it Feb 29 '16 at 17:15
  • @DavideChicco.it, I updated my answer and added an example of how to train the net using MSECriterion. Check that out. Also, siamese networks are described in the documentation: https://github.com/torch/nn/blob/master/doc/table.md#cosinedistance – Alexander Lutsenko Feb 29 '16 at 17:50
  • It's been veeery laborious to implement (I had to switch from a vector of N pairs of tensors to a pair of N tensors) but in the end I think I was able to do it in the right way. Thanks Alexander! – DavideChicco.it Feb 29 '16 at 21:58