
I have a graph as follows, where the input x has two paths to reach y: a direct one (path1) and one through other nn layers (path2). The two paths are combined in a gModule that uses nn.CMulTable. Now if I do gModule:backward(x, y), I get a table of two values. Do they correspond to the error derivatives coming from the two paths?

But since path2 contains other nn layers, I suppose I need to derive the derivatives along this path in a stepwise fashion. But why did I get a table of two values for dy/dx?

To make things clearer, here is the code I used to test this:

input1 = nn.Identity()()
input2 = nn.Identity()()
score = nn.CAddTable()({nn.Linear(3, 5)(input1), nn.Linear(3, 5)(input2)})
g = nn.gModule({input1, input2}, {score})  -- gModule

mlp = nn.Linear(3, 3)  -- path2 layer

x = torch.rand(3,3)
x_p = mlp:forward(x)
result = g:forward({x,x_p})
error = torch.rand(result:size())
gradient1 = g:backward(x, error)    -- this is a table of 2 tensors
gradient2 = g:backward(x_p, error)  -- this is also a table of 2 tensors

So what is wrong with my steps?

P.S. Perhaps I have found the reason: g:backward({x, x_p}, error) returns the same table. So I guess the two values stand for dy/dx and dy/dx_p respectively.
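
To check this, the second gradient could be chained back through path2; a rough sketch continuing the snippet above (reusing the variables defined there):

gradients = g:backward({x, x_p}, error)     -- {dE/dx (direct path), dE/dx_p}
grad_path2 = mlp:backward(x, gradients[2])  -- propagate dE/dx_p back through the mlp
full_grad_x = gradients[1] + grad_path2     -- total dE/dx, summed over both paths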


1 Answer


I think you simply made a mistake constructing your gModule. The gradInput of every nn.Module has to have exactly the same structure as its input; that is how backprop works.

Here's an example of how to create a module like yours using nngraph:

require 'torch'
require 'nn'
require 'nngraph'

function CreateModule(input_size)
    local input = nn.Identity()()   -- network input

    local nn_module_1 = nn.Linear(input_size, 100)(input)
    local nn_module_2 = nn.Linear(100, input_size)(nn_module_1)

    local output = nn.CMulTable()({input, nn_module_2})

    -- pack a graph into a convenient module with standard API (:forward(), :backward())
    return nn.gModule({input}, {output})
end


input = torch.rand(30)

my_module = CreateModule(input:size(1))

output = my_module:forward(input)
criterion_err = torch.rand(output:size())

gradInput = my_module:backward(input, criterion_err)
print(gradInput)

UPDATE

As I said, the gradInput of every nn.Module has to have exactly the same structure as its input. So, if you define your module as nn.gModule({input1, input2}, {score}), your gradInput (the result of the backward pass) will be a table of gradients w.r.t. input1 and input2, which in your case are x and x_p.
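
For instance, with the two-input gModule from the question, the structurally correct call and its result could be inspected like this (just a sketch, reusing the question's g, x, x_p and error):

gradients = g:backward({x, x_p}, error)  -- a table of two tensors
print(gradients[1]:size())               -- gradient w.r.t. input1, i.e. x
print(gradients[2]:size())               -- gradient w.r.t. input2, i.e. x_p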

The only question that remains is: why on earth don't you get an error when you call:

gradient1 = g:backward(x, error) 
gradient2 = g:backward(x_p, error)

An exception should be raised, because the first argument must be a table of two tensors, not a single tensor. Well, most (perhaps all) Torch modules don't actually use the input argument when computing :backward(input, gradOutput); they usually rely on the copy of the input stored during the last :forward(input) call. In fact, this argument is used so rarely that modules don't even bother to verify it.
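
You can see this with the question's own module (again only a sketch, with the same variables as above):

g:forward({x, x_p})           -- caches the inputs/activations inside g
grads = g:backward(x, error)  -- the first argument should be {x, x_p}, yet no check fires
print(grads)                  -- still a table of two gradients, exactly as observed above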

Alexander Lutsenko
  • Hi Alex, thanks for your answer. Instead of using a single input x, I have created the gModule with two inputs a and b, where the value of b depends on a. I did it this way because the nn layer is more complicated than a linear transformation; it has an LSTM structure. – Jack Cheng Nov 06 '15 at 13:46
  • I have also included my code simulation, please check it out @Alexander Lutsenko – Jack Cheng Nov 06 '15 at 14:25