My code looks like this:
output = lasagne.layers.get_output(output_layer)
loss = function(output) * target
loss = -(loss.sum())
params = lasagne.layers.get_all_params(output_layer)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.00001)
train_fn = theano.function([input, target], loss, updates=updates, allow_input_downcast=True)
validate_fn = theano.function([input, target], loss, allow_input_downcast=True)
Here output_layer is a CNN, and function is defined as follows:
def function(X):
    # Pairwise squared Euclidean distances: ||xi||^2 + ||xj||^2 - 2 xi.xj
    squared_euclidean_distances = (
        (X ** 2).sum(1).reshape((X.shape[0], 1))
        + (X ** 2).sum(1).reshape((1, X.shape[0]))
        - 2 * X.dot(X.T)
    )
    dist = 1 / (1 + squared_euclidean_distances)
    Pij = dist / dist.sum(0)
    return Pij
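For reference, the same pairwise-similarity computation can be reproduced in plain NumPy (a sketch for sanity-checking on a small batch, not part of the original Theano graph):

```python
import numpy as np

def pairwise_pij(X):
    # Squared Euclidean distance between every pair of rows of X:
    # ||xi||^2 + ||xj||^2 - 2 xi.xj
    sq_norms = (X ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2 * X.dot(X.T)
    # Student-t style kernel, as in the Theano version
    dist = 1.0 / (1.0 + sq_dist)
    # Normalise each column so it sums to 1 (dist.sum(0) in the original)
    return dist / dist.sum(axis=0)

X = np.random.randn(4, 3)
P = pairwise_pij(X)
print(P.sum(axis=0))  # each column sums to 1
```

Checking that the column sums are exactly 1 on a toy batch is a quick way to confirm the forward pass itself is not the source of the NaN.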
target is a sparse matrix where target(i, j) = 1 if the samples at rows i and j of the output belong to the same class, and target(i, j) = 0 otherwise.
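Such a target matrix can be built from a vector of class labels in a couple of lines (a hypothetical helper, assuming integer labels, for anyone reproducing the setup):

```python
import numpy as np

def build_target(labels):
    # target[i, j] = 1 when samples i and j share a class label, else 0
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.float32)

print(build_target([0, 1, 0]))
# Rows 0 and 2 share class 0, so entries (0, 2) and (2, 0) are 1
```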
When digging into the code, I found that the error comes from a conv layer in the CNN and is raised by the function true_div.
Clearly, the only difference between train_fn and validate_fn is the updates parameter. However, when I print the outputs of train_fn and validate_fn for the same dummy input, the output of validate_fn makes sense, but the output of train_fn is NaN. I believe the output is computed before the updates are applied to the parameters, so what is going wrong?
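On the last point: the value a training function returns is indeed the loss computed from the parameters as they were before the update step, so a NaN from train_fn on the very first call means the loss or its gradient graph already produces NaN at the initial parameters. A minimal NumPy sketch of this ordering (hypothetical scalar parameter w and quadratic loss, purely illustrative):

```python
import numpy as np

w = np.array(3.0)  # hypothetical initial parameter
lr = 0.1

def train_step():
    global w
    loss = w ** 2       # loss at the *current* w, like train_fn's return value
    grad = 2 * w
    w = w - lr * grad   # the SGD update is applied after the loss is computed
    return loss

first = train_step()
print(first)  # 9.0: the loss before any update has touched w
```

Because the returned value predates the update, the difference between train_fn and validate_fn here cannot be explained by the parameters having already moved.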