
I have a simple and working multi-layer perceptron in Theano, with 1 hidden layer and 1 regression layer with 2 outputs. In the regression layer a mean square error function is defined, which is used as the cost function. However, during learning I now want to minimize the cosine distance between two vectors, so I wanted to use the cosine distance as the cost function instead. Below are some relevant parts of my current implementation.

import theano
import theano.tensor as T

class RegressionLayer(object):
    def __init__(self, input, n_in, n_out, W=None, b=None):
        pass  # rest of __init__ left out for brevity

    def mse(self, y):
        return T.mean(T.sqr(y - self.y_pred))

    def cos(self, y):
        return 1. - (T.dot(y,self.y_pred) / (T.sqrt(T.sum(T.sqr(y)) * T.sum(T.sqr(self.y_pred)))))

If I change the cost function from mse(y) to cos(y) I get the following error:

TypeError: cost must be a scalar.

I don't see why the cost (function) would not be scalar. Just for testing I tried:

def cos(self, y):
    return T.sum(1. - (T.dot(y, self.y_pred) / (T.sqrt(T.sum(T.sqr(y)) * T.sum(T.sqr(self.y_pred))))))

The model builds then, but I get a dimension mismatch during training.

ValueError: dimension mismatch in args to gemm (1,2)x(1,2)->(1,2)

I think the problem is that I don't see how my cosine distance function is different from my mean square error function in Theano. What do I miss here?

Semi

1 Answer


The difference is that your mse function computes T.mean without specifying an axis, so it gives the mean over all entries in the tensor, whatever shape that tensor might be. In comparison, your first cos function does not aggregate at all, so its return value has the same shape as T.dot(y, self.y_pred), i.e. not a scalar. Your second cos version sums, which produces the required scalar, but it may not be computing what you want it to compute, depending on the semantics of the shape of your inputs.
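A quick numpy sketch of this shape behavior (numpy stands in for Theano's symbolic tensors here; the array values are illustrative, only the shapes matter):

```python
import numpy as np

# A toy batch with the question's layout: n rows, 2 outputs per row.
y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.1, 0.8], [0.7, 0.6]])

# mse aggregates over every entry, so the result is a 0-d scalar.
mse = np.mean((y - y_pred) ** 2)
print(np.shape(mse))  # ()

# An expression with no mean/sum keeps the shape of its inputs,
# which is why Theano rejects it with "cost must be a scalar."
per_entry = (y - y_pred) ** 2
print(per_entry.shape)  # (3, 2)
```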

The second error is probably due to a bug in your cos function: you don't want to perform a dot product in the numerator, i.e. T.dot(y, self.y_pred); instead you want the element-wise multiplication, e.g. y * self.y_pred.
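This is easy to reproduce in numpy (an illustrative stand-in for the Theano graph; a minibatch of size 1 matches the (1,2)x(1,2) shapes in the gemm error):

```python
import numpy as np

# Row vectors shaped (1, 2), like a minibatch of size 1 in the question.
y = np.array([[3.0, 4.0]])
y_pred = np.array([[4.0, 3.0]])

# np.dot of (1, 2) x (1, 2) is the same shape mismatch the gemm error reports.
try:
    np.dot(y, y_pred)
except ValueError as e:
    print("dot fails:", e)

# Element-wise multiply, then sum along the last axis, gives the per-row
# inner product the cosine numerator needs.
num = (y * y_pred).sum(axis=-1)
print(num)  # [24.]
```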

Here's my code for doing various distances in Theano. Note that the _magnitude and cosine functions include tweaks that help avoid NaN or out of range values. These kinds of problems can occur in either the forward pass or backward pass (gradient).

import numpy
import theano.tensor as tt


def _squared_magnitude(x):
    return tt.sqr(x).sum(axis=-1)


def _magnitude(x):
    return tt.sqrt(tt.maximum(_squared_magnitude(x), numpy.finfo(x.dtype).tiny))


def cosine(x, y):
    return tt.clip((1 - (x * y).sum(axis=-1) / (_magnitude(x) * _magnitude(y))) / 2, 0, 1)


def euclidean(x, y):
    return _magnitude(x - y)


def squared_euclidean(x, y):
    return _squared_magnitude(x - y)
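As a sanity check, here is a numpy mirror of the cosine function above (an illustrative re-implementation, not part of the original answer), showing that the halving and clipping map the distance into [0, 1]:

```python
import numpy as np

def cosine_np(x, y):
    # numpy equivalent of the Theano cosine() above, for checking values.
    tiny = np.finfo(x.dtype).tiny
    mag = lambda v: np.sqrt(np.maximum((v ** 2).sum(axis=-1), tiny))
    return np.clip((1 - (x * y).sum(axis=-1) / (mag(x) * mag(y))) / 2, 0, 1)

a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
# Identical, orthogonal, and opposite vectors give 0, 0.5, and 1.
print(cosine_np(a, b))  # [0.  0.5 1. ]
```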
Daniel Renshaw
  • Thanks! I think my mistake came from misunderstanding T.dot(), apparently T.dot(x, y) is different from T.sum(x * y). – Semi Nov 04 '15 at 14:43