I have a simple, working multi-layer perceptron in Theano, with one hidden layer and one regression layer with 2 outputs. The regression layer defines a mean squared error function, which is used as the cost function. During learning, however, I now want to minimize the cosine distance between two vectors, so I want to use the cosine distance as the cost function instead. Below are the relevant parts of my current implementation.
import theano
import theano.tensor as T

class RegressionLayer(object):
    def __init__(self, input, n_in, n_out, W=None, b=None):
        # rest of __init__ left out for brevity

    def mse(self, y):
        return T.mean(T.sqr(y - self.y_pred))

    def cos(self, y):
        return 1. - (T.dot(y, self.y_pred) / (T.sqrt(T.sum(T.sqr(y)) * T.sum(T.sqr(self.y_pred)))))
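For reference, the quantity that cos is meant to compute can be sketched in plain Python for two 1-D vectors. This is only an illustration of the formula, not the Theano code; the function name cosine_distance and the sample vectors are made up for the example.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - (u . v) / (||u|| * ||v||) for two 1-D sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Illustrative inputs (hypothetical values, not taken from the model).
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])  # close to 0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])      # 1
opposite = cosine_distance([1.0, 0.0], [-2.0, 0.0])       # 2
```

For a single pair of 1-D vectors the result is one number, which is what a cost function needs.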
If I change the cost function from mse(y) to cos(y), I get the following error:
TypeError: cost must be a scalar.
I don't see why the cost (function) would not be scalar. Just for testing I tried:
    def cos(self, y):
        return T.sum(1. - (T.dot(y, self.y_pred) / (T.sqrt(T.sum(T.sqr(y)) * T.sum(T.sqr(self.y_pred))))))
The model then builds, but I get a dimension mismatch during training:
ValueError: dimension mismatch in args to gemm (1,2)x(1,2)->(1,2)
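One possible reading of the two errors, assuming y and self.y_pred are both minibatch matrices of shape (batch_size, 2): a dot product of two matrices is symbolically a matrix, not a scalar, and at run time a (1,2) by (1,2) matrix product is not defined. The NumPy sketch below only mimics those shapes (it is not Theano code) and shows a row-wise cosine distance averaged over the batch. The name cosine_distance_rows and the eps guard against division by zero are additions for illustration.

```python
import numpy as np

def cosine_distance_rows(y, y_pred, eps=1e-8):
    """Mean cosine distance over the rows of two (batch_size, n_out) arrays."""
    # Element-wise product summed per row gives one dot product per sample.
    dot = np.sum(y * y_pred, axis=1)
    # Product of the two row norms, one value per sample.
    norms = np.sqrt(np.sum(y ** 2, axis=1) * np.sum(y_pred ** 2, axis=1))
    # Averaging over the batch reduces the per-sample distances to a scalar.
    return float(np.mean(1.0 - dot / (norms + eps)))

# Hypothetical minibatch of size 1 with 2 outputs, matching the (1,2) shapes
# reported in the error message.
y = np.array([[1.0, 0.0]])
y_pred = np.array([[0.0, 1.0]])

try:
    np.dot(y, y_pred)  # (1,2) x (1,2): inner dimensions do not match
    dot_failed = False
except ValueError:
    dot_failed = True

cost = cosine_distance_rows(y, y_pred)
```

If that assumption about the shapes holds, the same element-wise pattern could presumably be written in Theano with T.sum(y * self.y_pred, axis=1) for the per-row dot products and T.mean to reduce the result to a scalar.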
I think the problem is that I don't see how my cosine distance function differs from my mean squared error function in Theano. What am I missing here?