
I'm facing a non-binary classification problem of this form:

  • Input: 2-dimensional vector (x,y) with -1 < x < 1, -1 < y < 1.
  • Output: 4-dimensional vector (p_0, p_1, p_2, p_3), where 0 < p_i < 1 and sum(p_i) <= 1, for i = 0, ..., 3.

The program I use to classify them simulates a 4-qubit quantum circuit: I start with a 16-dimensional vector with a 1 as its first entry and 0s elsewhere, and then apply a series of rotations in the form of matrix products.

Rephrasing it a bit: I start with said 16-dimensional vector and multiply it by a 16x16 matrix that depends on the point's first component "x", which yields a new 16-dimensional vector. Next, I multiply this new vector by a different matrix, this time with "y" as its parameter. I call this process the "encoding" of the data.
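To make the encoding step concrete, here is a minimal sketch of what I mean. The gate choice is only an assumption for illustration (single-qubit RY rotations on two of the qubits); the actual 16x16 matrices in my program may differ.

    # Minimal sketch of the "encoding" step. Assumption: the x- and
    # y-dependent 16x16 matrices are single-qubit RY rotations embedded
    # into the 4-qubit space; the real gates may be different.
    import numpy as np

    def ry(theta):
        """2x2 rotation matrix about the Y axis."""
        c, s = np.cos(theta / 2), np.sin(theta / 2)
        return np.array([[c, -s], [s, c]])

    def encode(x, y):
        """Return the 16-dim state after the two data-dependent rotations."""
        state = np.zeros(16)
        state[0] = 1.0                    # 1 in the first entry, 0s elsewhere
        I = np.eye(2)
        # 16x16 matrix parametrised by x (here: RY(x) on the first qubit)
        Ux = np.kron(np.kron(np.kron(ry(x), I), I), I)
        # 16x16 matrix parametrised by y (here: RY(y) on the second qubit)
        Uy = np.kron(np.kron(np.kron(I, ry(y)), I), I)
        return Uy @ (Ux @ state)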

Once the data is encoded, I apply a set of matrices, each depending on a different parameter. A smart choice of these parameters is what will produce the desired classification.

So, after every product is calculated, I end up with a new 16-dimensional vector, which depends on all of the mentioned parameters and which we will call a(x,y).
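Continuing the sketch above (it reuses ry and encode), the trainable part could look like the following. The structure of the parametrised matrices is again an assumption, one RY rotation per qubit with its own trainable angle; my real circuit may use other gates.

    # Sketch of the trainable part, continuing the previous snippet.
    # Assumption: one RY rotation per qubit, each with a trainable angle.
    def variational(state, params):
        """Apply one parametrised 16x16 rotation per entry of `params`."""
        I = np.eye(2)
        for qubit, theta in enumerate(params):    # params holds 4 angles here
            ops = [I, I, I, I]
            ops[qubit] = ry(theta)
            U = np.kron(np.kron(np.kron(ops[0], ops[1]), ops[2]), ops[3])
            state = U @ state
        return state

    def a(x, y, params):
        """The final 16-dim vector a(x, y) for a given parameter set."""
        return variational(encode(x, y), params)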

From here I define a target function f(x,y) = (p_0, p_1, p_2, p_3). Each of the p_i is a sum of some of a(x,y)'s components.
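As an illustration of such an f(x,y) (again continuing the snippets above): the grouping below is an assumption, taking the components pairwise by the state of the first two qubits, and it sums squared amplitudes rather than raw components, since that is what keeps each p_i in (0, 1) with sum <= 1. Replace it with whatever grouping rule your program actually uses.

    # Sketch of f(x, y): group the 16 components of a(x, y) into 4 outputs.
    # Assumptions: grouping by the first two qubits, and summing squared
    # amplitudes so the p_i behave like probabilities.
    def f(x, y, params):
        amp = a(x, y, params)
        probs = amp ** 2                          # 16 squared amplitudes
        return probs.reshape(4, 4).sum(axis=1)    # p_c = sum over the last two qubits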

Now, f(x,y) is the actual output I obtain for input (x,y). Let me call d(x,y) the desired output. My goal is to find a set of parameter values that makes f(x,y) as close as possible to d(x,y) over a reasonably large set of input data.

d(x,y) can take only one of four possible values:

  • (1,0,0,0), dubbed "0",
  • (0,1,0,0), dubbed "1",
  • (0,0,1,0), dubbed "2",
  • (0,0,0,1), dubbed "3".

The cost function I chose for this task is a quadratic cost function. To minimize it, I use a gradient descent algorithm, computing the partial derivatives with a centered finite-differences method.
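To be explicit about this part, here is a sketch of the optimisation loop, continuing the snippets above: quadratic cost, centered finite differences for the gradient, and plain gradient descent. The names train_X (an (N, 2) array of inputs) and train_D (an (N, 4) array of one-hot targets) are placeholders for my own data.

    # Sketch of the optimisation: quadratic cost + centred finite
    # differences + plain gradient descent.
    def cost(params, train_X, train_D):
        """Mean quadratic cost over the training set."""
        preds = np.array([f(x, y, params) for x, y in train_X])
        return np.mean(np.sum((preds - train_D) ** 2, axis=1))

    def gradient(params, train_X, train_D, eps=1e-3):
        """Centred finite-difference estimate of the gradient."""
        grad = np.zeros_like(params)
        for k in range(len(params)):
            shift = np.zeros_like(params)
            shift[k] = eps
            grad[k] = (cost(params + shift, train_X, train_D)
                       - cost(params - shift, train_X, train_D)) / (2 * eps)
        return grad

    def train(params, train_X, train_D, epochs=30, lr=1.0):
        """Plain gradient descent for a fixed number of epochs."""
        for _ in range(epochs):
            params = params - lr * gradient(params, train_X, train_D)
        return params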

So, now that the program is described, here is my real problem: with this configuration, I obtain pretty high cost (loss) values, ranging from 1.5 to about 4.

To achieve these results, I run gradient descent for 30 epochs with a learning rate of 1.

I'm used to obtaining really small loss values (0.25 used to be a very bad result for a very similar problem), but I still don't have a good enough grasp of what is actually going on behind the numbers to know whether I should be worried about this or not.

At best, my program achieves ~40% accuracy (trying several different sets of matrices) with 1000 training points and 1000 evaluation points.

I assume that a high loss value might mean that my program is simply not good enough to perform this classification, but I do not know to what extent I should be able to obtain better results.

Is there anything I'm doing utterly wrong, or is it just that this structure is not good enough for this classification task?

Thank you very much for any feedback in advance.

Elies G.
  • You have to adjust things like the learning rate: you are using a learning rate of 1, which is very high; try a smaller value like 0.01. – Dr. Snoopy Dec 07 '18 at 12:39
  • Thanks! I did try it, but I did not achieve significant improvements. I'll keep trying :) – Elies G. Dec 10 '18 at 08:37
