
In metric learning, if the cost function is convex, I can use gradient descent during training and obtain the optimal solution.

Now I want to train N metrics (N may be very large, e.g. 100) from a training set. One approach is to adjust the cost function so that the N metrics combine into one big metric matrix and then run gradient descent on that. However, when N is very large this is not practical, so is there some "alternating optimization" method I can use instead? For example, is it valid to fix the 2nd through Nth metrics and run gradient descent only on the 1st metric, then fix the 1st and 3rd through Nth metrics and run gradient descent only on the 2nd metric, and so on? Is there some essential condition for such an "alternating optimization" method to work?
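To make the scheme I have in mind concrete, here is a minimal Python/NumPy sketch of the alternating update loop. The cost and gradient are only toy placeholders (each metric pulled toward a target matrix); my real metric-learning objective and its per-metric gradient would be substituted for `cost` and `grad_k`.

    import numpy as np

    # Toy stand-in for a real metric-learning objective: each metric M_k is
    # pulled toward a target T_k. Any differentiable joint cost could replace it.
    rng = np.random.default_rng(0)
    N, d = 5, 3
    targets = [rng.standard_normal((d, d)) for _ in range(N)]

    def cost(M_list):
        return sum(np.linalg.norm(M - T, "fro") ** 2 for M, T in zip(M_list, targets))

    def grad_k(M_list, k):
        # Gradient of the toy cost with respect to the k-th metric only.
        return 2.0 * (M_list[k] - targets[k])

    def alternating_gradient_descent(M_list, lr=0.1, n_sweeps=20, inner_steps=5):
        # Sweep over the metrics: update M_k by gradient descent
        # while all the other metrics stay fixed.
        for sweep in range(n_sweeps):
            for k in range(len(M_list)):
                for _ in range(inner_steps):
                    M_list[k] = M_list[k] - lr * grad_k(M_list, k)
            print(f"sweep {sweep}: cost = {cost(M_list):.6f}")
        return M_list

    # Start every metric from the identity and run the alternating sweeps.
    M_init = [np.eye(d) for _ in range(N)]
    alternating_gradient_descent(M_init)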

Ali
  • What gradient descent method are you using? – Benjamin Gruenbaum May 18 '14 at 09:24
  • This depends heavily on the function you are optimising (how separable are the dimensions). Sometimes you have to do it all at once, sometimes one at a time is fine, sometimes one at a time or batches, with multiple passes (batches with multiple passes is the compromise I often favour) – Dave May 18 '14 at 12:49

1 Answer


The AO method can't guarantee the optimal solution; it may not even reach a local optimum, because the N convex subproblems can't satisfy the KKT conditions at the same time.
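In symbols (written for the unconstrained case, where the KKT conditions reduce to stationarity): a joint optimum $(M_1^*,\dots,M_N^*)$ has to satisfy the stationarity condition for every block at once,

$$\nabla_{M_k} C(M_1^*,\dots,M_N^*) = 0 \quad \text{for all } k = 1,\dots,N,$$

whereas step $k$ of the alternating scheme only enforces $\nabla_{M_k} C(M_1,\dots,M_N) = 0$ with the other metrics held at their current values.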