In metric learning, if the cost function is convex, I can train the metric with gradient descent and obtain the globally optimal solution.
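For concreteness, here is a minimal sketch of the single-metric setting I mean (the pair-based cost and all names are illustrative, not from any particular library): a cost that is convex in the Mahalanobis matrix `M`, minimized by projected gradient descent so that `M` stays positive semidefinite.

```python
import numpy as np

# Illustrative convex cost in M: pull similar pairs together, push
# dissimilar pairs apart, plus a Frobenius regularizer.  The pair terms
# are linear in M, so the total cost is strongly convex in M.

def loss_and_grad(M, sim, dis, lam=0.1):
    loss = lam * np.sum(M ** 2)
    grad = 2.0 * lam * M
    for x, y in sim:                 # similar pairs: small d_M(x, y)^2
        d = x - y
        loss += d @ M @ d
        grad += np.outer(d, d)
    for x, y in dis:                 # dissimilar pairs: large d_M(x, y)^2
        d = x - y
        loss -= d @ M @ d
        grad -= np.outer(d, d)
    return loss, grad

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone (M must stay a metric)."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T

rng = np.random.default_rng(0)
sim = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(10)]
dis = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(10)]
M = np.eye(3)
for _ in range(500):
    _, g = loss_and_grad(M, sim, dis)
    M = project_psd(M - 0.01 * g)    # projected gradient step
```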
Now I want to train N metrics (N may be large, e.g. 100) from a single training set. One approach is to adjust the cost function so that the N metrics combine into one big metric matrix and then run gradient descent on that matrix. However, when N is large this approach does not work well. In that case, is there some "alternating optimization" method I can use? For example, is it valid to fix the 2nd through Nth metrics and run gradient descent only on the 1st metric, then fix the 1st and the 3rd through Nth metrics and run gradient descent only on the 2nd metric, and so on? Is there some essential condition such an "alternating optimization" (i.e., block coordinate descent) scheme must satisfy to converge?
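To make the alternating scheme I have in mind concrete, here is a sketch under assumed details: a hypothetical joint cost whose per-metric pair terms are tied together by a convex Frobenius coupling between neighbouring metrics (so the blocks do not decouple trivially), optimized by sweeping over the blocks and taking a few projected gradient steps on each M_k while all the other metrics are held fixed.

```python
import numpy as np

# Hypothetical joint cost over N metrics M_1..M_N: each metric has its own
# pair-based term (as in the single-metric sketch above) plus a convex
# coupling mu * ||M_k - M_{k+1}||_F^2 between neighbouring metrics.

def block_grad(Ms, k, sim, dis, lam=0.1, mu=0.5):
    """Gradient of the joint cost w.r.t. the k-th metric, others held fixed."""
    g = 2.0 * lam * Ms[k]
    for x, y in sim[k]:
        d = x - y
        g += np.outer(d, d)
    for x, y in dis[k]:
        d = x - y
        g -= np.outer(d, d)
    # coupling terms with neighbours j = k-1 and j = k+1
    for j in (k - 1, k + 1):
        if 0 <= j < len(Ms):
            g += 2.0 * 0.5 * 2.0 * (Ms[k] - Ms[j]) * mu / 2.0  # = 2*mu*(M_k - M_j)
    return g

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T

rng = np.random.default_rng(0)
N, dim = 5, 3
sim = [[(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(10)]
       for _ in range(N)]
dis = [[(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(10)]
       for _ in range(N)]
Ms = [np.eye(dim) for _ in range(N)]

for sweep in range(100):             # one sweep = one pass over all N blocks
    for k in range(N):               # fix every other metric, descend on M_k
        for _ in range(20):          # a few inner gradient steps per block
            Ms[k] = project_psd(Ms[k] - 0.01 * block_grad(Ms, k, sim, dis))
```

Each block update only ever decreases the joint cost, which is the behaviour I am asking about: under what conditions on the cost does cycling through the blocks like this reach the same optimum as descending on the big combined matrix?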