I currently have a program that takes a feature vector and its classification and, given a known weight vector, computes the loss gradient for Logistic Regression. This is that code:
double[] grad = new double[featureSize];

// dot product w·x
double dot = 0;
for (int j = 0; j < featureSize; j++) {
    dot += weights[j] * features[j];
}

// gradient multiplier: -yi exp(-yi w·xi) / (1 + exp(-yi w·xi))
double expTerm = Math.exp((-type) * dot);
double gradMultiplier = (-type) * expTerm / (1 + expTerm);

// per-component gradient: -yi xi exp(-yi w·xi) / (1 + exp(-yi w·xi))
for (int j = 0; j < featureSize; j++) {
    grad[j] = features[j] * gradMultiplier;
}
return grad;
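For context, this snippet computes the gradient of the per-example logistic loss L(w) = log(1 + exp(-yi w·xi)), where yi ∈ {-1, +1} is the label (`type` above) and xi is the feature vector.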
What I'm trying to do is implement something similar using Softmax regression, but all of the information on Softmax I find online doesn't use the same vocabulary as what I know about logit loss functions, so I keep getting confused. How would I implement a function similar to the one above, but using Softmax?
Based on the Wikipedia page for Softmax, I'm under the impression that I might need multiple weight vectors, one for every possible class. Am I wrong?
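In case it helps clarify the question, here is the kind of structure I'm imagining from my reading so far: one weight vector per class, softmax probabilities over the per-class dot products, and the standard cross-entropy gradient. This is only a sketch of my understanding, not working code; names like `numClasses`, `label`, and the `weights[k][j]` matrix are my own placeholders, not from my existing program.

// Sketch: gradient of the softmax (multinomial logistic) cross-entropy loss
// for one example. Assumes weights is a [numClasses][featureSize] matrix,
// features is the example's feature vector, and label is its class index.
double[][] grad = new double[numClasses][featureSize];

// logits: dot product of each class's weight vector with the features
double[] logits = new double[numClasses];
double maxLogit = Double.NEGATIVE_INFINITY;
for (int k = 0; k < numClasses; k++) {
    for (int j = 0; j < featureSize; j++) {
        logits[k] += weights[k][j] * features[j];
    }
    maxLogit = Math.max(maxLogit, logits[k]);
}

// softmax probabilities, shifted by maxLogit for numerical stability
double[] prob = new double[numClasses];
double sum = 0;
for (int k = 0; k < numClasses; k++) {
    prob[k] = Math.exp(logits[k] - maxLogit);
    sum += prob[k];
}
for (int k = 0; k < numClasses; k++) {
    prob[k] /= sum;
}

// gradient for each class k and feature j: (prob[k] - 1{k == label}) * xj
for (int k = 0; k < numClasses; k++) {
    double indicator = (k == label) ? 1.0 : 0.0;
    for (int j = 0; j < featureSize; j++) {
        grad[k][j] = (prob[k] - indicator) * features[j];
    }
}
return grad;

Is this roughly the right shape, or am I misreading how the multiple weight vectors are supposed to work?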