
I currently have a program which takes a feature vector and its classification and applies them to a known weight vector to generate a loss gradient using logistic regression. This is that code:

    double[] grad = new double[featureSize];

    // dot product w·x
    double dot = 0;
    for (int j = 0; j < featureSize; j++) {
        dot += weights[j] * features[j];
    }

    // gradient multiplier: -y * exp(-y * w·x) / (1 + exp(-y * w·x))
    double gradMultiplier = (-type) * Math.exp((-type) * dot) / (1 + Math.exp((-type) * dot));

    // grad_j = -y * x_j * exp(-y * w·x) / (1 + exp(-y * w·x))
    for (int j = 0; j < featureSize; j++) {
        grad[j] = features[j] * gradMultiplier;
    }

    return grad;

What I'm trying to do is implement something similar using Softmax regression, but all of the information about Softmax I find online doesn't quite use the same vocabulary as what I know about logit loss functions, so I keep getting confused. How would I implement a function similar to the one above, but using Softmax?

Based on the Wikipedia page for Softmax, I'm under the impression that I might need multiple weight vectors, one for every possible classification. Am I wrong?

user2785277
  • You might want to consider moving this to the [math website](http://math.stackexchange.com). – 0x6C38 May 22 '16 at 23:16

1 Answer


Softmax regression is a generalization of Logistic regression. In Logistic regression the labels are binary; in Softmax regression they can take more than two values. Logistic regression refers to binomial logistic regression, and Softmax regression refers to multinomial logistic regression.
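To connect the vocabulary to your code: with one weight vector per class, the model's probability for class k is exp(w_k · x) divided by the sum of exp(w_c · x) over all classes c. Here is a minimal sketch of that computation (my own illustration, not code from the linked page; `weights[k]` is the weight vector for class k):

    static double[] softmax(double[][] weights, double[] features) {
        int numClasses = weights.length;
        double[] p = new double[numClasses];

        // score_k = w_k · x for every class k, tracking the max for stability
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < numClasses; k++) {
            double dot = 0;
            for (int j = 0; j < features.length; j++) {
                dot += weights[k][j] * features[j];
            }
            p[k] = dot;
            if (dot > max) max = dot;
        }

        // p_k = exp(score_k - max) / sum_c exp(score_c - max);
        // subtracting the max avoids overflow and does not change the result
        double sum = 0;
        for (int k = 0; k < numClasses; k++) {
            p[k] = Math.exp(p[k] - max);
            sum += p[k];
        }
        for (int k = 0; k < numClasses; k++) {
            p[k] /= sum;
        }
        return p;
    }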

There is an excellent page about it here. In your code, you seem to be implementing gradient descent to calculate the weights that minimize the cost function. This topic is covered by the linked page.
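For reference, a single gradient-descent step is just the following (a sketch, assuming a learning rate `eta` that you choose, applied to the per-example gradient your function returns):

    double eta = 0.1; // assumed learning rate, a hypothetical value
    // move the weights a small step against the gradient
    for (int j = 0; j < featureSize; j++) {
        weights[j] -= eta * grad[j];
    }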

> Based on the Wikipedia page for Softmax, I'm under the impression that I might need multiple weight vectors, one for every possible classification. Am I wrong?

You are right. If you have n features and K classes, then your weights are K vectors of n elements each, as indicated in the link above.
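To make that concrete, here is a sketch of the cross-entropy loss gradient for one example (my own illustration, reusing the `softmax(...)` helper above and assuming labels are class indices 0..K-1). For each class k, the multiplier is p_k minus 1 if k is the true label, playing the same role as the `gradMultiplier` in your code:

    static double[][] softmaxGradient(double[][] weights, double[] features, int label) {
        int numClasses = weights.length;
        double[] p = softmax(weights, features);   // class probabilities
        double[][] grad = new double[numClasses][features.length];

        for (int k = 0; k < numClasses; k++) {
            // (p_k - 1{k == label}) is the per-class gradient multiplier
            double gradMultiplier = p[k] - ((k == label) ? 1.0 : 0.0);
            for (int j = 0; j < features.length; j++) {
                grad[k][j] = features[j] * gradMultiplier;
            }
        }
        return grad;
    }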

Let me know if it helps.

joel314