Is it possible to only use K-1 logits for K-class classification?

Question

For multi-class classification, we use the softmax function to calculate the class probabilities.

In the case K = 2, we have softmax(a)_0 = e^{a_0}/(e^{a_0} + e^{a_1}) = 1/(1 + e^{a_1 - a_0}) = sigmoid(a_0 - a_1), so softmax reduces to the logistic function and we only need one logit.

I'm wondering whether it's possible to use only K-1 logits to model the multi-class classification problem when we have K classes?

Not a *programming* question, hence off-topic here; please see the intro and NOTE in https://stackoverflow.com/tags/machine-learning/info — desertnaut, Sep 23 '22 at 22:10

score 1 · Accepted Answer · answered Sep 24 '22 at 10:33

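The question is essentially equivalent to asking "is there a surjective (preferably bijective) function from R^{n-1} onto the (n-1)-simplex of probability vectors over n classes?" (strictly, onto its interior, since softmax-style outputs are always positive), and the answer is of course yes. Some examples: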

1. f([x1, ..., xn-1]) = softmax([x1, ..., xn-1, 0])
2. f([x1, ..., xn-1]) = [sigmoid(x1), (1-sigmoid(x1)) * softmax([x2, ..., xn-1, 0])]
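
Both constructions can be sketched in a few lines of Python. This is only an illustration: the function names `pinned_softmax` and `nested_sigmoid_softmax` are made up here, and for the second construction the sketch assumes the inner softmax also appends a fixed 0 logit, so that the output has n entries for n-1 inputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Fine for moderate x; very negative x would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-x))

def pinned_softmax(free_logits):
    """Construction (1): append a fixed 0 logit, then apply softmax."""
    return softmax(list(free_logits) + [0.0])

def nested_sigmoid_softmax(free_logits):
    """Construction (2), assuming the inner softmax also pins a 0 logit."""
    head = sigmoid(free_logits[0])
    rest = pinned_softmax(free_logits[1:])
    # The remaining probability mass (1 - head) is split by the inner softmax.
    return [head] + [(1.0 - head) * p for p in rest]

x = [0.5, -1.2, 2.0]  # n - 1 = 3 free logits, so n = 4 classes
p1 = pinned_softmax(x)
p2 = nested_sigmoid_softmax(x)
print(len(p1), sum(p1))
print(len(p2), sum(p2))
```

Each function returns n strictly positive numbers that sum to 1 (up to floating-point error), but the two generally assign different probabilities to the same input.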

In general, these constructions introduce some arbitrary asymmetry into your formulation, which, following Occam's razor, is something we usually avoid.

Note that

softmax([-x, 0]) = [e^{-x}/(e^{-x} + e^0), 1/(e^{-x} + 1)] 
                 = [1-sigmoid(x), sigmoid(x)]

So in a sense, solution (1) generalises what you do with the sigmoid in the K=2 case to the K>2 case. Unfortunately, you have to arbitrarily pick which of the dimensions you will replace with 0.
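
A quick numerical check of the identity above, together with the point about the arbitrary choice, might look like the following sketch. It relies on the fact that softmax is unchanged when the same constant is subtracted from every logit, so any one entry can be shifted to exactly 0; the helper name `pin_index` is hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# K = 2: softmax([-x, 0]) equals [1 - sigmoid(x), sigmoid(x)].
x = 1.7
lhs = softmax([-x, 0.0])
rhs = [1.0 - sigmoid(x), sigmoid(x)]
binary_identity_holds = all(math.isclose(a, b) for a, b in zip(lhs, rhs))

def pin_index(logits, k):
    """Shift all logits so that entry k becomes exactly 0."""
    return [v - logits[k] for v in logits]

# K = 3: pinning any of the three entries to 0 gives the same probabilities,
# which is why the choice of the pinned dimension is arbitrary.
a = [2.0, -0.5, 1.0]
reference = softmax(a)
pinning_is_arbitrary = all(
    all(math.isclose(p, q) for p, q in zip(reference, softmax(pin_index(a, k))))
    for k in range(len(a))
)
print(binary_identity_holds, pinning_is_arbitrary)
```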
