For multi-class classification, we use the softmax function to compute the class probabilities.
In the case of K = 2 classes, we have softmax(a)_0 = e^{a_0} / (e^{a_0} + e^{a_1}) = 1 / (1 + e^{a_1 - a_0}) = sigmoid(a_0 - a_1), so softmax reduces to the logistic function and we only need one logit, namely the difference a_0 - a_1.
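As a quick numerical check of the identity above, here is a minimal sketch in plain Python. The logit values are arbitrary, and the helper names `softmax` and `sigmoid` are written just for this illustration rather than taken from any library.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Arbitrary illustrative logits (assumed values, not from any dataset).
a0, a1 = 1.3, -0.4

p0 = softmax([a0, a1])[0]             # two-logit softmax, probability of class 0
q0 = sigmoid(a0 - a1)                 # one-logit logistic form
shifted = softmax([a0 - a1, 0.0])[0]  # same logits shifted so the second one is 0

print(p0, q0, shifted)  # the three values agree up to floating-point error
```

The last line shows why a single number is enough for K = 2: softmax only depends on the difference between the logits, so one of them can be fixed at 0.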
I'm wondering whether it's possible to use only K-1 logits to model a multi-class classification problem when we have K classes?