My understanding of Softmax probability
The raw output of a neural network (NN) is not very discriminating. For example, if I have 3 classes, the NN may output some value a for the correct class and values b, c for the others, such that a > b and a > c, but not by a wide margin. After the softmax transformation, however, two things hold for the new values: firstly, a + b + c = 1, which makes them interpretable as probabilities; secondly, a >>> b and a >>> c, so we are now much more confident in the prediction.
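As a minimal sketch of this in Python (the raw output values are hypothetical, chosen only for illustration):

```python
import math

def softmax(outputs):
    """Exponentiate each output, then normalize so the results sum to 1."""
    m = max(outputs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in outputs]
    total = sum(exps)
    return [e / total for e in exps]

raw = [2.0, 1.0, 0.5]                     # hypothetical raw outputs a, b, c
print(softmax(raw))                       # ~[0.63, 0.23, 0.14] -- sums to 1,
                                          # and the top class dominates more
```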
So, how to go further?
To get the first advantage, it is sufficient to use

f(x1) / [f(x1) + f(x2) + f(x3)]    (equation 1)

for any non-negative function f(x). Softmax chooses f(x) = exp(x). But since you are not comfortable with exp(x), you can choose, say, f(x) = x^2.
Below I give some plots of functions whose profile is similar to the exponential; you may choose from them or use a similar function. To tackle the negative range, you may add a bias of 64 to the output before applying f.
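Here is a sketch of equation 1 with f(x) = x^2 and the bias applied (the raw output values are again hypothetical):

```python
def normalize(outputs, bias=64.0):
    """Equation 1 with f(x) = x^2; the bias shifts negative outputs
    into the non-negative range so that f stays non-negative."""
    vals = [(x + bias) ** 2 for x in outputs]
    total = sum(vals)
    return [v / total for v in vals]

raw = [10.0, -5.0, -20.0]          # hypothetical NN outputs, some negative
print(normalize(raw))              # ~[0.50, 0.32, 0.18] -- sums to 1
```

Note that f(x) = x^2 separates the classes less sharply than exp(x) does; that is the price of avoiding the exponential.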

Please note that the denominator is the same for all three classes, so if you only need to rank them it does not have to be computed per input; you can replace it with a constant. For simplicity you can use the following instead of equation 1:

f(x) / [3 * f(xmax)]

where xmax is the largest possible output; in your case xmax = 64 + bias (if you choose to use one). With this constant denominator the three values no longer sum to 1, but their ordering is unchanged.
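A sketch of this simplified version, assuming (as above) f(x) = x^2, a bias of 64, and a maximum raw output of 64:

```python
def approx_normalize(outputs, bias=64.0, raw_max=64.0):
    """Divide by the constant 3 * f(xmax) instead of the per-input sum.
    Cheaper, preserves the class ordering, but no longer sums to 1."""
    x_max = raw_max + bias               # largest possible shifted output
    denom = 3.0 * x_max ** 2             # constant denominator: 3 * f(xmax)
    return [(x + bias) ** 2 / denom for x in outputs]

raw = [10.0, -5.0, -20.0]
print(approx_normalize(raw))  # ~[0.11, 0.07, 0.04] -- same ranking, cheaper
```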
Regards.