
I'm trying to find an approach to compute the softmax probability without using exp().

assume that:

target: to compute f(x1, x2, x3) = exp(x1)/[exp(x1)+exp(x2)+exp(x3)]

conditions:

    1. -64 < x1,x2,x3 < 64

    2. the result only needs to be kept to 3 decimal places.

Is there any way to find a polynomial that approximately represents the result under these conditions?

Yong Wang
  • You could approximate `exp` in `softmax` with piece-wise linear functions (see [here](https://vijaychan.github.io/Publications/2018_softmax.pdf)); other possibilities [are in this answer](https://stackoverflow.com/questions/6984440/approximate-ex). It depends on what you're after and why you want to do that. – Szymon Maszke Jun 04 '20 at 08:47
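A minimal Python sketch of the piecewise-linear idea mentioned in the comment above (the segmentation at multiples of ln 2 is an illustrative assumption, not necessarily the scheme from the linked paper):

```python
import math

LN2 = math.log(2.0)

def pwl_exp(x):
    # Write x = k*ln(2) + r with r in [0, ln 2), so exp(x) = 2**k * exp(r).
    k = math.floor(x / LN2)
    r = x - k * LN2
    # Approximate exp(r) linearly between exp(0) = 1 and exp(ln 2) = 2.
    exp_r = 1.0 + r / LN2
    # 2**k for integer k is cheap (a bit shift in fixed-point hardware).
    return (2.0 ** k) * exp_r

print(pwl_exp(1.0), math.exp(1.0))  # roughly 2.885 vs 2.718...
```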

1 Answer


My understanding of Softmax probability

The raw output of a neural network (NN) is not very discriminating. For example, if I have 3 classes, the NN may output some value a for the correct class and b, c for the others, such that a > b and a > c. But after the softmax transformation, firstly the outputs sum to 1, which makes them interpretable as probabilities. Secondly, the gap is greatly exaggerated (a >>> b, a >>> c), so we are much more confident in the prediction.
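As a quick illustration, here is a minimal Python sketch of the standard softmax (it still uses exp, only to show the baseline behaviour being approximated; the input values are made up):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([3.0, 1.0, 0.5]))
# roughly [0.821, 0.111, 0.067]: sums to 1, and the winner's lead is exaggerated
```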

So how to go further

To get the first advantage, it is sufficient to use

f(x1)/[f(x1)+f(x2)+f(x3)]
(equation 1)

for any non-negative function f(x) (non-negativity is needed so that each term can be read as a probability).

Softmax chooses f(x) = exp(x). But since you want to avoid exp(x), you can choose, say, f(x) = x^2.

Below are some plots of functions whose profile is similar to the exponential; you may choose from them or use a similar function (see the sketch after the figure). To handle the negative input range, you can first add a bias of 64 to the NN outputs.

[figure: plots of candidate functions with an exponential-like profile]
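A minimal Python sketch of equation 1 under these choices (f(x) = x^2 and a bias of 64 are the illustrative assumptions here):

```python
def f(x):
    # Illustrative stand-in for exp: x^2, applied after shifting inputs to be non-negative.
    return x * x

def softmax_like(xs, bias=64.0):
    vals = [f(x + bias) for x in xs]   # bias maps (-64, 64) into (0, 128)
    total = sum(vals)
    return [v / total for v in vals]

print(softmax_like([10.0, -5.0, 3.0]))  # sums to 1; the largest input gets the largest share
```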

Please note that if you only need relative scores rather than values that sum to 1, you can avoid computing the data-dependent denominator by replacing it with a constant upper bound. For simplicity you can then use the following instead of equation 1:

[f(x)] / [3*f(xmax)]

In your case xmax = 64 + bias (if you choose to use one).
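A sketch of that simplification under the same illustrative assumptions (f(x) = x^2, bias of 64, hence xmax = 128):

```python
BIAS = 64.0
XMAX = 64.0 + BIAS  # maximum possible shifted input

def score(x):
    # Constant denominator: preserves the ranking, but scores no longer sum to 1.
    v = x + BIAS
    return (v * v) / (3.0 * XMAX * XMAX)

print([round(score(x), 3) for x in (10.0, -5.0, 3.0)])  # 3 decimal places, per the question
```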

Regards.

Mohit Lamba