
I have an array with some values that are zero and some that are non-zero. When I apply a softmax, I want all the non-zero values to add up to 1 and the zeros to stay zero. But after the softmax, all values are non-zero and add up to 1.

Here's what I'm trying to do: I have some values

score[0]

<tf.Tensor: shape=(1, 48), dtype=float32, numpy=
array([[ 2.405819  , 27.748499  , 16.080362  ,  8.780167  , 16.615538  ,
        19.353844  , 19.497992  , 16.051327  ,  5.4946175 , 15.927819  ,
        11.512515  , 19.716702  , 15.100697  , 26.370419  , 21.838608  ,
        10.650975  ,  9.212484  , 17.439907  , 14.322778  , 12.001259  ,
        10.433163  , 10.011807  , 15.847178  , 18.343014  , 26.086296  ,
        26.723047  , 17.28703   , -0.7059817 , 26.380203  , 21.49808   ,
        14.828656  , 13.711437  , 19.565845  ,  5.9418716 , 12.614753  ,
        29.56828   ,  1.1372657 , 25.873251  , 36.031494  , -7.397362  ,
        12.691793  ,  4.3349338 , 15.1586275 , 14.650254  , 14.632486  ,
        18.829857  , 21.885925  ,  0.56010276]], dtype=float32)>

and a mask

mask_test[0]

<tf.Tensor: shape=(1, 48), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 1, 1]])>

I multiply the values with the mask

score = tf.multiply(score, tf.cast(mask_test, tf.float32))
score[0]

<tf.Tensor: shape=(1, 48), dtype=float32, numpy=
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        , -0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        , -0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        18.829857  , 21.885925  ,  0.56010276]], dtype=float32)>

That works fine. Now I want to apply a softmax so that all non-zero values add up to 1. The zeros should stay 0.

attention_weights = tf.nn.softmax(score, axis=-1)
attention_weights[0]

<tf.Tensor: shape=(1, 48), dtype=float32, numpy=
array([[2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
        2.9859784e-10, 4.4956207e-02, 9.5504379e-01, 5.2280064e-10]],
      dtype=float32)>

And the result is that all values are non-zero. I guess that comes from the exponential in the softmax. Is there a way to achieve this with the softmax, or is there another way? The mask is not always the same.
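
For example, even a plain zero input gets a non-zero weight (just a quick check, not from my actual code):

tf.nn.softmax([0.0, 0.0, 2.0])
# ≈ [0.1065, 0.1065, 0.7870] -- exp(0) = 1, so the zeros still contribute to the sum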

thanks in advance

  • If you look at the [definition of softmax](https://en.wikipedia.org/wiki/Softmax_function), the numerator is e^zi. That will not be zero unless the input is -infinity. – Nick ODell Aug 14 '21 at 20:58
  • I have updated my answer, check it out – pu239 Aug 14 '21 at 23:27
  • This is a standard-issue with using softmax for anything other than a relative estimation. In order to get a 0 as output from softmax, you will need to pass a very very small number (like the machine limit for float64), instead of 0. Check my answer for details. – Akshay Sehgal Aug 14 '21 at 23:38
  • thank you all for your comments. I'm using the custom_soft_max now and it works great! – Stickstoff Aug 15 '21 at 14:21

2 Answers


Softmax does not work that way. Take a look at the formula of softmax:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j)

Every term exp(z_i) is strictly positive, so no output can ever be exactly zero.

You would need to define a custom function for this.

A simple way of doing this would be:

def custom_soft_max(arr):
    # operate on a NumPy copy so values can be assigned by index
    arr = np.asarray(arr, dtype=np.float32).copy()
    non_zero_indices = np.where(arr != 0)
    # softmax over only the non-zero entries
    arr[non_zero_indices] = tf.nn.softmax(arr[non_zero_indices]).numpy()
    return arr

This will exclude all the indices that have a corresponding value of 0, and then perform softmax on only the non-zero indices.
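
For example, a quick sketch of how it could be used with the masked scores from the question (assuming you apply it to one example at a time, since np.where collects non-zero entries across the whole array you pass in):

masked_score = tf.multiply(score, tf.cast(mask_test, tf.float32))
weights = custom_soft_max(masked_score[0].numpy())
# only the unmasked positions are non-zero, and they sum to 1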

pu239

No need for a custom softmax.

Softmax() still operates on 0.0 values and returns a non-zero value as mathematically expected (link).

The only way to get a zero out of softmax() is to pass a very large negative value (effectively -infinity). If you set the masked values to the minimum representable value of the tensor's dtype, softmax() of that value underflows to zero.

For the float32 scores in the question that is tf.float32.min (about -3.4e+38); for float64 it would be tf.float64.min, which equals -1.7976931348623157e+308. More info about machine limits on this post.

Apply this after your tf.multiply() and before the softmax to change the zeros to that minimum value, and the softmax will output 0 for them -

# Keep score where it is not 0, else replace by the machine limit
score = tf.where(score != 0, score, tf.float32.min)

Here, tf.float32.min (and tf.float64.min) give the tf (and NumPy) machine limits for those dtypes.
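
Putting it together, a minimal sketch (assuming score and mask_test are the float32 tensors from the question):

score = tf.multiply(score, tf.cast(mask_test, tf.float32))
score = tf.where(score != 0, score, tf.float32.min)  # masked entries -> most negative float32
attention_weights = tf.nn.softmax(score, axis=-1)
# exp(tf.float32.min - max(score)) underflows to 0, so the masked positions come out as exactly 0.0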

Akshay Sehgal