I am reviving this github issue because I believe it is valid and needs to be explained. tf.keras has a masking layer with docs that reads
For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.
# create padded zeros and change two valid entries.
inputs = np.zeros([1,5])
inputs[0,1] = 0.5
inputs[0,2] = 0.1
inputs = tf.Variable(inputs)
masked_inputs = tf.keras.layers.Masking(mask_value=0.0)(inputs)
with_masking = tf.keras.layers.Softmax()(masked_inputs)
without_masking = tf.keras.layers.Softmax()(inputs)
The two results are virtually identical
with_masking
<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0.1737954 , 0.28654018, 0.19207363, 0.1737954 , 0.1737954 ]],
dtype=float32)>
without_masking
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.1737954 , 0.28654017, 0.19207362, 0.1737954 , 0.1737954 ]])>
Expected behavior
I expected to just take softmax of the valid entries, similiar to
#Assign one large value
inputs = np.zeros([1,2])
inputs[0,0] = 0.5
inputs[0,1] = 0.1
inputs = tf.Variable(inputs)
without_masking = tf.keras.layers.Softmax()(inputs)
without_masking
<tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.59868766, 0.40131234]])>
padded at the correct positions
with_masking
<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0 , 0.59868766, 0.40131234, 0, 0 ]],
dtype=float32)>
To ignore 0's in a softmax function, we could switch out massively negative numbers?
Related: tensorflow - softmax ignore negative labels (just like caffe)
from tensorflow import __version__
__version__
'2.3.1'