
I am on TensorFlow 2 on Google Colab using tf.keras. I am trying to use an Embedding layer with masking, followed by global average pooling. Here's my code:

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding
from tensorflow.keras.models import Model

vocab_size = 1500

inputs = Input(shape=(None,), dtype=tf.int32, name='word_sequence')
x = Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True)(inputs)
outputs = tf.keras.layers.GlobalAveragePooling1D()(x)
model = Model(inputs, outputs)

But I got this error:

TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [-1, None, 1]. Consider casting elements to a supported type.

If I provide an explicit sequence length, e.g. Input(shape=(10,), .....), it seems to build with no error (although I haven't tested it with sample data). I wonder why an explicit sequence length is needed; I thought this could be resolved lazily at runtime when the layer first encounters the data.
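
For reference, here is a minimal sketch of that fixed-length variant (the length of 10 and the dummy sample are just placeholders for illustration; sequences would need to be padded or truncated to match):

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, GlobalAveragePooling1D
from tensorflow.keras.models import Model

vocab_size = 1500

inputs = Input(shape=(10,), dtype=tf.int32, name='word_sequence')
x = Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True)(inputs)
outputs = GlobalAveragePooling1D()(x)
model = Model(inputs, outputs)

# quick smoke test: index 0 is treated as padding and masked out of the average
sample = np.array([[5, 8, 3, 0, 0, 0, 0, 0, 0, 0]], dtype=np.int32)
print(model.predict(sample).shape)  # (1, 16)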

Furthermore, the following works (taken from "masking and padding" tf tutorial):

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(None,), dtype='int32')
x = layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True)(inputs)
outputs = layers.LSTM(32)(x)

model = tf.keras.Model(inputs, outputs)

The LSTM seems happy with an input shape of None during functional-API construction of the model.

Could someone please explain why this fails with GlobalAveragePooling1D, or whether it should work and I did something wrong?

Thanks.

kawingkelvin
  • In theory, you don't need to specify an array size in order to compute a mean; e.g. np.mean(...) doesn't need an explicit input length. So naively, I do think this is a bug, assuming I did everything right in terms of API calls. – kawingkelvin Oct 21 '19 at 20:20

2 Answers


I don't have the reputation to add a comment, so here's what I wanted to say: I seem to have the same problem, both with GRU and LSTM. The problem seems to vanish when I use GlobalMaxPooling1D instead. I suspect it is caused by the underlying implementation of masking, but I don't know enough about the low-level Keras API to comment on that.
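
For what it's worth, here is a minimal sketch of that workaround, based on the model from the question (not tested on every TF version; note that GlobalMaxPooling1D may simply ignore the mask rather than respect it):

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, GlobalMaxPooling1D
from tensorflow.keras.models import Model

vocab_size = 1500

inputs = Input(shape=(None,), dtype=tf.int32, name='word_sequence')
x = Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True)(inputs)
# GlobalMaxPooling1D does not reshape the mask, so the unknown timestep
# dimension does not trigger the reshape error seen with average pooling.
# Caveat: padding positions may still contribute their embeddings to the max.
outputs = GlobalMaxPooling1D()(x)
model = Model(inputs, outputs)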

xhlulu
  • Maybe you can share your use case for GRU/LSTM. I thought they should work, but I only tried the simple code used in that tutorial (the code snippet I pasted in my question as well). – kawingkelvin Nov 01 '19 at 17:08

This is because the implementation of GlobalAveragePooling1D needs to know the timestep dimension when input_mask is not None. So if you remove mask_zero=True from the Embedding layer, the model builds successfully.

Looking into the source code of GlobalAveragePooling1D, we can see that:

  def call(self, inputs, mask=None):
    steps_axis = 1 if self.data_format == 'channels_last' else 2
    if mask is not None:
      mask = math_ops.cast(mask, backend.floatx())
      input_shape = inputs.shape.as_list()
      broadcast_shape = [-1, input_shape[steps_axis], 1]
      mask = array_ops.reshape(mask, broadcast_shape)
      inputs *= mask
      return backend.sum(inputs, axis=steps_axis) / math_ops.reduce_sum(
          mask, axis=steps_axis)
    else:
      return backend.mean(inputs, axis=steps_axis)

So if mask is not None (in your example, the mask generated by the Embedding layer because you set mask_zero=True), broadcast_shape would be [-1, None, 1], and the None causes the error in reshape(mask, broadcast_shape). So I think the only solution is to specify the timestep (sequence length) in the input shape.
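
As a side note (my own sketch, not from the GlobalAveragePooling1D source, assuming TF 2.x): if you want to keep shape=(None,), one way around the static reshape is to compute the masked average in a small custom layer using only dynamic shapes:

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Layer
from tensorflow.keras.models import Model

class MaskedAveragePooling1D(Layer):
    """Averages over timesteps, ignoring positions where the mask is False."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        if mask is None:
            return tf.reduce_mean(inputs, axis=1)
        # (batch, timesteps) -> (batch, timesteps, 1); works with dynamic shapes,
        # assuming every sequence has at least one non-padding token.
        mask = tf.expand_dims(tf.cast(mask, inputs.dtype), axis=-1)
        return tf.reduce_sum(inputs * mask, axis=1) / tf.reduce_sum(mask, axis=1)

    def compute_mask(self, inputs, mask=None):
        # The timestep dimension is gone after pooling, so stop propagating the mask.
        return None

vocab_size = 1500
inputs = Input(shape=(None,), dtype=tf.int32, name='word_sequence')
x = Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True)(inputs)
outputs = MaskedAveragePooling1D()(x)
model = Model(inputs, outputs)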

MachineLearner