
This is my code:

from keras.layers import Input, Embedding, TimeDistributed, Convolution1D

cnn_input = Input(shape=(cnn_max_length,))
emb_output = Embedding(num_chars + 1, output_dim=32, input_length=cnn_max_length, trainable=True)(cnn_input)
output = TimeDistributed(Convolution1D(filters=128, kernel_size=4, activation='relu'))(emb_output)

I want to train a character-level CNN sequence labeler and I keep receiving this error:

Traceback (most recent call last):
  File "word_lstm_char_cnn.py", line 24, in <module>
    output = kl.TimeDistributed(kl.Convolution1D(filters=128, kernel_size=4, activation='relu'))(emb_output)
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/keras/layers/wrappers.py", line 248, in call
    y = self.layer.call(inputs, **kwargs)
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/keras/layers/convolutional.py", line 160, in call
    dilation_rate=self.dilation_rate[0])
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3526, in conv1d
    data_format=tf_data_format)
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 779, in convolution
    data_format=data_format)
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 828, in __init__
    input_channels_dim = input_shape[num_spatial_dims + 1]
  File "/home/user/anaconda3/envs/thesisenv/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 615, in __getitem__
    return self._dims[key]
IndexError: list index out of range

The input is 3D, as it should be. If I change the input shape, I receive this error:

ValueError: Input 0 is incompatible with layer time_distributed_1: expected ndim=3, found ndim=4
  • I think you have the same problem as this [question](https://stackoverflow.com/q/51992336/2099607). See if the answer provided there resolves your issue. – today Aug 24 '18 at 16:51
  • Thank you. Actually I want to implement a character-level CNN sequence tagger. If I want to implement a word-level LSTM sequence tagger, I just pass return_sequences=True and I have all of my time-steps (words). But I don't know how to do it with a char-level CNN. In other words, I want the output shape to be `(None, num_words, filter_size)`, but it is `(None, 121, filter_size)`. –  Aug 25 '18 at 12:25
  • To be more precise: the output shape of the embedding layer would be `(None, cnn_max_length, 32)`. Now what do you want to do with this output tensor? What kind of operation/layer do you want to apply to it? Also, the output of the whole model would be a tag for each character in the sequence, i.e. the labels are of shape `(None, cnn_max_length, num_tags)`, assuming tags are one-hot encoded? – today Aug 25 '18 at 12:35
  • Exactly. I want the conv layer to convolve only over the last dimension, treating the cnn_max_length (num_words) dimension as the time step. –  Aug 25 '18 at 12:38
  • cnn_max_length = num_tags = sentence_len (padded). Dense layers are time-distributed. The output of the whole model is one tag per word. –  Aug 25 '18 at 12:39
  • Then, as mentioned in the answer I linked to in my comment, you don't need to use the `TimeDistributed` wrapper. Just apply `Convolution1D` directly on the output of the embedding to get an output tensor with shape `(None, cnn_max_length, filter_size)`. Don't forget to pass `padding='same'` to the conv layer so that the number of timesteps does not change in the output. – today Aug 25 '18 at 12:42
  • Problem solved. Thank you very much. –  Aug 25 '18 at 13:13

1 Answer


Recommended solution: There is no need to use TimeDistributed in this case. You can fix the issue with the following piece of code:

output = Convolution1D(filters=128, kernel_size=4, activation='relu')(emb_output)
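
As discussed in the comments above, a full sequence-tagging model would also pass padding='same' so the number of timesteps is preserved, and end with a softmax Dense layer over the tags. Here is a minimal sketch along those lines; num_chars, cnn_max_length, and num_tags are placeholder values you would set from your own data:

from keras.models import Model
from keras.layers import Input, Embedding, Convolution1D, Dense

num_chars, cnn_max_length, num_tags = 100, 121, 10  # placeholder values

cnn_input = Input(shape=(cnn_max_length,))
emb_output = Embedding(num_chars + 1, output_dim=32, trainable=True)(cnn_input)
# padding='same' keeps the output length equal to cnn_max_length
conv_output = Convolution1D(filters=128, kernel_size=4, padding='same', activation='relu')(emb_output)
# Dense on a 3D tensor acts on the last axis: one tag distribution per timestep
tag_output = Dense(num_tags, activation='softmax')(conv_output)

model = Model(cnn_input, tag_output)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()  # final output shape: (None, cnn_max_length, num_tags)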

Just in case you'd like to use TimeDistributed, you can do something like:

output = TimeDistributed(Dense(100,activation='relu'))(emb_output)
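
Note that the wrapper is redundant for Dense as well: applied to a 3D tensor, Dense already operates on the last axis independently at each timestep, so Dense(100, activation='relu')(emb_output) produces the same output shape.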

Not recommended: According to the docs:

This wrapper applies a layer to every temporal slice of an input.

The input to TimeDistributed here has shape (batch_size, seq_len, emb_size). The wrapper strips the time axis and passes each temporal slice, a 1D tensor of shape (emb_size,), to the wrapped Conv1D, which expects a 2D input of shape (steps, channels); that missing channel dimension is what triggers the IndexError.

You can fix the problem by adding an extra channel dimension to each timestep, e.g. with a Reshape layer:

reshaped = Reshape((cnn_max_length, 32, 1))(emb_output)
output = TimeDistributed(Conv1D(100, 1))(reshaped)
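
With that extra dimension, each temporal slice handed to Conv1D has shape (32, 1), i.e. 32 steps with a single channel, so the convolution runs along the embedding axis rather than across words, and the output shape becomes (None, cnn_max_length, 32, 100). That is rarely what a sequence tagger wants, which is why the direct Convolution1D approach above is recommended.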
Amir
  • Thank you. Actually I want to implement a character-level CNN sequence tagger. If I want to implement a word-level LSTM sequence tagger, I just pass return_sequences=True and I have all of my time-steps (words). But I don't know how to do it with a char-level CNN. In other words, I want the output shape to be `(None, num_words, filter_size)`, but it is `(None, 121, filter_size)`. –  Aug 25 '18 at 12:25
  • @Ehsan This is another problem. Try to get the solution in another question. Just for a hint: I guess 121 is the number of unique characters. – Amir Aug 25 '18 at 12:31
  • Look. This is the shape of my input data: (batch, num_words, char_embeddings). I want the conv layer to convolve only over the last dimension, treating the num_words dimension as the time step. –  Aug 25 '18 at 12:35
  • @Ehsan take a look at here: https://github.com/chaitjo/character-level-cnn/blob/master/models/char_cnn_kim.py – Amir Aug 25 '18 at 12:39