
After looking at the following gist, and doing some basic tests, I am trying to create a NER system using an LSTM in Keras. I am using a generator and calling fit_generator.

Here is my basic Keras model:

model = Sequential([
    # Map token ids to dense vectors; mask_zero treats id 0 as padding
    Embedding(input_dim=max_features, output_dim=embedding_size, input_length=maxlen, mask_zero=True),
    # return_sequences=True keeps one output per timestep
    Bidirectional(LSTM(hidden_size, return_sequences=True)),
    # Project each timestep to out_size classes
    TimeDistributed(Dense(out_size)),
    Activation('softmax')
])
model.compile(loss='binary_crossentropy', optimizer='adam')

My input dimensions seem right:

>>> generator = generate()
>>> i,t = next(generator)
>>> print( "Inputs: {}".format(model.input_shape))
>>> print( "Outputs: {}".format(model.output_shape))
>>> print( "Actual input: {}".format(i.shape))
Inputs: (None, 3949)
Outputs: (None, 3949, 1)
Actual input: (45, 3949)

However when I call:

model.fit_generator(generator, steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS)

I seem to get the following error:

ValueError: 
  Error when checking target: 
    expected activation_1 to have 3 dimensions, 
    but got array with shape (45, 3949)

I have seen a few other examples of similar issues, which leads me to believe I need to Flatten() my inputs before the Activation(), but if I do so I get the following error:

Layer flatten_1 does not support masking, 
but was passed an input_mask: 
    Tensor("embedding_37/NotEqual:0", shape=(?, 3949), dtype=bool)

As per previous questions, my generator is functionally equivalent to:

import numpy as np

def generate():
    maxlen = 3949
    batch_size = 45
    while True:
        # Yield one batch at a time: inputs (45, 3949), targets (45, 3949)
        inputs = np.random.randint(55604, size=(batch_size, maxlen))
        targets = np.random.randint(2, size=(batch_size, maxlen))
        yield inputs, targets
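
Running the same shape check on the targets (using the generator above) makes the mismatch from the error visible: the model produces a 3-D output, but the generator yields 2-D targets.

>>> i, t = next(generate())
>>> print("Expected target: {}".format(model.output_shape))
>>> print("Actual target: {}".format(t.shape))
Expected target: (None, 3949, 1)
Actual target: (45, 3949)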

I am not certain that I need Flatten, and I am open to additional suggestions.

1 Answer

You either need to return only the last element of the sequence (return_sequences=False):

model = Sequential([
    Embedding(input_dim=max_features, output_dim=embedding_size, input_length=maxlen, mask_zero=True),
    Bidirectional(LSTM(hidden_size)),
    Dense(out_size),
    Activation('softmax')
])

Or remove the masking (mask_zero=False) to be able to use Flatten:

model = Sequential([
    Embedding(input_dim=max_features, output_dim=embedding_size, input_length=maxlen),
    Bidirectional(LSTM(hidden_size, return_sequences=True)),
    TimeDistributed(Dense(out_size)),
    Flatten(),
    Activation('softmax')
])

Be careful: the output of this model will have shape (None, maxlen * out_size).
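
For instance, with `out_size = 1` and `maxlen = 3949` as in the question (a quick check under those assumptions), the flattened model's output collapses to the same 2-D shape as the generator's targets:

>>> model.output_shape
(None, 3949)

So the original 2-D targets would line up here, but only because out_size is 1.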

And I think you want the first option.

Edit 1: Looking at the example diagram, the network makes a prediction on every timestep, so the softmax activation also needs to be TimeDistributed. The target dimension should be (None, maxlen, out_size):

model = Sequential([
    Embedding(input_dim=max_features, output_dim=embedding_size, input_length=maxlen, mask_zero=True),
    Bidirectional(LSTM(hidden_size, return_sequences=True)),
    TimeDistributed(Dense(out_size)),
    TimeDistributed(Activation('softmax'))
])
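
To produce targets of that shape, the question's generator needs a trailing axis on the targets. A minimal sketch, assuming a batch size of 45 and out_size = 1 as in the shapes above:

import numpy as np

def generate():
    maxlen = 3949
    batch_size = 45
    while True:
        inputs = np.random.randint(55604, size=(batch_size, maxlen))
        targets = np.random.randint(2, size=(batch_size, maxlen))
        # expand_dims adds a trailing axis: targets become (45, 3949, 1),
        # i.e. (batch, maxlen, out_size) with out_size = 1
        yield inputs, np.expand_dims(targets, -1)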
  • If I try the first option and remove `return_sequences=True` and `TimeDistributed`, then I get `Error when checking target: expected activation_1 to have shape (None, 1) but got array with shape (45, 3949)` – Nathan McCoy Nov 02 '17 at 11:22
  • The error says that your output is `(None, 1)` but your target size is `(None, 3949)`. Assuming `out_size = 1` and `maxlen = 3949`, I think you have the second option, `out_size x maxlen`. Also, are you sure that you want a softmax activation? Sorry, I'm not familiar with NER systems, but I can help you get your outputs right if you have a diagram of your network. – Julio Daniel Reyes Nov 02 '17 at 13:16
  • Masked values are needed since the vectors passed in are padded, as seen [in an example here](http://dirko.github.io/Bidirectional-LSTMs-with-Keras/). – Nathan McCoy Nov 02 '17 at 14:09
  • Updated my answer; be sure to check the target shape `(maxlen, out_size)`. – Julio Daniel Reyes Nov 02 '17 at 18:24