I have just started using the CRF layer provided by the keras-contrib library for an NER (named entity recognition) task.
The problem I faced is that when training the model with default parameters, the loss becomes nan by the end of the first epoch and never changes afterwards.
What helped me was changing the learn_mode parameter of the CRF layer to 'marginal'.
Could anyone please explain the difference between the 'join' and 'marginal' learn_mode? Why does 'join' mode lead to a nan loss in my case (an NER problem), while 'marginal' works?
from keras.layers import (Input, Embedding, LSTM, Bidirectional, TimeDistributed,
                          Dense, SpatialDropout1D, concatenate)
from keras_contrib.layers import CRF

# input and embedding for words
word_in = Input(shape=(max_len_doc,))
emb_word = Embedding(input_dim=n_words + 2, output_dim=50,
                     input_length=max_len_doc, mask_zero=True)(word_in)

# input and embeddings for characters
char_in = Input(shape=(max_len_doc, max_len_word,))
emb_char = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=10,
                                     input_length=max_len_word, mask_zero=True))(char_in)

# character LSTM to get word encodings by characters
char_enc = TimeDistributed(LSTM(units=50, return_sequences=False,
                                recurrent_dropout=0.5))(emb_char)

# main BiLSTM stack over the concatenated word + character representations
model_crf = concatenate([emb_word, char_enc])
model_crf = SpatialDropout1D(0.3)(model_crf)
model_crf = Bidirectional(LSTM(units=128, return_sequences=True, recurrent_dropout=0.6))(model_crf)
model_crf = Bidirectional(LSTM(units=128, return_sequences=True, recurrent_dropout=0.3))(model_crf)

# per-timestep projection to tag space, followed by the CRF layer
model_crf = TimeDistributed(Dense(n_tags, activation="relu"))(model_crf)
crf = CRF(n_tags)  # crf = CRF(n_tags, learn_mode='marginal') avoids the nan loss
out = crf(model_crf)
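
Below is a minimal sketch of how the model is then built, compiled and trained, assuming the standard keras-contrib pattern of taking the loss and metric from the CRF layer itself (crf.loss_function / crf.accuracy); the optimizer, batch size and training-array names are placeholders, not necessarily the exact setup that triggers the problem:

from keras.models import Model

model = Model(inputs=[word_in, char_in], outputs=out)

# crf.loss_function dispatches on learn_mode: the CRF negative log-likelihood
# for 'join', categorical crossentropy over the marginal probabilities for 'marginal'
model.compile(optimizer="rmsprop", loss=crf.loss_function, metrics=[crf.accuracy])

# X_word_tr, X_char_tr, y_tr are placeholder names for the padded training arrays
model.fit([X_word_tr, X_char_tr], y_tr, batch_size=32, epochs=5, validation_split=0.1)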