
I'm going crazy with this project. It's multi-label text classification with an LSTM in Keras. My model is this:

import keras
from keras.models import Sequential
from keras.layers import Embedding, Dropout, LSTM, Dense, Activation

model = Sequential()

model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len,
                    mask_zero=True, weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid',
               recurrent_activation='hard_sigmoid', return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid',
               recurrent_activation='hard_sigmoid', return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))

adam = keras.optimizers.Adam(lr=0.04)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

The problem is that my accuracy is too low. With binary_crossentropy I get good accuracy, but the results are wrong! Changing to categorical_crossentropy, I get very low accuracy. Do you have any suggestions?

Here is my code: GitHubProject - Multi-Label-Text-Classification

2 Answers


In the last layer, the activation function you are using is sigmoid, so binary_crossentropy should be used. In case you want to use categorical_crossentropy, then use softmax as the activation function in the last layer.
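As a minimal sketch of that pairing (reusing num_classes and adam from the question; which pairing is right depends on whether your targets are one-hot multi-class or multi-hot multi-label):

# Mutually exclusive classes, one-hot targets:
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

# Genuinely multi-label, multi-hot targets:
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])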

Now, coming to the other part of your model: since you are working with text, I would suggest using tanh as the activation function in the LSTM layers.

And you can try using the LSTM's own dropouts as well, dropout and recurrent_dropout:

LSTM(units, dropout=0.2, recurrent_dropout=0.2, activation='tanh')

You can set units to 64 or 128. Start from a small number and, after testing, increase it up to 1024.

You can also try adding a convolutional layer for extracting features, or use a Bidirectional LSTM, though Bidirectional models take longer to train; a minimal sketch is shown below.
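A hedged sketch of the Bidirectional variant, reusing the question's max_features, embeddings_dim, max_sent_len and num_classes (the 64 units and 0.2 dropout rates are placeholder assumptions):

from keras.models import Sequential
from keras.layers import Embedding, Dense, LSTM, Bidirectional

model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len))
# the Bidirectional wrapper runs the LSTM over the sequence in both directions
model.add(Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='tanh')))
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])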

Moreover, since you are working on text, pre-processing of the text and the size of the training data always play a much bigger role than expected.

Edit:

Add class weights via the class_weight parameter of fit:

from sklearn.utils import class_weight
import numpy as np

# 'balanced' gives each class a weight inversely proportional to its frequency
class_weights = class_weight.compute_class_weight('balanced',
                                                  np.unique(labels),
                                                  labels)
class_weights_dict = dict(zip(le.transform(list(le.classes_)),
                              class_weights))

model.fit(x_train, y_train, validation_split=0.1,
          class_weight=class_weights_dict)
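Here `le` is assumed to be a LabelEncoder already fitted on the labels, and validation_split=0.1 mirrors the value used in the comments below. Weighting each class inversely to its frequency counteracts the imbalance reported in the comment thread.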
Upasana Mittal
  • Thank you!! I use categorical_crossentropy because I have multiple classes to predict. Is that correct? Can I use binary for this purpose? Now I use softmax and tanh but the accuracy is still low. How can I use the LSTM's dropouts, dropout and recurrent_dropout? For the pre-processing I use `embeddings = gensim.models.KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin.gz", binary=True)`. Is that correct? – angelo curti giardina Aug 22 '18 at 09:05
  • @angelocurtigiardina You can use `binary_crossentropy` if you are using `softmax`; check the answer, I have edited it. – Upasana Mittal Aug 22 '18 at 09:28
  • Have you tried using pre-trained GloVe and fastText? @angelocurtigiardina – Upasana Mittal Aug 22 '18 at 09:32
  • Thank you so much! With binary the accuracy is high!!!! Now I'm testing... then I'll try GloVe and fastText! @UpasanaMittal – angelo curti giardina Aug 22 '18 at 12:36
  • Really bad :( I tried with word2vec and fastText, binary and categorical. With binary the accuracy is high but the results are incorrect. What am I doing wrong? ... I'm not able to post my code. – angelo curti giardina Aug 22 '18 at 14:16
  • @angelocurtigiardina Can you tell me the size of your data and the number of samples per class? And did you add a conv layer? – Upasana Mittal Aug 22 '18 at 20:53
  • If you can post your code on GitHub, then let me know. I will check. – Upasana Mittal Aug 22 '18 at 20:53
  • Here is my code: https://github.com/ancileddu/multi-label-text-classification .. I can't add the conv layer because the teacher just wants LSTM :( Thank you, you're very kind! It's my last university exam and I'm going crazy!!!! – angelo curti giardina Aug 23 '18 at 06:34
  • Please explain one thing to me. Why are you training word2vec on negative words but not on the data you are training your model on? – Upasana Mittal Aug 23 '18 at 10:12
  • It was an error... now I've changed it to `embeddings = gensim.models.Word2Vec(train_texts, min_count=1, size=300)`. Is that correct? – angelo curti giardina Aug 23 '18 at 11:44
  • You haven't done text cleaning. Any particular reason? – Upasana Mittal Aug 23 '18 at 11:52
  • Is it necessary? Practically, I have to remove the useless words and the punctuation from the training set? – angelo curti giardina Aug 23 '18 at 12:03
  • Sorry, I'm a noob... 'tokenizer = Tokenizer(num_words=max_features, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=" ")' cleans the data, no? – angelo curti giardina Aug 23 '18 at 12:33
  • What is the total number of samples per class? – Upasana Mittal Aug 24 '18 at 03:59
  • class 0: 808 - class 1: 1652 - class 2: 1220 - class 3: 1708 - class 4: 969 – angelo curti giardina Aug 24 '18 at 06:28
  • There is class imbalance. I am editing the answer to pass class_weights as a dict in the fit parameters. Please do that as well. – Upasana Mittal Aug 24 '18 at 06:30
  • The labels are labels = ["0","1","2","3","4"], right? Now the error is: `class_weight` must contain all classes in the data. The classes {0, 1, 2, 3, 4} exist in the data but not in `class_weight`. I'm going crazy... model.fit(train_sequences, train_labels, validation_split=0.1, class_weight=class_weights_dict) – angelo curti giardina Aug 24 '18 at 07:20
  • I checked your code and made changes accordingly again. Use it. And I am trying the model with the provided data at my end as well. – Upasana Mittal Aug 24 '18 at 07:28
  • I'm very grateful for your precious help... thank you very much! But my model doesn't provide an accurate prediction :( How many epochs should I use? I've tried with 1 and 5. – angelo curti giardina Aug 24 '18 at 07:41
  • Use an early-stopping callback in the fit parameters and try at least 50 epochs (see the sketch after these comments). – Upasana Mittal Aug 24 '18 at 07:55
  • @angelocurtigiardina Can you please check this: https://github.com/upasana-mittal/stackoverflow/blob/master/sentence-classification.py – Upasana Mittal Aug 24 '18 at 08:11
  • Please try changing the GloVe embedding to another one, as I have tried with the Twitter one. You can try the Wikipedia one: https://nlp.stanford.edu/projects/glove/ and use `100` dimensions only. – Upasana Mittal Aug 24 '18 at 08:21
  • This is really, really cool! Thank you very much... now I'll commit all the changes... you are the best! Now I'm trying the model with 50 epochs! – angelo curti giardina Aug 24 '18 at 09:23
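A sketch of the early-stopping suggestion from the comments above; EarlyStopping is the standard Keras callback, while the monitor and patience values here are assumptions, not from the thread:

from keras.callbacks import EarlyStopping

# stop training once validation loss has not improved for 5 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)
model.fit(x_train, y_train, epochs=50, validation_split=0.1,
          class_weight=class_weights_dict, callbacks=[early_stop])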

change:

model.add(Activation('sigmoid'))

to:

model.add(Activation('softmax'))
Ioannis Nasios