I'm currently working on a multi-label classification task for text data. I have a dataframe with an ID column, a text column, and several label columns that contain only 1 or 0.
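In this kind of layout, each row is usually turned into a multi-hot target vector with one 0/1 entry per label column. The stdlib sketch below shows this. It assumes hypothetical column names `id`, `text`, `S1`, `S2`, `S3`, with the label names taken from the code further down. It also assumes that every column after ID and text is a label column, which is presumably how `LABEL_COLUMNS` is built in the Kaggle notebook.

```python
# Hypothetical column layout: ID, text, then one 0/1 column per label.
columns = ["id", "text", "S1", "S2", "S3"]
rows = [
    {"id": 1, "text": "first extract", "S1": 1, "S2": 0, "S3": 1},
    {"id": 2, "text": "second extract", "S1": 0, "S2": 0, "S3": 0},
]

# Assumption: every column after ID and text is a label column.
LABEL_COLUMNS = columns[2:]


def multi_hot(row, label_columns):
    """Return the row's labels as a list of 0/1 ints, one per label column."""
    return [int(row[name]) for name in label_columns]


targets = [multi_hot(row, LABEL_COLUMNS) for row in rows]
print(LABEL_COLUMNS)  # ['S1', 'S2', 'S3']
print(targets)        # [[1, 0, 1], [0, 0, 0]]
```

Unlike multi-class data, a row may have several 1s or none at all.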
I used an existing solution from the Kaggle notebook Toxic Comment Classification using Bert, which expresses as a percentage how strongly a text belongs to each label.
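In a multi-label setup, each label is typically scored independently with a sigmoid. Each percentage is then that label's own probability, so the percentages do not need to sum to 100. The minimal sketch below assumes the model produces one raw logit per label. The logit values are made up purely for illustration.

```python
import math


def sigmoid(x):
    """Logistic function: maps a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def label_percentages(logits, label_names):
    """Score each label independently and express it as a percentage."""
    return {name: round(100 * sigmoid(z), 1) for name, z in zip(label_names, logits)}


# Hypothetical logits for one sentence, purely for illustration.
result = label_percentages([2.0, -1.0, 0.0], ["S1", "S2", "S3"])
print(result)  # {'S1': 88.1, 'S2': 26.9, 'S3': 50.0}
```

Here the three percentages add up to 165, which is expected when labels are not mutually exclusive.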
Now that I've trained my model, I would like to test it on a few unlabeled text extracts to get, for each extract, the percentage of belonging to each label.
I've tried this:
def getPrediction(in_sentences):
    label = ['S1', 'S2', 'S3']
    input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label=label) for x in in_sentences]
    input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
    predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
    predictions = estimator.predict(predict_input_fn)
    return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]

pred_sentences = [
    "here is an example of a sentence"]
pred_sentences = ''.join(pred_sentences)
predictions = getPrediction(pred_sentences)
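There is a second issue in this snippet, independent of the error. `''.join(pred_sentences)` collapses the list into a single string, and iterating over a string yields one character at a time. The list comprehension in `getPrediction` would therefore build one example per character instead of one per sentence. The plain-Python sketch below uses two hypothetical sentences to show the difference.

```python
pred_sentences = ["first sentence", "second sentence"]

# ''.join(...) collapses the list into one string: "first sentencesecond sentence"
joined = "".join(pred_sentences)

# Iterating over a string yields single characters, not sentences.
items_from_string = [x for x in joined]
# Iterating over the list yields one item per sentence.
items_from_list = [x for x in pred_sentences]

print(len(items_from_string))  # 29
print(len(items_from_list))    # 2
```

Passing the list itself, without the `join`, gives one input example per sentence.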
And I got:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-490-770bf0871d3e> in <module>
----> 1 predictions = getPrediction(pred_sentences)
<ipython-input-486-3de7328d60db> in getPrediction(in_sentences)
2 label = ['S1','S2',
3 'S3']
----> 4 input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
5 input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
6 predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
<ipython-input-486-3de7328d60db> in <listcomp>(.0)
2 label = ['S1','S2',
3 'S3']
----> 4 input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
5 input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
6 predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
TypeError: __init__() got an unexpected keyword argument 'labels'
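The error means the `InputExample` being constructed has no `labels` parameter. In the original BERT `run_classifier.py`, `InputExample.__init__` takes `guid, text_a, text_b=None, label=None`, with a singular `label`. A multi-label variant instead has to carry a list of 0/1 values.

The sketch below reproduces the difference using stand-in classes. `SingleLabelExample` and `MultiLabelExample` are hypothetical names, not the real BERT classes. It assumes the Kaggle notebook defines its own multi-label `InputExample` that accepts `labels`. If so, that class, rather than `run_classifier.InputExample`, is the one such a call would need.

```python
class SingleLabelExample:
    """Stand-in mirroring run_classifier.InputExample's signature (singular `label`)."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class MultiLabelExample:
    """Hypothetical multi-label variant storing one 0/1 value per label."""

    def __init__(self, guid, text_a, text_b=None, labels=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.labels = labels


# Passing `labels=` to the single-label signature reproduces the TypeError.
message = ""
try:
    SingleLabelExample(guid="", text_a="some text", labels=[0, 0, 0])
except TypeError as err:
    message = str(err)
print(message)

# For unlabeled prediction data, a dummy all-zero vector is used as a placeholder.
# Assumption: it only needs one entry per label column and is not used when predicting.
label_columns = ["S1", "S2", "S3"]
example = MultiLabelExample(guid="", text_a="some text", labels=[0] * len(label_columns))
print(example.labels)  # [0, 0, 0]
```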
Any idea what I need to change to make this last part of my code work?