NLP : Get 5 best candidates from QuestionAnsweringPipeline

Question

I am working on a French question-answering model using the Hugging Face transformers library. I'm using a pre-trained CamemBERT model, which is very similar to RoBERTa but adapted to French.

Currently, I am able to get the best answer candidate for a question on a text of my own, using the QuestionAnsweringPipeline from the transformers library.

Here is an extract of my code.

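import torch
from transformers import (CamembertForQuestionAnswering,
                          CamembertTokenizer,
                          QuestionAnsweringPipeline)

# Load the tokenizer and the question-answering model from the same checkpoint
QA_model = "illuin/camembert-large-fquad"
CamTokQA = CamembertTokenizer.from_pretrained(QA_model)
CamQA = CamembertForQuestionAnswering.from_pretrained(QA_model)

# Use the first GPU if one is available, otherwise fall back to the CPU (-1)
device_pipeline = 0 if torch.cuda.is_available() else -1
q_a_pipeline = QuestionAnsweringPipeline(model=CamQA,
                                         tokenizer=CamTokQA,
                                         device=device_pipeline)

# Read the context text and ask a single question about it
with open("text/Sample.txt", "r") as f:
    ctx = f.read()
question = 'Quel est la taille de la personne ?'
res = q_a_pipeline({'question': question, 'context': ctx})
print(res)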

I am currently getting this: {'score': 0.9630325870663725, 'start': 2421, 'end': 2424, 'answer': '{21'}, which is wrong.

Therefore, I would like to get the 5 best candidates for the answer. Does anyone have an idea how to do that?

chefhose · Accepted Answer · 2020-06-26T12:02:08.580

When calling your pipeline, you can specify the number of results via the topk argument. For example, to get the five most probable answers:

res = q_a_pipeline({'question': question, 'context': ctx}, topk=5)

This will result in a list of dictionaries:

[{'score': 0.0013586128421753108, 'start': 885, 'end': 896, 'answer': "L'ingénieur"},
 {'score': 0.0011120906285982946, 'start': 200, 'end': 209, 'answer': 'français.'},
 {'score': 0.00010808186718235663, 'start': 164, 'end': 209, 'answer': 'ingénieur hydraulicien et essayiste français.'},
 {'score': 5.0453970530228015e-05, 'start': 153, 'end': 209, 'answer': 'urbaniste, ingénieur hydraulicien et essayiste français.'},
 {'score': 4.455333667193265e-05, 'start': 190, 'end': 209, 'answer': 'essayiste français.'}]

If you look at the source code, you can see that QuestionAnsweringPipeline accepts an argument called topk.
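To inspect the candidates once you have them, something like the following might help. It is only a sketch: it assumes the result has the shape shown above (dictionaries with 'score', 'start', 'end' and 'answer' keys), and it also accepts a single dictionary like the one in the question. The helper name rank_candidates and the min_score threshold are made up for illustration and are not part of transformers. The sample data is copied from the first three entries of the output above, so no model is needed to try it.

```python
from typing import Dict, List, Union

Candidate = Dict[str, Union[float, int, str]]


def rank_candidates(res: Union[Candidate, List[Candidate]],
                    min_score: float = 0.0) -> List[Candidate]:
    """Return the candidates as a list sorted by descending score.

    Hypothetical helper: accepts either one result dict or a list of
    result dicts, and drops candidates scoring below min_score.
    """
    # Wrap a single dict in a list so both shapes are handled the same way
    candidates = [res] if isinstance(res, dict) else list(res)
    kept = [c for c in candidates if c["score"] >= min_score]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


# First three entries of the example output shown in this answer
sample = [
    {'score': 0.0013586128421753108, 'start': 885, 'end': 896,
     'answer': "L'ingénieur"},
    {'score': 0.0011120906285982946, 'start': 200, 'end': 209,
     'answer': 'français.'},
    {'score': 0.00010808186718235663, 'start': 164, 'end': 209,
     'answer': 'ingénieur hydraulicien et essayiste français.'},
]

# Print one line per candidate with its rank, text, score and character span
for rank, cand in enumerate(rank_candidates(sample), start=1):
    print(f"{rank}. {cand['answer']!r} "
          f"(score={cand['score']:.6f}, chars {cand['start']}-{cand['end']})")
```

Printing the character span next to each answer makes it easy to check the candidate against the context, for example with ctx[cand['start']:cand['end']].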
