Identify and correct mistakes in Q&A datasets for natural language processing (NLP)

Question

How do you identify and correct mistakes in Q&A datasets that contain errors, such as incorrect answers or missing information, and ensure the accuracy of the dataset? Let's say I have thousands of question-answer pairs formatted like the Stanford Question Answering Dataset (SQuAD) and I want to double-check every single one of them. What are some common methods or best practices for adjusting or correcting Q&A datasets?

For example, if the context text looks like this:

text = "Albert Einstein, (born March 14, 1879, Ulm, Württemberg, Germany—died April 18, 1955, Princeton, New Jersey, U.S.), German-born physicist who developed the special and general theories of relativity and won the Nobel Prize for Physics in 1921 for his explanation of the photoelectric effect. Einstein is generally considered the most influential physicist of the 20th century."

Output:

Q: Who is generally considered to be the most influential? (missing "physicist")
A: Albert Einstein

Q: What is the photoelectric effect?
A: Albert Einstein (wrong answer)

I opened the Q&A dataset as a JSON file and tried to correct it directly, but that is very slow, and it is easy to lose track of what has already been updated in the file.

score 0 · Answer 1 · answered Mar 10 '23 at 20:05

Depending on the skills your dataset requires, you may be able to speed up the manual review by using a trained question-answering model. Check the Hugging Face model hub for a suitable model and use it with the QuestionAnsweringPipeline as shown below:

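from transformers import pipeline

# Load an extractive QA model fine-tuned on SQuAD 2.0 from the Hugging Face hub
p = pipeline("question-answering", model="deepset/roberta-base-squad2")
# Returns the best answer span together with a confidence score
p(question="Where do I live?", context="My name is Wolfgang and I live in Berlin")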

Output:

{'score': 0.9191, 'start': 34, 'end': 40, 'answer': 'Berlin'}

The idea is that you only need to look at the questions for which the model returns a low score or an answer that differs from the one in your dataset. But first, check a sample of the questions where the model agrees with your dataset to make sure that the model is suited to the task.
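A minimal sketch of that triage step, assuming the dataset follows the SQuAD JSON layout (data -> paragraphs -> qas -> answers). The names `flag_suspicious`, `normalize`, `qa_fn` and the `min_score` threshold of 0.5 are illustrative choices, not part of any library, and unanswerable SQuAD 2.0 questions (empty answer lists) are not treated specially here:

```python
import re
import string


def normalize(text):
    """Lowercase and strip punctuation, articles and extra whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def flag_suspicious(squad, qa_fn, min_score=0.5):
    """Yield (question id, reason, prediction) for entries worth a manual look.

    squad : dict in SQuAD JSON layout
    qa_fn : callable(question=..., context=...) returning a dict with
            at least the keys "answer" and "score"
    """
    for article in squad["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                pred = qa_fn(question=qa["question"], context=para["context"])
                # All reference answers stored in the dataset, normalized
                gold = {normalize(a["text"]) for a in qa["answers"]}
                if pred["score"] < min_score:
                    yield qa["id"], "low score", pred
                elif normalize(pred["answer"]) not in gold:
                    yield qa["id"], "disagrees with dataset", pred
```

The pipeline object `p` from the snippet above can be passed directly as `qa_fn`, since it accepts `question=` and `context=` keyword arguments and returns a dict with `answer` and `score`. Writing the flagged ids to a separate review file, rather than editing the JSON in place, makes it easier to keep track of what has already been checked.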

You could also try one of the OpenAI models with a prompt along these lines:

Given that: {TEXT} what is the correct answer for: {QUESTION}
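As a rough sketch, the prompt can be filled in for every question of a SQuAD-style dataset like this. The helper name `build_prompts` is hypothetical, and the call that sends each prompt to a model is deliberately left out, since it depends on the client library and model you choose:

```python
PROMPT_TEMPLATE = "Given that: {TEXT} what is the correct answer for: {QUESTION}"


def build_prompts(squad):
    """Yield (question id, prompt) for every question in a SQuAD-style dict."""
    for article in squad["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                prompt = PROMPT_TEMPLATE.format(
                    TEXT=para["context"], QUESTION=qa["question"]
                )
                yield qa["id"], prompt
```

The model's reply for each id can then be compared with the stored answer, in the same way as the pipeline output.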
