I'm building a QA system. My problem is that one question may have multiple answers, and those answers can be located at different positions in the context. For example:
Question: What does Chris have to do?
Context: ....Chris has to wash dishes....(more text)....Chris has to do his homework....
Correct answers:
- wash dishes
- do homework
After extracting the candidate answers for a question, I use a clustering algorithm to deduplicate them and get distinct answers. Therefore, I need a dataset containing question/multiple-answer pairs like the one above to evaluate my clustering algorithm and sentence embedding model.
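To make the evaluation setup concrete, here is a minimal sketch of the deduplication step and one simple check against gold answers. It is not the clustering algorithm described above: `embed` is a hypothetical bag-of-words stand-in for the real sentence embedding model, the greedy threshold clustering and the `threshold=0.5` value are assumptions, and `cluster_count_error` is a hypothetical metric that only compares the number of clusters to the number of distinct gold answers.

```python
import math
import re
from collections import Counter


def embed(text):
    # Hypothetical stand-in for a sentence embedding model:
    # a bag-of-words vector mapping each lowercase token to its count.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse Counter vectors (0.0 if either is empty).
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_answers(answers, threshold=0.5):
    # Greedy single-pass clustering (assumed, not the author's method):
    # each answer joins the first cluster whose representative (its first
    # member) has cosine similarity >= threshold, else it starts a new cluster.
    clusters = []  # list of (representative_vector, member_answers)
    for answer in answers:
        vec = embed(answer)
        for rep_vec, members in clusters:
            if cosine(vec, rep_vec) >= threshold:
                members.append(answer)
                break
        else:
            clusters.append((vec, [answer]))
    return [members for _, members in clusters]


def cluster_count_error(predicted_clusters, gold_answers):
    # Hypothetical metric: 0 means the number of clusters matches the number
    # of distinct gold answers; positive = under-merged, negative = over-merged.
    return len(predicted_clusters) - len(gold_answers)


if __name__ == "__main__":
    candidates = ["wash dishes", "wash the dishes", "do his homework", "do homework"]
    gold = ["wash dishes", "do homework"]
    clusters = cluster_answers(candidates)
    print(clusters)
    print(cluster_count_error(clusters, gold))
```

With the Chris example, the four candidate spans collapse into two clusters, matching the two gold answers. A dataset with one question and several distinct gold answers would let you swap in the real embedding model and clustering algorithm and measure this kind of error (or a stricter per-answer matching) across many questions.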
Is there any public dataset where one question has multiple distinct (non-duplicate) correct answers? I tried MS MARCO, but most of the multiple answers in that dataset are duplicates of each other.