Say I have many documents, each consisting of a question and an answer. I want to build an embedding that lets me find the most similar documents given only a new question (with no answer), while still being able to find similar documents given a whole document, that is, question and answer together.
What would be the best way to do this with only one embedding?
I have thought of some possible approaches, but each of them requires two different embeddings:
Split all documents into questions and answers and build two separate embeddings: a question embedding and an answer embedding. To find the most similar document for a new question, I use only the question embedding. To find the most similar document for a new whole document, I split the new document, look up the most similar vectors in both embeddings, and combine the two results with something like average(question_vec, answer_vec).
Create a question-only embedding and a whole-document embedding, then use whichever one fits the task.
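To make the two approaches concrete, here is a minimal sketch of what I have in mind. It rests on several assumptions: `embed` is a toy hashed bag-of-words stand-in for whatever real embedding model would be used, the sample documents are made up, all function names are hypothetical, and averaging the two similarity scores is just one way to read "average(question_vec, answer_vec)".

```python
import zlib

import numpy as np

DIM = 64  # arbitrary vector size for the toy embedding


def embed(text):
    """Toy stand-in for a real embedding model (assumption):
    hashed bag-of-words, L2-normalised so dot product = cosine similarity."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Made-up sample documents as (question, answer) pairs.
docs = [
    ("How do I install the package?", "Run the installer and follow the prompts."),
    ("How do I reset my password?", "Click the forgot password link on the login page."),
    ("Why is the build failing?", "A dependency version is pinned incorrectly."),
]

# Approach 1 uses question_index and answer_index.
# Approach 2 uses question_index and whole_doc_index.
question_index = np.stack([embed(q) for q, _ in docs])
answer_index = np.stack([embed(a) for _, a in docs])
whole_doc_index = np.stack([embed(q + " " + a) for q, a in docs])


def most_similar_by_question(question):
    """Both approaches: search the question vectors only."""
    return int(np.argmax(question_index @ embed(question)))


def most_similar_split(question, answer):
    """Approach 1: score question and answer separately, then average the scores."""
    q_scores = question_index @ embed(question)
    a_scores = answer_index @ embed(answer)
    return int(np.argmax((q_scores + a_scores) / 2.0))


def most_similar_whole(question, answer):
    """Approach 2: embed the whole document and search the whole-document vectors."""
    return int(np.argmax(whole_doc_index @ embed(question + " " + answer)))
```

Each function returns the position of the best match in `docs`, so for example `most_similar_by_question("How do I reset my password?")` is a question-only lookup, while `most_similar_split` and `most_similar_whole` both take a complete new document. In both approaches I end up maintaining two sets of vectors, which is exactly what I would like to avoid.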