What vector space is Rocchio algorithm computed in?

Question

I have been trying to implement the Rocchio algorithm. I understand the basic idea behind it, but I struggle to put it into concrete terms. So far I have computed tf-idf as a vector whose length is the number of query terms, for each document that contains at least one of the query terms. But now I feel that I cannot represent a document as a vector in the space spanned by the query terms alone, because that would not allow me to "discover" other terms that the relevant documents have in common. Should I instead represent the query vector and the document vectors in a vector space spanned by all the tokens found in the currently returned set of documents?

Yes, the dimension of the vectors (both docs and queries) is the vocabulary size of the collection, so these vectors are extremely sparse (most entries are zero). – Debasis, Mar 18 '20 at 09:18

score 0 · Answer 1 · answered Mar 18 '20 at 16:39

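"yes the dimension of the vectors (both docs and queries) is the vocabulary size of the collection... so these vectors are extremely sparse (most entries being zeroes)..."

Yes, as @Debasis said in the comment quoted above, this is the correct answer: the query and the documents are both represented in the space spanned by the whole collection vocabulary, not just by the query terms.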
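To make this concrete, here is a minimal sketch of a Rocchio update in the vocabulary space. It is only an illustration under some assumptions of mine: the helper names (`build_vocabulary`, `to_vector`, `centroid`, `rocchio`) are hypothetical, the toy documents are made up, raw term counts stand in for tf-idf weights, the defaults for `alpha`, `beta` and `gamma` are commonly quoted textbook values rather than anything from the question, and negative weights are clipped to zero. Dense lists are used for readability; a real collection would want a sparse representation.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map every distinct token in the collection to a vector index."""
    terms = sorted({token for doc in docs for token in doc})
    return {term: i for i, term in enumerate(terms)}


def to_vector(tokens, vocab):
    """Raw term-count vector over the whole vocabulary (mostly zeros).

    Assumption: raw counts are used here; tf-idf weights would go in
    the same positions.
    """
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokens).items():
        if term in vocab:
            vec[vocab[term]] = float(count)
    return vec


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the relevant centroid and away from the
    non-relevant one; negative weights are clipped to zero."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(non_relevant, dim)
    return [
        max(0.0, alpha * q + beta * r - gamma * n)
        for q, r, n in zip(query, rel, non)
    ]


# Toy collection (made up for illustration).
docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
]
vocab = build_vocabulary(docs)
doc_vecs = [to_vector(d, vocab) for d in docs]

# The query has the same dimension as the documents.
query_vec = to_vector(["jaguar"], vocab)

# Assume the first two documents were judged relevant, the third not.
new_query = rocchio(query_vec, relevant=doc_vecs[:2], non_relevant=doc_vecs[2:])

# Terms with non-zero weight in the modified query.
expanded = {t: round(new_query[i], 3) for t, i in vocab.items() if new_query[i] > 0}
print(expanded)
```

The original query only contained "jaguar", yet the modified query also gets non-zero weights for "car", "engine" and "speed". That is only possible because every vector has one dimension per vocabulary term rather than one per query term.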

vcucu
