Threshold for TF-IDF cosine similarity scores

Asked Aug 24 '16 at 20:41

This question is very similar to this one: Systematic threshold for cosine similarity with TF-IDF weights

How should I cut off tiny similarities? The answer to the linked question proposes a cutoff based on averages, but that approach can still return documents when every similarity is very small, for example below 0.01.

How do I know whether a given query document is so unrelated to the corpus that no document in it should be considered similar? Is there a systematic way to define a cutoff value for this case?
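To make the problem concrete, here is a minimal sketch in plain Python. It assumes that the average-based technique amounts to keeping documents whose similarity exceeds the mean score (that is my reading, and the linked answer may differ in detail). The helper names (`tfidf_vectors`, `cosine`, `above_mean`, `above_mean_with_floor`), the toy corpus, and the floor of 0.2 are all made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenised document (smoothed IDF)."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def above_mean(scores):
    """Assumed average-based rule: keep indices scoring above the mean."""
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > mean]


def above_mean_with_floor(scores, floor):
    """Same rule, but never accept a score at or below an absolute floor."""
    cutoff = max(sum(scores) / len(scores), floor)
    return [i for i, s in enumerate(scores) if s > cutoff]


corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stock markets fell sharply on monday".split(),
]
# An unrelated query: it shares only the word "on" with two of the documents.
query = "quantum entanglement experiments on photons".split()

# Weight the corpus and the query together so they share one vocabulary.
*doc_vecs, query_vec = tfidf_vectors(corpus + [query])
scores = [cosine(query_vec, d) for d in doc_vecs]

print(scores)
print(above_mean(scores))                        # mean-only rule
print(above_mean_with_floor(scores, floor=0.2))  # floor of 0.2 is arbitrary
```

With this toy data the mean-only rule still selects two documents, even though each of them shares nothing but the word "on" with the query. The fixed floor suppresses them, but its value is arbitrary, which is exactly the number I would like to choose systematically.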

edited May 23 '17 at 12:34

Community

asked Aug 24 '16 at 20:41

Guilherme Caminha

0 Answers