This question is very similar to this one: Systematic threshold for cosine similarity with TF-IDF weights

How should I cut off tiny similarities? The answer to the question linked above gives a technique based on averages, but that technique could still return documents even when all similarities are very small, for example below 0.01.
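
To make the concern concrete, here is a minimal sketch. It assumes the averaging technique amounts to keeping every document whose similarity exceeds the mean similarity; the linked answer's exact rule may differ. The function names and the `floor=0.05` value are hypothetical placeholders, not a recommended cutoff.

```python
def above_mean(similarities):
    """Return indices of documents whose similarity exceeds the mean.

    Assumption: this stands in for the average-based rule; the exact
    rule in the linked answer may differ.
    """
    if not similarities:
        return []
    mean = sum(similarities) / len(similarities)
    return [i for i, s in enumerate(similarities) if s > mean]


def above_mean_with_floor(similarities, floor=0.05):
    """Apply the same relative rule, then require an absolute minimum.

    The floor value is an arbitrary placeholder; choosing it
    systematically is exactly the open question.
    """
    return [i for i in above_mean(similarities) if similarities[i] >= floor]


# Every similarity is below 0.01, yet the purely relative rule
# still returns a document as "similar".
tiny = [0.002, 0.009, 0.003, 0.001]
selected_relative = above_mean(tiny)
selected_with_floor = above_mean_with_floor(tiny)
```

With these inputs the relative rule still selects the document at index 1, while the version with an absolute floor selects nothing, which is the behavior I want for an unrelated query.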

How do I know whether a given query document is so unrelated to the corpus that no document in it should be considered similar? Is there a systematic way to define a cutoff value for this?
