0

Is there any rule for computing the cosine similarity between two documents that have different numbers of words?

vikifor

1 Answer

2

The standard formula does not require the two documents to have the same number of words. Build both word vectors over the union of the words in the two documents: every word that occurs in B but not in A contributes a 0 to the word vector for A, and every word that occurs in A but not in B contributes a 0 to the word vector for B. Both vectors then have the same dimension, so the dot product and the norms are well defined.
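
To make the union idea concrete, here is a minimal sketch in Python. It assumes plain word counts as vector entries and whitespace tokenization with lowercasing; neither choice comes from the question, and the function name `cosine_similarity` is only illustrative. A real pipeline might use TF-IDF weights or a proper tokenizer instead.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents using raw word counts (an assumption)."""
    # Assumed tokenization: lowercase, split on whitespace.
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())

    # Union of the words of both documents; a Counter returns 0 for a
    # word it has not seen, which is exactly the "missing word -> 0" rule.
    vocabulary = set(counts_a) | set(counts_b)

    dot = sum(counts_a[word] * counts_b[word] for word in vocabulary)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty document has a zero vector; return 0.0 rather than divide by zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Documents of different lengths are handled without any special casing.
    print(cosine_similarity("the cat sat on the mat", "the cat"))
```

Words that appear in only one document add 0 to the dot product but still count toward that document's norm, which is why a longer document with many extra words scores lower against a short one.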

Udo Klein