I am using the code from the most upvoted answer to this question (Similarity between two text documents) to calculate the TF-IDF similarity between documents. However, when I run the code WITHOUT specifying a custom value for min_df (it is set to 1 in that code), and the two documents are completely different (they have no word in common), I get the following error instead of a TF-IDF value of 0:
ValueError: empty vocabulary; training set may have contained only stop words or min_df (resp. max_df) may be too high (resp. too low).
Can somebody tell me how I can get rid of this error?
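
For illustration, here is a minimal pure-Python sketch of the situation. It assumes min_df acts as a minimum document-frequency threshold (terms appearing in fewer documents than the threshold are dropped), which is how the scikit-learn documentation describes it. The helper `build_vocabulary` and the two sample documents are made up for this example and are not part of scikit-learn or of the linked answer.

```python
from collections import Counter


def build_vocabulary(docs, min_df=1):
    """Hypothetical helper: keep terms that occur in at least `min_df` documents."""
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.lower().split()))

    vocab = sorted(term for term, count in doc_freq.items() if count >= min_df)
    if not vocab:
        # Simplified stand-in for the error quoted above.
        raise ValueError("empty vocabulary")
    return vocab


# Two documents with no word in common (made-up data).
docs = ["apples oranges bananas", "trains planes cars"]

# Threshold 1: every term survives, so a vocabulary can be built.
vocab_ok = build_vocabulary(docs, min_df=1)

# Threshold 2: no term occurs in both documents, so nothing survives.
try:
    build_vocabulary(docs, min_df=2)
    pruned_to_empty = False
except ValueError:
    pruned_to_empty = True
```

In this sketch the error depends only on the threshold: with a threshold of 1 the vocabulary is non-empty even though the documents share no terms, while any higher threshold removes every term when the documents are disjoint.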