Is there any case in which IDF alone is better than TF-IDF? As far as I understand, TF is important for weighting a word within a document, so that the document can be matched against a predefined query. If I just want to rank the importance of words across a collection of documents, without any specific IR purpose, why should I use the TF term?
-
If you say "without any specific IR purpose", then one could just as easily ask "why should I use x?", where x could be tf, idf, or anything else. – Chthonic Project Mar 05 '15 at 19:29
-
I would like to compress the set of words in my documents based on the IDF score (words with a small IDF are removed) – gabboshow Mar 05 '15 at 22:45
-
If that is your purpose, then of course, only idf is required. – Chthonic Project Mar 05 '15 at 23:00
-
This question would be better suited to Cross Validated or Data Science SE – 3pitt Jan 18 '18 at 22:15
1 Answer
TF in TF-IDF stands for the frequency of a term in a document. In other words, TF-IDF measures a term with respect to a particular document, not the term on its own. Here is a good illustration of what I mean.
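To make the distinction concrete, here is a minimal sketch assuming a hypothetical toy corpus of tokenized documents and the common textbook definitions tf(t, d) = raw count of t in d and idf(t) = log(N / df(t)). Many libraries use smoothed or normalized variants. The point is only that `tf_idf` needs a term *and* a document, while `idf` needs just the term.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
    ["a", "cat", "and", "a", "dog"],
]

N = len(docs)
# Document frequency: number of documents that contain each term.
df = Counter(term for doc in docs for term in set(doc))

def idf(term):
    """IDF depends only on the term (and the collection as a whole).
    Assumes the term occurs in at least one document."""
    return math.log(N / df[term])

def tf_idf(term, doc):
    """TF-IDF depends on the term AND on one particular document."""
    tf = Counter(doc)[term]  # raw count of the term in this document
    return tf * idf(term)

# The same term gets a different TF-IDF score in each document...
print([round(tf_idf("the", d), 3) for d in docs])
# ...but a single IDF value for the whole collection.
print(round(idf("the"), 3))
```

Here "the" scores higher in the first document (it occurs twice) than in the second, and zero in the third, whereas its IDF is one number for the whole corpus.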
As far as I understand your case, you don't work with any particular document; instead, you want a collection-wide characteristic for each word. So you should use IDF (or simply DF, document frequency) if you want to find something like stop-words. See also this related question.
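For the use case from the comments (dropping words with a small IDF), only collection-level statistics are needed. A minimal sketch, again assuming tokenized documents and idf(t) = log(N / df(t)); `prune_vocabulary` and the `min_idf` threshold are hypothetical names, and the threshold would have to be tuned by hand. Words that appear in every document get an IDF of zero and are removed, much like stop-words.

```python
import math
from collections import Counter

def idf_scores(docs):
    """Return {term: idf} using idf(t) = log(N / df(t))."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def prune_vocabulary(docs, min_idf):
    """Keep only terms whose IDF is at least min_idf (hypothetical threshold)."""
    idf = idf_scores(docs)
    keep = {term for term, score in idf.items() if score >= min_idf}
    # Rewrite each document with the low-IDF (stop-word-like) terms removed.
    return [[t for t in doc if t in keep] for doc in docs], keep

# Hypothetical toy corpus.
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["the", "cat", "and", "the", "dog"],
]

pruned, vocab = prune_vocabulary(docs, min_idf=0.1)
print(sorted(vocab))
print(pruned)
```

Because log(N / df) decreases as df grows, sorting words by DF (descending) gives the same order as sorting by IDF (ascending), so thresholding on either one removes the same words.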


Nikita Astrakhantsev