I have a dataset of 300,000 lines, where each line is an article title. I want to extract features such as tf or tfidf from this dataset.
I am able to count the words (tf) in this dataset, for example:
WORD FREQUENCY
must 10000
amazing 9999
or the word percentage:
must 0.2
amazing 0.19
But how do I calculate idf? I mean, I need to find features that discriminate this dataset from others. Or, how is tfidf used in text classification?
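
For reference, here is a minimal sketch of one common way to compute idf. It assumes each title counts as one document and uses the plain definition idf(t) = log(N / df(t)), where N is the number of titles and df(t) is the number of titles containing t. The function name `compute_tf_idf`, the whitespace tokenization, and the three sample titles are illustrative assumptions, not taken from the actual dataset.

```python
import math
from collections import Counter


def compute_tf_idf(titles):
    """Return (tf, idf) for a list of title strings (hypothetical helper)."""
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    docs = [title.lower().split() for title in titles]
    n_docs = len(docs)

    # Corpus-wide term counts, like the WORD / FREQUENCY table.
    tf = Counter(word for doc in docs for word in doc)

    # Document frequency: number of titles containing the word at least once.
    df = Counter(word for doc in docs for word in set(doc))

    # idf(t) = log(N / df(t)); a word present in every title gets idf 0.
    idf = {word: math.log(n_docs / count) for word, count in df.items()}
    return tf, idf


# Made-up sample titles, for illustration only.
titles = [
    "you must see this amazing trick",
    "amazing photos you must see",
    "stock markets fall again",
]
tf, idf = compute_tf_idf(titles)
```

For classification, a common approach is to multiply tf by idf per title and use the resulting vector as the input to a classifier, so that words appearing in almost every title carry little weight.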