SVM: How to calculate tf-idf of test documents in document classification?

Question

In my SVM, I am using tf-idf on the documents for feature extraction. These tf-idf values are calculated over the whole set of training documents.

Now, when I get a test document that I want to classify, how do I generate the vector for it?

I used stemming before calculating tf-idf, and I can apply that to the test document too. I have count_of_words for the training documents.

Should I increment the counts in the training count_of_words with the words from the test document when calculating its tf-idf, or should I use count_of_words as is?

Jirka · Accepted Answer · 2013-08-15T08:13:20.123

3

Calculate them the same way as during training, but use idf based on the training documents and tf from the test document (that is, leave the training counts as they are). If you have many new documents coming in, just update the training data from time to time and retrain your model.
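As a rough illustration of the idea, here is a minimal pure-Python sketch. The function names `fit_idf` and `transform`, the toy documents, and the smoothed idf formula are all assumptions made for this example, not something taken from the question; use whatever idf variant you used during training. Documents are assumed to be already tokenized and stemmed.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Compute idf for every term seen in the tokenized training documents."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed smoothed idf variant: log((1 + N) / (1 + df)) + 1
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }

def transform(tokens, idf):
    """Build a sparse tf-idf vector: tf from this document, idf from training.

    Terms that never occurred in training have no feature dimension,
    so they are dropped.
    """
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}

# Hypothetical toy data, already tokenized and stemmed.
train_docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "chase", "dog"],
]
idf = fit_idf(train_docs)

test_doc = ["cat", "cat", "bird", "sat"]
test_vector = transform(test_doc, idf)  # "bird" is unseen and is ignored
```

Note that `transform` only reads `idf` and never modifies it, so classifying a test document leaves the training statistics untouched. Retraining with new documents would mean calling `fit_idf` again on the enlarged training set and refitting the SVM.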

edited Aug 15 '13 at 08:13

answered Aug 13 '13 at 13:12

For tf-idf I need both tf and idf. tf can be obtained from the test doc, and I already have idf from the train docs. So should I use only that idf? – Ashish Negi Aug 14 '13 at 03:58
