How to calculate the tf-idf score for a phrase with a set of documents

Question

I need to calculate the tf-idf score of a whole phrase, e.g. "judgment in developing", over a set of documents in Python, rather than the tf-idf scores of its individual terms.

score 0 · Answer 1 · answered Jun 25 '19 at 11:08

You can compute tf-idf scores for phrases using the ngram_range parameter of scikit-learn's TfidfVectorizer (sklearn.feature_extraction.text.TfidfVectorizer). If you set ngram_range to (1, 3), the vocabulary is built not just from unigrams (single words) but also from the bigrams and trigrams found in the input corpus. TfidfVectorizer then outputs a sparse matrix of shape (number of documents, number of terms in the vocabulary), so each n-gram gets its own column. To get the tf-idf of a phrase, look up its column index in the fitted vocabulary and read that column of the matrix.
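As a rough illustration of the lookup described above, here is a minimal sketch. The three-document corpus is made up for the example, and the phrase is assumed to be at most three words long so that it fits inside ngram_range=(1, 3). The phrase also has to be written the way the vectorizer tokenizes text: lowercase, with the default token pattern dropping one-character tokens.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "good judgment in developing software comes with experience",
    "developing software requires patience",
    "the court delivered its judgment in the morning",
]
phrase = "judgment in developing"

# ngram_range=(1, 3) adds bigrams and trigrams to the vocabulary
# alongside the single words.
vectorizer = TfidfVectorizer(ngram_range=(1, 3))
tfidf = vectorizer.fit_transform(docs)  # sparse, shape (n_documents, n_terms)

# vocabulary_ maps each n-gram to its column index in the matrix.
col = vectorizer.vocabulary_.get(phrase)

if col is None:
    # The phrase never occurs in the corpus, so it has no column.
    phrase_scores = [0.0] * len(docs)
else:
    phrase_scores = tfidf[:, col].toarray().ravel().tolist()

for doc, score in zip(docs, phrase_scores):
    print(f"{score:.4f}  {doc}")
```

Only the first document contains the exact phrase, so it is the only one with a non-zero score. Note that TfidfVectorizer normalizes each document row by default, so the value for the phrase also depends on the other n-grams in that document.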

For a detailed walkthrough, see this post: https://markhneedham.com/blog/2015/02/15/pythonscikit-learn-calculating-tfidf-on-how-i-met-your-mother-transcripts/

Hope this helps!!!

score -1 · Answer 2 · answered Jul 13 '17 at 17:44

You could either filter your documents and keep only the ones that contain or match the words of the query, or treat the query as a single string instead of scoring each of its words separately.
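One possible reading of the second suggestion is to treat the phrase as a single term and count its exact occurrences in each document. The sketch below does that in plain Python. The helper names (tokenize, count_phrase, phrase_tfidf), the toy corpus, and the particular tf and idf formulas are all assumptions made for illustration; tf-idf has several common variants and the answer does not say which one it has in mind.

```python
import math
import re

def tokenize(text):
    # Lowercase and split on word characters (an assumed, simple tokenizer).
    return re.findall(r"\w+", text.lower())

def count_phrase(tokens, phrase_tokens):
    # Count positions where the phrase occurs as a contiguous token run.
    n = len(phrase_tokens)
    return sum(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))

def phrase_tfidf(phrase, docs):
    phrase_tokens = tokenize(phrase)
    if not phrase_tokens:
        return [0.0] * len(docs)
    doc_tokens = [tokenize(d) for d in docs]
    counts = [count_phrase(t, phrase_tokens) for t in doc_tokens]
    df = sum(c > 0 for c in counts)  # documents containing the phrase
    if df == 0:
        return [0.0] * len(docs)
    # Assumed idf variant: log(N / df) + 1, so a phrase present in every
    # document still gets a non-zero weight.
    idf = math.log(len(docs) / df) + 1.0
    scores = []
    for c, t in zip(counts, doc_tokens):
        # Assumed tf variant: occurrences divided by the number of
        # n-gram positions in the document.
        positions = max(len(t) - len(phrase_tokens) + 1, 1)
        scores.append(c / positions * idf)
    return scores

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "good judgment in developing software comes with experience",
    "developing software requires patience",
    "the court delivered its judgment in the morning",
]
scores = phrase_tfidf("judgment in developing", docs)
matching_docs = [d for d, s in zip(docs, scores) if s > 0]
```

Here matching_docs plays the role of the filtered document set from the first suggestion: it keeps only the documents in which the whole phrase appears.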

lvcasco

This answer is too subjective. It does not give any concrete way to achieve what the question asks for. – drp Nov 02 '19 at 19:10
