-1

I need to calculate the tf-idf of a phrase (e.g. "judgment in developing") against a set of documents in Python, instead of calculating tf-idf scores for individual terms.

2 Answers

0

You can compute tf-idf scores for phrases using the ngram_range parameter of scikit-learn's TfidfVectorizer (sklearn.feature_extraction.text.TfidfVectorizer). If you set ngram_range to (1, 3), the vocabulary is built not just from unigrams (single words) but also from the bigrams and trigrams in the input corpus. TfidfVectorizer then outputs a sparse matrix of shape (number of documents in the corpus, number of terms in the vocabulary). You can look up the column for your phrase in this matrix to get its tf-idf score in each document.
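A minimal sketch of that lookup, assuming a made-up three-document corpus (the documents and the variable names docs, col and phrase_scores are only for illustration). It relies on the vectorizer's defaults: text is lowercased, n-grams are stored as tokens joined by single spaces, and each document row is L2-normalized, so the phrase has to be written in that same lowercase, single-spaced form.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for this example.
docs = [
    "good judgment in developing software comes from experience",
    "experience comes from bad judgment",
    "developing software is hard",
]

# ngram_range=(1, 3) adds bigrams and trigrams to the vocabulary
# alongside single words.
vectorizer = TfidfVectorizer(ngram_range=(1, 3))
tfidf = vectorizer.fit_transform(docs)  # sparse, shape (n_docs, n_terms)

phrase = "judgment in developing"

# vocabulary_ maps each term (or n-gram) to its column index.
# .get() returns None if the phrase never occurs in the corpus.
col = vectorizer.vocabulary_.get(phrase)

if col is None:
    phrase_scores = [0.0] * len(docs)
else:
    # One tf-idf score per document for this phrase.
    phrase_scores = tfidf[:, col].toarray().ravel().tolist()

for i, score in enumerate(phrase_scores):
    print(f"doc {i}: {score:.4f}")
```

Only the first document contains the phrase, so it is the only one with a non-zero score. If your phrases are longer than three words, ngram_range would need a larger upper bound.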

You can go through this nice post for a detailed walkthrough: https://markhneedham.com/blog/2015/02/15/pythonscikit-learn-calculating-tfidf-on-how-i-met-your-mother-transcripts/

Hope this helps!!!

drp
  • 340
  • 1
  • 13
-1

You could either filter your documents and keep only the ones that contain the words of the query, or treat the query as a whole string instead of considering each word separately.

lvcasco
  • 45
  • 1
  • 8
  • Answer is too subjective. It does not give any concrete way to achieve what the question asks for. – drp Nov 02 '19 at 19:10