I have a document with, say, 15 tweets. Given a query, how can we rank the tweets from most relevant to the query to least relevant?

That is, let D be the document containing 15 tweets:

D = ['Tweet 1', 'Tweet 2', ..., 'Tweet 15']
Q = "some noun phrase"

Given Q, what method can we use to rank the tweets from most relevant to least relevant?

All the tweets are similar and belong to the same topic. Can I use tf-idf (a bad idea, I think) or topic modelling?

ObiWan
  • What is "TFIDF"? [Term frequency–inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)? Please edit your question to provide this information. – Peter Mortensen Sep 11 '16 at 10:46
  • Edited! Thank you! – ObiWan Sep 11 '16 at 10:53
  • The question is, really, "what is relevant and what is not". Once you know how to define that, then you can implement an algorithm. And defining that is entirely up to you. It is not really a python question, I would say. – zvone Sep 11 '16 at 11:17
  • I finally got it working, somehow. Thanks. – ObiWan Sep 16 '16 at 05:54

1 Answer

You need the nltk (Natural Language Toolkit) library. It has a built-in function that computes tf-idf.
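
To make the tf-idf idea concrete, here is a minimal sketch that ranks tweets against a query by cosine similarity of tf-idf vectors. It uses only the standard library rather than nltk, so the tokenizer, the smoothed idf formula, the `rank_tweets` helper, and the sample tweets are all assumptions made for illustration, not part of the original question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_tweets(tweets, query):
    """Return (score, tweet) pairs sorted from most to least relevant to the query."""
    docs = [tokenize(t) for t in tweets]
    n_docs = len(docs)

    # Document frequency: in how many tweets each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    def idf(term):
        # Smoothed idf (an assumption): terms found in every tweet keep a positive weight.
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    def vectorize(tokens):
        # Raw term count times idf, stored as a sparse dict.
        counts = Counter(tokens)
        return {term: count * idf(term) for term, count in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = vectorize(tokenize(query))
    scored = [(cosine(query_vec, vectorize(doc)), tweet) for doc, tweet in zip(docs, tweets)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data standing in for D and Q from the question.
D = [
    "The new phone camera is amazing",
    "Battery life on this phone is poor",
    "I love the camera quality and the camera app",
]
Q = "camera quality"

ranking = rank_tweets(D, Q)
for score, tweet in ranking:
    print(f"{score:.3f}  {tweet}")
```

Because all the tweets share a topic, common topic words get a low idf and contribute little, so the ranking is driven mostly by the rarer query terms. Tweets that share no term with the query score 0 and land at the bottom in their original order.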

Attila