Questions tagged [tf-idf]

“Term-frequency ⨉ Inverse Document Frequency”, or “tf-idf”, is a measure used in Natural Language Processing (NLP) of how important a word is to a document in a collection or corpus.
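As a rough illustration of the definition above, here is a minimal sketch of one common tf-idf variant: term frequency normalized by document length, multiplied by `log(N / df)`. The function name `tf_idf`, the whitespace tokenization, and the sample corpus are assumptions made for this example; real libraries typically apply smoothing and normalization that differ from this.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up corpus: "the" appears in every document, "sat" in only one.
corpus = ["the cat sat", "the dog barked", "the cat purred"]
weights = tf_idf(corpus)
```

In this sketch a word that occurs in every document (here "the") gets weight zero, while a word confined to a single document (here "sat") gets the largest weight in that document.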

1326 questions
-2 votes, 1 answer

Ranking tweets from most relevant to least relevant in a document using Python

I have a document with, say, 15 tweets. Given a query, how can we rank the tweets from most relevant to the query to least relevant? That is, let D be the document containing 15 tweets: D = ['Tweet 1', 'Tweet 2' ..... 'Tweet 15'] Q = "some noun…
ObiWan
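One possible way to approach the ranking described in this question is to weight each tweet and the query with tf-idf and sort by cosine similarity. The sketch below is only an assumption about what the asker wants: `rank_by_relevance`, the whitespace tokenization, and the sample tweets are all hypothetical and are not taken from the question or its answer.

```python
import math
from collections import Counter


def rank_by_relevance(documents, query):
    """Return (score, document) pairs sorted from most to least relevant.

    Hypothetical helper: scores are cosine similarities between tf-idf
    vectors built with raw counts times log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    def vectorize(tokens):
        counts = Counter(tokens)
        # Terms that never occur in the corpus are skipped (no df available).
        return {t: c * math.log(n_docs / df[t]) for t, c in counts.items() if t in df}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    query_vec = vectorize(query.lower().split())
    scored = [(cosine(vectorize(tokens), query_vec), doc)
              for tokens, doc in zip(tokenized, documents)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up stand-ins for the tweets in D and the query Q.
tweets = [
    "new python release announced today",
    "coffee tastes great this morning",
    "python tips for data science",
]
ranked = rank_by_relevance(tweets, "python release")
```

Here `ranked` lists the tweet that shares both query terms first and the unrelated tweet last with a score of zero. With only 15 short tweets, the idf estimates will be noisy, so the ordering should be treated as a rough heuristic.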
-3 votes, 1 answer

TF-IDF function

I need to implement a tf-idf function in Python with PySpark (Databricks). I have a CSV file (named 'somefile'), and I need the tf-idf of every word in the column 'text' (so the text should be cleaned first, and also not having…
-3 votes, 1 answer

Different results for same test data with trained model

We loaded a trained model using joblib in Python, and test sets of different sizes were given as input for prediction. For example, we named the test sets S1 and S2, where S1 has 100 instances and S2 has 1000 instances. The instance 'X' is part of both S1 and…
-3 votes, 3 answers

What methods are there to classify documents?

I am trying to do document classification, but I am really confused about the difference between feature selection and tf-idf. Are they the same, or are they two different ways of doing classification? Hope somebody can tell me. I am not really sure that my question will make…
-3 votes, 1 answer

Looking for a Java library with a simple interface to calculate tf–idf (term frequency–inverse document frequency)

I need to calculate tf-idf for a set of documents and am looking for a Java library that does this. NOTE: I am aware of Mahout, but what I really want is a library with a simple interface that does not require infrastructure setup.
user1172468
-4 votes, 1 answer

What is the difference between 'term frequency' and 'document frequency'?

EDIT: this is the question I was ultimately trying to ask: "Understanding min_df and max_df in scikit CountVectorizer". I was reading the documentation for the scikit-learn CountVectorizer, and noticed that when discussing max_df, we are concerned with…
Monica Heddneck
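To make the distinction in this question concrete, the sketch below counts both quantities on a tiny made-up corpus. The variable names and documents are assumptions for illustration only. Term frequency is counted per document, while document frequency is counted once per document across the whole corpus. The latter is the quantity that scikit-learn's `min_df` and `max_df` thresholds refer to.

```python
from collections import Counter

# Made-up corpus for illustration.
docs = ["apple apple banana", "apple cherry", "banana banana banana"]
tokenized = [doc.split() for doc in docs]

# Term frequency: how often a term occurs within ONE document.
term_freq = [Counter(tokens) for tokens in tokenized]

# Document frequency: how many documents contain the term at least once.
# set(tokens) ensures repeated occurrences in one document count only once.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

for term in sorted(doc_freq):
    per_doc = [tf[term] for tf in term_freq]
    print(f"{term}: term frequency per document = {per_doc}, "
          f"document frequency = {doc_freq[term]}")
```

Note that "banana" occurs three times in the last document yet has a document frequency of only two, because it appears in two of the three documents.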