Questions tagged [tf-idf]

“Term-frequency ⨉ Inverse Document Frequency”, or “tf-idf”, is a weighting used in natural language processing (NLP) and text mining that measures how important a word is to a document within a collection or corpus.
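As a rough illustration of the definition above, here is a minimal sketch of one common textbook variant: tf is the term count divided by the document length, and idf is log(N / df). The function name `tf_idf`, the whitespace tokenizer, and the three sample documents are assumptions made for this example only. Libraries such as scikit-learn's `TfidfVectorizer` apply idf smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document it appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample corpus, purely for illustration.
docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
```

In this sketch a word that occurs in every document ("the") gets a weight of zero, while a word unique to one document ("sat") outweighs a word shared by two documents ("cat").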

References:

Tf–idf - Wikipedia

1326 questions

-1

votes

1 answer

Reverse TF-IDF vector (vec2text)

Given a doc2vec vector generated from some document, is it possible to reverse the vector back to the original document? If so, is there a hash algorithm that would make the vector irreversible but still comparable to other vectors of the…

hash data-science tf-idf doc2vec lsh

asked Aug 28 '22 at 09:45

first_question_magnus

-1

votes

1 answer

NotFittedError: The TF-IDF vectorizer is not fitted

I've trained a sentiment analysis classifier on TripAdvisor's textual review datasets. It can predict the rating of an input review based on its sentiment. Training and testing both work fine. However, when I loaded the classifier in…

python machine-learning scikit-learn tf-idf tfidfvectorizer

asked Aug 10 '22 at 00:32

JK Jin

-1

votes

1 answer

Text transform with sklearn TF-IDF vectorizer generates too big csv file

I have 1000 texts, each with 200-1000 words. The text CSV file is about 10 MB. When I vectorize them with this code, the output CSV is exceptionally large (2.5 GB). I am not sure what I did wrong. Your help is highly appreciated.…

python csv scikit-learn text-processing tf-idf

asked Apr 04 '21 at 16:43

tursunWali

-1

votes

1 answer

cosine_sim between a text and a single column in a dataset

I have a dataset that I had to lemmatize, which I did below. Now I have to find the similarity between one column, "text", and the phrase "vaccine is deadly", but I am not sure how to use the cosine similarity function correctly. I tried putting the…

python tf-idf cosine-similarity

asked Jan 14 '21 at 15:16

reem alfouzan

-1

votes

1 answer

How to use TfidfVectorizer if I already have a list of keywords in a python df? What are the correct inputs?

I want to calculate the TF-IDF of keywords for a given genre. These keywords were never part of a text; they were already separated, but in a different format. I extracted them from that format and put them into lists. The same goes for the genres: I had a df…

pandas scikit-learn nlp tf-idf

asked Oct 26 '20 at 13:07

idontknowmuch

-1

votes

1 answer

How to apply tf-idf to rows of text

I have rows of blurbs (in text format) and I want to use tf-idf to define the weight of each word. Below is the code: def remove_punctuations(text): for punctuation in string.punctuation: text = text.replace(punctuation, '') return…

python machine-learning scikit-learn nlp tf-idf

asked Oct 23 '20 at 08:16

U108456

-1

votes

4 answers

Count frequency of a string individually from query

I want to search for a query in a file named a.java. If my query is "String name", I want to get the frequency of each string in the query individually from the text file. First I have to count the frequency of String and then of name individually, and…

java algorithm file hashmap tf-idf

asked Aug 16 '20 at 14:57

Sanzida Sultana

-1

votes

1 answer

My nested for loops are taking so much time while calculating term-frequency

I have a list "total_vocabulary" with all the unique words in a collection of 56 documents. There is another list of lists, "rest_doc", with the words of every document. I want to calculate the term frequency of each word from "total_vocabulary" in "rest_doc"…

python list for-loop tf-idf

asked Mar 30 '20 at 15:32

unaizhaider

-1

votes

1 answer

KNN for text classification, but train and class have different lengths in R

Hello, I am trying to classify text. Here is the code: df <- read.csv("D:/AS/tokpedprepro.csv") #sampling set.seed(123) df <- df[sample(nrow(df)),] df <- df[sample(nrow(df)),] #Convert to corpus dfCorpus <-…

r text-mining knn tf-idf

asked Jan 12 '20 at 08:42

dikfaj

-1

votes

1 answer

TF-IDF Vectors Example (HELP)

Hey, I tried 3 different approaches but I can't decide which is the right way to use TF-IDF. The first code fits and transforms x_train and x_test separately, giving (5000, 94462) and (5000, 93007). The second code uses both train and test which…

python tensorflow vector tf-idf tfidfvectorizer

asked Dec 18 '19 at 20:12

JRC_FFC_VPLN

-1

votes

1 answer

N_gram frequency python NLTK

I want to write a function that returns the frequency of each element in the n-grams of a given text. Help please. I wrote this code for counting the frequency of 2-grams: from nltk import FreqDist from nltk.util import ngrams def…

python pandas nltk tf-idf countvectorizer

asked Oct 10 '19 at 16:31

Miss

-1

votes

1 answer

How to fix 'int' object is not iterable in TF-IDF freqDict_list

I'm currently coding a TF-IDF program in Python. I followed the code from this, but it's not working. The error is: 'int' object is not iterable. Traceback (most recent call last): File "C:/Users/Try Arie/PycharmProjects/TF-IDF/tf-idf.py", line…

python python-3.x tf-idf

asked Jul 17 '19 at 00:28

Try

-1

votes

1 answer

term frequency calculation using python

I am trying to find the term frequency for documents in a list using Python: l=['cat sat besides dog']. I have tried finding the term frequency for each word in the corpus, where term freq = (number of times the word occurred in the document / total number of words in the document). I tried…

python machine-learning nlp tf-idf

asked Jul 12 '19 at 14:32

Yalla ajay babu

-1

votes

1 answer

What dimension reduction techniques can i try on my data (0-1 features+tfidf scores as features) before feeding it into svm

I have about 8000 features measuring a two-level response variable, i.e. the output can belong to class 1 or 0. The 8000 features consist of about 3000 features with 0-1 values and about 5000 features (which are basically words from text data and their…

python machine-learning svm tf-idf feature-selection

asked Jun 20 '19 at 04:57

Sakshi Jajodia

-1

votes

1 answer

Is there a way of removing all the words in the text that are not in other text?

I have a document with many reviews. I am creating a bag-of-words BW using TfidfVectorizer. I only want to use words in BW that also appear in another document D. The document D is a document with positive words. I am using this…

python scikit-learn tf-idf tfidfvectorizer

asked Apr 16 '19 at 04:29

Felipe Oliveira
