I have rows of blurbs (in text format) and I want to use tf-idf to define the weight of each word. Below is the code:
def remove_punctuations(text):
for punctuation in string.punctuation:
text = text.replace(punctuation, '')
return text
df["punc_blurb"] = df["blurb"].apply(remove_punctuations)
df = pd.DataFrame(df["punc_blurb"])
vectoriser = TfidfVectorizer()
df["blurb_Vect"] = list(vectoriser.fit_transform(df["punc_blurb"]).toarray())
df_vectoriser = pd.DataFrame(x.toarray(),
columns = vectoriser.get_feature_names())
print(df_vectoriser)
All I get is a massive list of numbers, which I am not even sure anymore if its the TF or TF-IDF that it is giving me as the frequent words (the, and, etc) all have a score of more than 0.
The goal is to see the weights in the tf-idf column shown below and I am unsure if I am doing this in the most efficient way: