
Hi everyone. For a revision I was asked to do a sentiment analysis, and I am approaching this methodology for the first time. I have been using NLTK for two days now and seriously need help. Thanks to the tutorials and to this forum, I was able to get to the stemming stage. This is my data at this stage (I have more columns and rows):

| Picture1_stemmed | Video1_stemmed |
| --- | --- |
| [feel, magnific, england, secular, heritag] | [feel, relax, peac, seren, watch, short, video] |
| [power, great, ] | [peac, relax, calm] |

I would like to obtain the total frequency (for the entire dataset) of words.
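
To make the goal concrete, this is a minimal sketch of the kind of total I am after, using `collections.Counter` (assuming each cell really holds a Python list of token strings, and using just the two column names shown above):

```python
from collections import Counter

# Minimal sketch: one Counter over every stemmed list in the dataset,
# assuming each cell is a list of strings like ['feel', 'relax', ...].
totals = Counter()
for col in ['Picture1_stemmed', 'Video1_stemmed']:
    for tokens in df[col]:
        totals.update(tokens)

print(totals.most_common(10))  # the ten most frequent stems overall
```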

With scikit-learn, I've tried this code:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# clean_text is my tokenizing/stemming function
cv = CountVectorizer(analyzer=clean_text)

# fit_transform does the work of fit + transform in one step
X = cv.fit_transform(df['Picture2_stemmed'])
print(X.shape)
# (54, 47)

# document-term matrix; a new name so the original df isn't overwritten
dtm = pd.DataFrame(X.toarray(), columns=cv.get_feature_names())
dtm.head()
```

But I get joined words:

| appreciprettipicturhappi | beautiquiet |
| --- | --- |
| 0 | 0 |
| 0 | 0 |
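
My guess is that `clean_text` is returning the tokens glued together rather than as a list, or that `CountVectorizer` is seeing each list as one string. Would passing the already-stemmed token lists straight through with an identity analyzer be the right approach? A sketch of what I mean (again assuming the cells are lists of strings):

```python
# Guess: hand the already-stemmed token lists directly to CountVectorizer,
# assuming each cell is a Python list of strings (not one joined string).
cv = CountVectorizer(analyzer=lambda tokens: tokens)
X = cv.fit_transform(df['Picture2_stemmed'])

# summing over the rows gives the total frequency of each word
word_totals = pd.DataFrame(X.toarray(), columns=cv.get_feature_names()).sum()
print(word_totals.sort_values(ascending=False).head(10))
```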

Thank you to anyone who can help.
