Hi everyone. For a revision I was asked to do a sentiment analysis, and I am approaching this methodology for the first time. I have been using NLTK for two days now and seriously need help. Thanks to the tutorials and to this forum, I was able to get to the stemming stage. This is my data at this point (I have more columns and rows):
| Picture1_stemmed | Video1_stemmed |
| --- | --- |
| [feel, magnific, england, secular, heritag] | [feel, relax, peac, seren, watch, short, video] |
| [power, great, ] | [peac, relax, calm] |
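For anyone who wants to reproduce this, an equivalent toy frame could be built like this (just the two columns shown above; my real df has more columns, e.g. the Picture2_stemmed used below):

```python
import pandas as pd

# Toy reconstruction of the two columns shown in the table above
df = pd.DataFrame({
    'Picture1_stemmed': [['feel', 'magnific', 'england', 'secular', 'heritag'],
                         ['power', 'great']],
    'Video1_stemmed':   [['feel', 'relax', 'peac', 'seren', 'watch', 'short', 'video'],
                         ['peac', 'relax', 'calm']],
})
```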
I would like to obtain the total frequency of each word over the entire dataset.
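To make it concrete, in plain Python I picture something like this (a minimal sketch, assuming df holds the stemmed-token lists shown above):

```python
from collections import Counter

# Tally every stemmed token across all rows of all columns
totals = Counter()
for column in ['Picture1_stemmed', 'Video1_stemmed']:  # my real df has more columns
    for tokens in df[column]:
        totals.update(tokens)

print(totals.most_common(10))  # most frequent words over the whole dataset
```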
I've tried this code with scikit-learn:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# clean_text is the tokenising/stemming function from my earlier step
cv = CountVectorizer(analyzer=clean_text)

# fit_transform does the fit and transform steps in one call
X = cv.fit_transform(df['Picture2_stemmed'])
print(X.shape)  # prints (54, 47)

# note: this overwrites my original df with the document-term matrix
df = pd.DataFrame(X.toarray(), columns=cv.get_feature_names())
df.head()
```
but instead of single stemmed words I get joined words as features:
| appreciprettipicturhappi | beautiquiet |
| --- | --- |
| 0 | 0 |
| 0 | 0 |
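For reference, once the vocabulary comes out as single words, my plan was to total each column of the document-term matrix, roughly like this (a sketch; X and cv are from the snippet above):

```python
import pandas as pd

# Column sums of the sparse document-term matrix = total count per word.
# On newer scikit-learn versions get_feature_names() is get_feature_names_out().
word_totals = pd.DataFrame({
    'word': cv.get_feature_names(),
    'count': X.sum(axis=0).A1,  # .A1 flattens the (1, n_features) matrix
}).sort_values('count', ascending=False)

print(word_totals.head(10))
```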
Thank you to anyone who can help me.