I have downloaded a .txt file that contains thousands of words, each assigned a numeric label indicating positive or negative sentiment. The lower the value, the more negative the sentiment it represents. It looks like this:
bad,-1
sucks,-2
too good,2
amazing,3
terrible,-2
...
I have named the first column 'word' and the second column 'label'.
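A minimal stdlib sketch of loading such a file into a word-to-score mapping, assuming one "entry,score" pair per line with integer scores. Splitting on the last comma keeps multi-word entries such as "too good" intact. The in-memory sample stands in for the real file, whose name is not given, and load_lexicon is a hypothetical helper.

```python
import io

def load_lexicon(lines):
    """Parse 'entry,score' lines into a dict mapping entry -> integer score."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the LAST comma so entries with spaces (or commas) stay whole.
        word, label = line.rsplit(",", 1)
        lexicon[word.strip()] = int(label)
    return lexicon

# Hypothetical sample standing in for the downloaded .txt file.
sample = io.StringIO("bad,-1\nsucks,-2\ntoo good,2\namazing,3\nterrible,-2\n")
lexicon = load_lexicon(sample)
print(lexicon)
```

For the real file, the same function can take an open file handle, e.g. with open("lexicon.txt", encoding="utf-8") as f: lexicon = load_lexicon(f), where the filename and UTF-8 encoding are assumptions.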
I am training on it using:
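from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
# test_df holds the 'word' and 'label' columns; stop_words is defined elsewhere
vectorizer = TfidfVectorizer(use_idf=True, lowercase=False, strip_accents='ascii', stop_words=stop_words)
y = test_df['label']
X = vectorizer.fit_transform(test_df['word'])
X_train, X_test, y_train, y_test = train_test_split(X, y)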
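The effect of this split can be checked with a small stdlib sketch that lists which held-out entries share no token with the training entries. It tokenizes with the same regular expression as TfidfVectorizer's default token_pattern, r"(?u)\b\w\w+\b", and, like the code above, does not lowercase. The explicit split below is a hypothetical example (including the made-up entry "not bad"), not what train_test_split would actually produce. For an entry with no shared tokens, the only nonzero TF-IDF columns are ones that were always zero in the training rows, so a linear classifier learns nothing for them and its prediction falls back to roughly the intercept.

```python
import re

# Same pattern as TfidfVectorizer's default token_pattern.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokens(entry):
    """Split a lexicon entry into tokens the way the default vectorizer would."""
    return TOKEN_RE.findall(entry)

def unseen_test_entries(train_words, test_words):
    """Return the test entries none of whose tokens occur in the training entries."""
    vocab = {tok for w in train_words for tok in tokens(w)}
    return [w for w in test_words if not any(t in vocab for t in tokens(w))]

# Hypothetical split of the sample lexicon ("not bad" is an invented entry).
train_words = ["bad", "sucks", "too good"]
test_words = ["amazing", "terrible", "not bad"]
print(unseen_test_entries(train_words, test_words))  # → ['amazing', 'terrible']
```

Only "not bad" shares a token ("bad") with the training set; in a lexicon where almost every entry is a single distinct word, nearly every test entry ends up in the unseen list.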
Now, the problem is that each word appears only once, so it makes no sense to predict the label of a word in the held-out (test) part: that word has no relation to any of the words in the training part. As expected, I am getting quite low accuracy.
So, how are you supposed to use predefined dictionaries of words for sentiment analysis?
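One common way such word lists are used is not as training rows for a classifier over the words themselves, but as a lookup applied to the texts being scored, combining the scores of the entries found. A minimal sketch, assuming a dict like the one loaded from the file with lowercase keys, a plain sum as the combining rule, and greedy longest-first matching so multi-word entries such as "too good" are found. score_text is a hypothetical helper; it ignores negation, intensifiers and length normalization, which practical lexicon-based scorers usually handle.

```python
import re

def score_text(text, lexicon):
    """Sum lexicon scores over a text, matching multi-word entries greedily."""
    words = re.findall(r"\w+", text.lower())
    # Longest entry length in words, so phrases like "too good" are tried first.
    max_len = max((len(entry.split()) for entry in lexicon), default=1)
    total, i = 0, 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                total += lexicon[phrase]
                i += n
                break
        else:
            i += 1  # no entry starts at this word; move on
    return total

lexicon = {"bad": -1, "sucks": -2, "too good": 2, "amazing": 3, "terrible": -2}
print(score_text("The food was too good", lexicon))             # → 2
print(score_text("This sucks and the coffee is bad", lexicon))  # → -3
```

The sign of the total can then be read as the predicted polarity; if labeled texts are available, such lexicon scores can also serve as extra input features for a classifier trained on those texts rather than on the word list.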