I am trying to fit this model, but I get the error AttributeError: 'list' object has no attribute 'lower'. I know that lower can only be applied to a str, but I can't figure out how to solve this problem.
X holds one binary vector (a list of 0 and 1 integers) for each text, and Y holds the corresponding labels. The training and test sets have the same form.
So, for example, X and Y can look like this:
X = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
Y = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg', 'neg', 'neg']
This is my code:
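import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# One label per training vector, positives first
Y = ["pos"] * len(train_pos_vec) + ["neg"] * len(train_neg_vec)
X = train_pos_vec + train_neg_vec

text_clf = Pipeline([('vect', CountVectorizer(analyzer='word',
                                              ngram_range=(1, 1),
                                              stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(alpha=.01))])
text_clf = text_clf.fit(X, Y)  # the AttributeError is raised here
predicted = text_clf.predict(X_test)
print("Unigram Accuracy : {}% \n".format(np.mean(predicted == Y_test) * 100))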
Am I fitting the model incorrectly? Is there another method?
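For reference, here is a minimal sketch of the situation using the example X and Y above, assuming scikit-learn and NumPy are installed. CountVectorizer expects raw text and lowercases each document, so handing it vectors that are already numeric triggers the AttributeError. The second half is only one possible direction, not a confirmed fix: it assumes the vectors can be passed straight to the later pipeline steps, and the name vector_clf is hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Example data from the question: texts that are already vectorized, plus labels.
X = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
Y = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg', 'neg', 'neg']

# CountVectorizer works on raw strings, so a list of integers cannot be lowercased.
try:
    CountVectorizer().fit(X)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Assumption: because X is already numeric, drop the 'vect' step and start at 'tfidf'.
# vector_clf is a hypothetical name, not part of the original code.
vector_clf = Pipeline([('tfidf', TfidfTransformer()),
                       ('clf', MultinomialNB(alpha=.01))])
vector_clf.fit(np.asarray(X), Y)

# Predicting on the training vectors only checks that the pipeline runs end to end.
predicted = vector_clf.predict(np.asarray(X))
print(predicted.shape)
```

Whether the TfidfTransformer step is still useful on binary vectors depends on how they were built. If the raw texts are available, the other option is to keep the original pipeline and pass the strings themselves as X.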