
I am trying to fit this model, but I get the error `AttributeError: 'list' object has no attribute 'lower'`. I know that `lower` can only be applied to a `str`, but I can't figure out how to solve this problem.

X contains one binary vector (a list of 0 and 1 integers) for each text, and Y contains the matching labels. The training and test sets share this format.

So for example, X and Y can look like this:

X = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
Y = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg', 'neg', 'neg']

This is my code:

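import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

Y = ["pos"] * len(train_pos_vec) + ["neg"] * len(train_neg_vec)
X = train_pos_vec + train_neg_vec
text_clf = Pipeline([('vect', CountVectorizer(analyzer='word',
                                              ngram_range=(1, 1),
                                              stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(alpha=.01))])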

text_clf = text_clf.fit(X, Y)
predicted = text_clf.predict(X_test)
print("Unigram Accuracy : {}% \n".format(np.mean(predicted == Y_test) * 100))

Am I fitting the model incorrectly? Is there another method?

Mr. Wizard
  • Where do you use `.lower`? The error message says that you are calling it on a list. So my guess is that you are not iterating over the list and applying it to each string item? Or maybe it is the wrong list, which would be X, as it contains lists? – modmoto Jun 02 '18 at 18:14
  • I don't use `lower`. The error comes from here: `File "C:\Users\MyWizard\PycharmProjects\classifier\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 232, in return lambda x: strip_accents(x.lower())`. – Mr. Wizard Jun 02 '18 at 18:21
  • If your inputs are integers, why are you using text mining feature selection? – rafaelc Jun 02 '18 at 18:23
  • Use `text_clf = Pipeline([ ('tfidf', TfidfTransformer()), ('clf', MultinomialNB(alpha=.01)) ])` and it should work as expected. – cs95 Jun 02 '18 at 18:26
  • Oh yeaaah, now I get it: I can't obtain n-grams from binary data. I am such a newbie. Thank you, now I understand what I did wrong. – Mr. Wizard Jun 02 '18 at 18:34
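
For anyone landing here later, the comments above can be summed up in a small sketch. It first reproduces the error by passing already-vectorized data to `CountVectorizer`, which expects raw strings and lowercases each document, and then fits the shorter pipeline suggested by cs95. The toy `X` and `Y` values are made up for illustration and are not the asker's real data; converting the lists with `np.asarray` is an assumption about how the vectors would be passed in.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data shaped like the question: binary vectors plus labels.
X = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1],
     [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
Y = ['pos', 'neg', 'pos', 'neg']

# CountVectorizer treats every item as a text document and calls .lower()
# on it, so a list of integers triggers the AttributeError from the question.
try:
    CountVectorizer().fit(X)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The data is already numeric, so the vectorizer step can be dropped.
text_clf = Pipeline([('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(alpha=.01))])
text_clf.fit(np.asarray(X), Y)

# Predicting on the training data here only to show the call pattern.
predicted = text_clf.predict(np.asarray(X))
print("Accuracy : {}%".format(np.mean(predicted == np.asarray(Y)) * 100))
```

If the raw texts are still available, the other option would be to keep `CountVectorizer` in the pipeline and pass the strings themselves instead of the precomputed vectors.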
