I am attempting to use MultinomialNB from sklearn to classify some data. I have made a sample CSV with some labelled training data, which I want to use to train the model, but I receive the following error message:

ValueError: Expected 2D array, got 1D array instead: array=[0 1 2 2].

I know it is a very small data set but I will eventually add more data once the code is working.

Here is my data: [image]

Here is my code:

import numpy as np
import pandas as pd
import array as array
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
data_file = pd.read_csv("CSV_Labels.csv", engine='python')
data_file.tail()
vectorizer = CountVectorizer(stop_words='english')
all_features = vectorizer.fit_transform(data_file.Word)
all_features.shape
x_train = data_file.label
y_train = data_file.Word
x_train.values.reshape(1, -1)
y_train.values.reshape(1, -1)
classifer = MultinomialNB()
classifer.fit(x_train, y_train)
desertnaut

1 Answer

Try this:

x_train = x_train.values.reshape(-1, 1)
y_train = y_train.values.reshape(-1, 1)

NumPy reshape operations are not in-place: reshape returns a new array and leaves the original unchanged. So the arrays you're passing to the classifier still have their old shapes unless you assign the result back.
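
To make the point about reshape concrete, here is a minimal sketch using the same four values that appear in the error message. The variable names `labels`, `row`, and `col` are only illustrative and do not come from the question.

```python
import numpy as np

labels = np.array([0, 1, 2, 2])   # same values as in the error message

# reshape returns a new array; the original is left untouched
labels.reshape(-1, 1)
print(labels.shape)               # (4,)

# assigning the result is what keeps the new shape
row = labels.reshape(1, -1)       # one row with four columns
col = labels.reshape(-1, 1)       # four rows with one column each
print(row.shape, col.shape)       # (1, 4) (4, 1)
```

scikit-learn reads the first axis as samples and the second as features, so `(1, 4)` is treated as a single sample with four features, while `(4, 1)` is four samples with one feature each.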

Tinu
  • Hello Tinu, thanks so much for your help! Sorry it was such a silly question. I've amended the code but I am now getting the following error: ValueError: bad input shape (1, 4). Could it be an issue with the format of my csv? – user3062448 Nov 16 '20 at 13:55
  • No problem, you're welcome. Actually I made a mistake there. Reshaping with (1, -1) turns the vectors into row vectors, but the classifier expects column vectors. I edited my answer accordingly. – Tinu Nov 16 '20 at 14:31
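
As a follow-up to the shape discussion, it may also help to note that the question's code assigns the labels to x_train and the raw words to y_train, while the CountVectorizer output (all_features) is never passed to the classifier. A possible alternative, sketched below, is to train on the vectorized text and keep the labels as a plain 1D column, which needs no reshaping at all. The CSV contents are not shown in the question, so the DataFrame here is a hypothetical stand-in; only the column names `Word` and `label` and the label values come from the original post.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for CSV_Labels.csv; the real file's contents are not shown
data_file = pd.DataFrame({
    "Word": ["apple pie", "football match", "guitar solo", "piano concert"],
    "label": [0, 1, 2, 2],
})

vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(data_file["Word"])  # sparse, shape (4, n_features)
y_train = data_file["label"]                           # 1D, one label per row

classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# New text has to go through the same fitted vectorizer before predict
X_new = vectorizer.transform(["guitar concert"])
prediction = classifier.predict(X_new)
print(prediction)
```

Here the features are already 2D (one row per document, one column per vocabulary word) and the targets are 1D, which is the layout `fit` expects.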