I am trying to apply Logistic Regression Models with text.
I Vectorized my data by TFIDF:
vectorizer = TfidfVectorizer(max_features=1500)
x = vectorizer.fit_transform(df['text_column'])
vectorizer_df = pd.DataFrame(x.toarray(), columns=vectorizer.get_feature_names())
df.drop('text_column', axis=1, inplace=True)
result = pd.concat([df, vectorizer_df], axis=1)
I split my data:
x = result.drop('target', 1)
y = result['target']
and finally:
x_raw_train, x_raw_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
I build a classifier:
classifier = Pipeline([('clf', LogisticRegression(solver="liblinear"))])
classifier.fit(x_raw_train, y_train)
And I get this error:
ValueError: y should be a 1d array, got an array of shape (74216, 2) instead.
This is a strange thing because when I assign max_features=1000 it is working well, but when max_features=1500 I got an error.
Someone can help me please?