
I am working on a text classification task in Python using LinearSVC. I created training and test sets with scikit-learn's train_test_split, and the model works properly on my dataset. I now have a second, similar dataset with the same classes, and I want to test the trained model on it. How can I feed this new dataset to my model as a test set? How can I make predictions on it and check the accuracy? What format does the new dataset need to be in for prediction?

I checked the test set produced by train_test_split, and it is an array. How can I convert my new dataset into the same array format?

Please help me to solve this problem.

My code is below. 'Topic' is the column that contains the class names, and 'Text' is the column that contains the text data.

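# Imports used by the code below; `data` is a pandas DataFrame loaded beforehand.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Encode each class name in 'Topic' as an integer id and keep lookups in both directions.
data['category_id'] = data['Topic'].factorize()[0]
category_id_data = data[['Topic','category_id']].drop_duplicates().sort_values('category_id')
category_to_id = dict(category_id_data.values)
id_to_category = dict(category_id_data[['category_id', 'Topic']].values)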

tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2))
features = tfidf.fit_transform(data.Text).toarray()
labels = data.category_id

model1 = LinearSVC()
# Parentheses allow the assignment to continue onto the next line.
X_train, X_test, y_train, y_test, indices_train, indices_test = (
    train_test_split(features, labels, data.index, test_size=0.2, random_state=0))
model1.fit(X_train, y_train)
y_pred = model1.predict(X_test)
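
To make the goal concrete, here is a minimal sketch of what evaluating on a second dataset might look like. It rests on several assumptions: the new data is a DataFrame (called `new_data` here, a hypothetical name) with the same 'Text' and 'Topic' columns, the toy rows are made up for illustration, and `min_df=1` is used only because the toy data is tiny. The key idea is that the new text would go through `transform()` of the already fitted vectorizer rather than `fit_transform()`, so that its columns match the features the model was trained on, and that labels are mapped with the `category_to_id` built from the training data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Toy stand-ins for the original and the new dataset (made-up rows).
data = pd.DataFrame({
    "Text": ["cheap flights to paris", "book a hotel in rome",
             "python list comprehension", "java null pointer error"],
    "Topic": ["travel", "travel", "code", "code"],
})
new_data = pd.DataFrame({
    "Text": ["hotel and flights deals", "python error in a list"],
    "Topic": ["travel", "code"],
})

# Build the label mapping from the training data only.
codes, uniques = data["Topic"].factorize()
category_to_id = {topic: i for i, topic in enumerate(uniques)}

# Fit the vectorizer and the model on the training text.
# min_df=1 is an assumption for this tiny example; the question uses min_df=5.
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=1, norm="l2", ngram_range=(1, 2))
X_train = tfidf.fit_transform(data["Text"])
model1 = LinearSVC()
model1.fit(X_train, codes)

# New data: call transform() only, so the feature columns match training.
X_new = tfidf.transform(new_data["Text"])
# Reuse the training mapping so class ids mean the same thing in both datasets.
y_new = new_data["Topic"].map(category_to_id)

y_new_pred = model1.predict(X_new)
new_accuracy = accuracy_score(y_new, y_new_pred)
print(new_accuracy)
```

In this sketch the sparse matrix returned by the vectorizer is passed to LinearSVC directly, without `.toarray()`. A topic in the new dataset that never appeared in training would map to NaN, so such rows would need to be handled before computing accuracy.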
