How to prepare only test dataset from text data?

Question

I am performing an ML classification task with LinearSVC on a text dataset in Python. I created the train and test sets with scikit-learn's train_test_split, and the algorithm works properly on my dataset. I now have a second, similar dataset with the same classes, and I want to test the trained model on it. How can I feed this new dataset to my model as a test set, make predictions on it, and check the accuracy? What format does the new dataset need to be in for prediction?

I checked the test set produced by train_test_split, and it is in array form. How can I convert my new dataset into the same array format?

Please help me to solve this problem.

My code is below. 'Topic' is the column that holds the class names, and 'Text' is the column that holds the text data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Map each class name in 'Topic' to an integer id
data['category_id'] = data['Topic'].factorize()[0]
category_id_data = data[['Topic','category_id']].drop_duplicates().sort_values('category_id')
category_to_id = dict(category_id_data.values)
id_to_category = dict(category_id_data[['category_id', 'Topic']].values)

# Turn the text into TF-IDF features (unigrams and bigrams)
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2))
features = tfidf.fit_transform(data.Text).toarray()
labels = data.category_id

# Split 80/20, train the classifier, and predict on the held-out part
model1 = LinearSVC()
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(
    features, labels, data.index, test_size=0.2, random_state=0)
model1.fit(X_train, y_train)
y_pred = model1.predict(X_test)

You need to apply the same preprocessing as you did for the training and test set. Can you show some minimum code snippet? — Jindřich, Sep 25 '20 at 07:22
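
To illustrate what "the same preprocessing" from the comment above could look like, here is a minimal sketch. It assumes the new dataset is a pandas DataFrame with the same 'Text' and 'Topic' columns. The toy rows, the name new_data, and the variables X_new, y_new, y_new_pred and new_accuracy are hypothetical and only serve the example. min_df and encoding are left out because the toy data is too small for min_df=5. The two key points are to call transform() (not fit_transform()) on the already fitted vectorizer, and to reuse the category_to_id mapping built from the training data instead of calling factorize() again on the new data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Toy stand-ins (hypothetical) for the original data and the new labelled dataset.
data = pd.DataFrame({
    "Text": ["the team won the match", "the striker scored a goal",
             "the election results are in", "parliament passed the new law"],
    "Topic": ["sport", "sport", "politics", "politics"],
})
new_data = pd.DataFrame({
    "Text": ["the team scored a late goal", "the new law passed in parliament"],
    "Topic": ["sport", "politics"],
})

# Build the label mapping once, from the training data only.
codes, uniques = data["Topic"].factorize()
category_to_id = {topic: i for i, topic in enumerate(uniques)}

# Fit the vectorizer and the classifier on the training data.
tfidf = TfidfVectorizer(sublinear_tf=True, norm="l2", ngram_range=(1, 2))
X_train = tfidf.fit_transform(data["Text"])
model1 = LinearSVC()
model1.fit(X_train, codes)

# New data: transform() only, so the columns match the training vocabulary.
X_new = tfidf.transform(new_data["Text"])
# Reuse the training mapping; factorize() on new data could assign different ids.
y_new = new_data["Topic"].map(category_to_id)

y_new_pred = model1.predict(X_new)
new_accuracy = accuracy_score(y_new, y_new_pred)
print("Accuracy on the new dataset:", new_accuracy)
```

The result of transform() is a SciPy sparse matrix, which LinearSVC accepts directly, so converting it with .toarray() is optional. If the model was trained on dense arrays as in the question, either form should work for predict(), as long as the number of columns matches the training features.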
