I want to build a program that classifies text data with an SVM. Before that, I have to split the data into training and test sets using StratifiedKFold().

But the split fails with this error:

Traceback (most recent call last):
  File "C:\Users\Administrator\PycharmProjects\untitled1\main.py", line 115, in <module>
    y_train, y_test = labels[train_index], labels[test_index]
TypeError: only integer scalar arrays can be converted to a scalar index

How can I fix this error in the code below?

This is the code, running on Python 3.7:

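import csv

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

labels = []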
label_np = np.array(labels)

with open(path, encoding='utf-8') as in_file:
    data = csv.reader(in_file)
    for line in data:
        label_ = np.append(label_np, line)

model = SVC(kernel='linear')
total_svm = []
total_mat_svm = np.zeros((2,2))

kf = StratifiedKFold(n_splits=3)
kf.get_n_splits(result_preprocess, label_)

for train_index, test_index in kf.split(result_preprocess, label_):
    # print('Train : ', train_index, 'Test : ', test_index)
    x_train, x_test = result_preprocess[train_index], result_preprocess[test_index]
    y_train, y_test = label_[train_index], label_[test_index]

vectorizer = TfidfVectorizer(min_df=5,
                             max_df=0.8,
                             sublinear_tf=True,
                             use_idf=True)
train_vector = vectorizer.fit_transform(x_train)
test_vector = vectorizer.transform(x_test)

model.fit(x_train, y_train)
hasil_svm = model.predict(x_test)

total_mat_svm = total_mat_svm + confusion_matrix(y_test, hasil_svm)
total_svm = total_mat_svm + sum(y_test==hasil_svm)

print(total_mat_svm)

I expect the output to be the classification performance and the confusion matrix.

1 Answer

Please see this answer: "numpy array TypeError: only integer scalar arrays can be converted to a scalar index".

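I suspect that not only result_preprocess, but also labels is a plain Python list in your data pipeline. A list cannot be indexed with an array of indices such as train_index, and that is what raises the TypeError. In that case, the solution is simply to convert labels into a NumPy array before running your code snippet: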

import numpy as np
labels = np.array(labels)
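
A minimal sketch of why the conversion helps, assuming made-up labels and hand-written index arrays in place of the ones kf.split() yields (neither comes from your data): a plain list rejects an index array, while a NumPy array accepts it.

```python
import numpy as np

# Hypothetical labels standing in for the real ones.
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

# StratifiedKFold.split() yields integer index arrays similar to these.
train_index = np.array([0, 1, 4, 5])
test_index = np.array([2, 3])

# Indexing a list with an index array raises a TypeError.
try:
    labels[train_index]
    list_error = None
except TypeError as exc:
    list_error = str(exc)
print(list_error)

# After conversion, integer array indexing selects the requested positions.
labels = np.array(labels)
y_train, y_test = labels[train_index], labels[test_index]
print(y_train, y_test)
```

The same conversion applies to result_preprocess if it turns out to be a list as well.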
Luca Massaron
  • I tried that, as shown in the code in my post above, but now it shows this error: Traceback (most recent call last): File "C:\Users\Administrator\PycharmProjects\untitled1\venv\lib\site-packages\sklearn\utils\validation.py", line 235, in check_consistent_length " samples: %r" % [int(l) for l in lengths]) ValueError: Found input variables with inconsistent numbers of samples: [400, 2] – Aldi Kurniawan Apr 28 '19 at 15:36
  • This is a completely different problem, unrelated to your initial question. You should open a new question for it and provide a reproducible example of your problem. By the way, you also shouldn't replace the code in your initial question with your current attempt; otherwise it no longer matches the answers you received and won't be useful to anyone who has a similar problem in the future. – Luca Massaron Apr 29 '19 at 09:33
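
On the follow-up error quoted in the comments: the pair [400, 2] suggests 400 samples on the feature side but a label array with only 2 elements. One plausible cause, and this is an assumption since the CSV itself is not shown, is that the reading loop reassigns label_ on every row, so only the fields of the last row survive. A minimal sketch of collecting one label per row and checking the lengths before splitting, using a hypothetical two-column CSV (text, label):

```python
import csv
import io

import numpy as np

# Hypothetical two-column CSV (text, label) standing in for the real file.
csv_text = "good movie,pos\nbad movie,neg\nnice plot,pos\nweak plot,neg\n"

texts, labels = [], []
for row in csv.reader(io.StringIO(csv_text)):
    texts.append(row[0])    # assumes the text is the first column
    labels.append(row[-1])  # assumes the label is the last column

texts = np.array(texts)
labels = np.array(labels)

# Both arrays need one entry per sample before they are split.
assert len(texts) == len(labels), (len(texts), len(labels))
print(texts.shape, labels.shape)
```

With a real file, io.StringIO(csv_text) would be replaced by the opened file object, and the column positions would need to match the actual layout.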