I have a training data set of 144 student feedback entries, 72 positive and 72 negative. The data set has two attributes, data and target, which contain the sentence and its sentiment (positive or negative) respectively. The test data set contains 106 unlabeled feedback entries. Consider the following code:
import pandas as pd
feedback_data = pd.read_csv('output_svm.csv')
print(feedback_data)
data target
0 facilitates good student teacher communication. positive
1 lectures are very lengthy. negative
2 the teacher is very good at interaction. positive
3 good at clearing the concepts. positive
4 good at clearing the concepts. positive
5 good at teaching. positive
6 does not shows test copies. negative
7 good subjective knowledge. positive
8 good communication skills. positive
9 good teaching methods. positive
10 posseses very good and thorough knowledge of t... positive
feedback_data_test = pd.read_csv('classified_feedbacks_test.csv')
print(feedback_data_test)
data target
0 good teaching. NaN
1 punctuality. NaN
2 provides good practical examples. NaN
3 weak subject knowledge. NaN
4 excellent teacher. NaN
5 no strength. NaN
6 very poor communication skills. NaN
7 not able to clear the concepts. NaN
8 punctual. NaN
9 lack of proper guidance. NaN
10 fantastic speaker. NaN
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(binary = True)
ct = CountVectorizer(binary= True)
cv.fit(feedback_data['data'].values)
ct.fit(feedback_data_test['data'].values)
X = feedback_data['data'].apply(lambda X : cv.transform([X])).values
X = list([list(x.toarray()[0]) for x in X])
X_test = feedback_data_test['data'].apply(lambda X_test : ct.transform([X_test])).values
X_test = list([list(x.toarray()[0]) for x in X_test])
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
target = [1 if i<72 else 0 for i in range(144)]
X_train, X_val, y_train, y_val = train_test_split(X, target, train_size = 0.50)
clf = svm.SVC(kernel = 'linear', gamma = 0.001, C = 0.05)
clf.fit(X, target)
# The line below raises an error
print("Accuracy = %s" %accuracy_score(target,clf.predict([X_test])) )
I do not know what is wrong with the last line. Please help.
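For reference, a few things in the code above look like plausible causes. `clf.predict([X_test])` wraps an already two-dimensional list in another list, so the classifier receives a three-dimensional input. The test text is vectorized with a second `CountVectorizer` (`ct`) fitted on the test set, so its vocabulary, and therefore its number of columns, will generally differ from what the model was trained on. `accuracy_score(target, ...)` compares 144 training labels against 106 predictions, and the test set has no labels to score against in the first place. The printed sample also shows positive and negative rows interleaved, so building `target` from the row index may not match the `target` column. The sketch below shows one way the pipeline might be arranged instead. It is a minimal illustration, not a definitive fix: the inline lists are stand-ins copied from the printed samples, and in the real code they would come from the CSV columns (for example, labels derived from `feedback_data['target'] == 'positive'`).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data copied from the printed samples; the real code would read
# these from feedback_data['data'] and feedback_data_test['data'].
train_texts = [
    "facilitates good student teacher communication.",
    "lectures are very lengthy.",
    "the teacher is very good at interaction.",
    "good at clearing the concepts.",
    "good at teaching.",
    "does not shows test copies.",
    "good subjective knowledge.",
    "good communication skills.",
]
train_labels = [1, 0, 1, 1, 1, 0, 1, 1]  # 1 = positive, 0 = negative
test_texts = ["good teaching.", "punctuality.", "weak subject knowledge."]

# One vectorizer, fitted on the training text only, so that training and
# test matrices share the same vocabulary and the same number of columns.
cv = CountVectorizer(binary=True)
X = cv.fit_transform(train_texts)
X_test = cv.transform(test_texts)

# Hold out part of the labeled data; accuracy can only be measured here,
# because the test set has no labels.
X_train, X_val, y_train, y_val = train_test_split(
    X, train_labels, train_size=0.5, random_state=0, stratify=train_labels
)

# gamma is omitted because it has no effect with a linear kernel.
clf = SVC(kernel="linear", C=0.05)
clf.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, clf.predict(X_val))
print("Validation accuracy = %s" % val_accuracy)

# Pass the 2-D matrix directly; no extra list wrapping.
test_predictions = clf.predict(X_test)
print(test_predictions)
```

With the full 144-row data set the same structure would apply; only the data loading changes. The test-set predictions can then be written back into the `target` column of `feedback_data_test` if labels are needed there.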