I have a dataset with 4000 features and 35 samples. All the features are floating-point numbers between 1 and 3, e.g. 2.68244527684596.
I'm struggling to get any classifier to work on this data. I have tried k-NN and SVM (with linear, RBF and polynomial kernels). Then I learned about normalization, but it's still a bit complex for me and I can't get this code to give me proper predictions.
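The code I'm using to normalize the data is:
```python
from sklearn import preprocessing
train_data = preprocessing.scale(train_data)  # each column: mean 0, std 1
train_data = preprocessing.normalize(train_data, norm='l1', axis=0)  # each column: sum of |x| = 1
```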
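The two normalization calls above both act per feature (column): `preprocessing.scale` standardizes each column to zero mean and unit variance, and `normalize` with `norm='l1', axis=0` then rescales each column so its absolute values sum to 1. The sketch below checks this on random stand-in data of the same shape (an assumption, since the real data isn't shown) and also shows a leakage-free alternative: fitting `StandardScaler` on the training rows only and applying the same transformation to the held-out rows. Note that recent scikit-learn versions make `axis` keyword-only in `normalize`, so the positional form `normalize(train_data, 'l1', 0)` may raise a `TypeError` there.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical stand-in for the real data: 35 samples x 4000 features in [1, 3)
rng = np.random.default_rng(0)
train_data = rng.uniform(1.0, 3.0, size=(35, 4000))

# scale(): every column now has mean ~0 and standard deviation ~1
scaled = preprocessing.scale(train_data)
print(np.allclose(scaled.mean(axis=0), 0.0), np.allclose(scaled.std(axis=0), 1.0))

# normalize(norm='l1', axis=0): every column's absolute values now sum to 1
normalized = preprocessing.normalize(scaled, norm='l1', axis=0)
print(np.allclose(np.abs(normalized).sum(axis=0), 1.0))

# Leakage-free variant: learn the per-column mean/std from the training rows
# only, then apply that same transformation to the 5 held-out rows
scaler = preprocessing.StandardScaler().fit(train_data[:-5])
train_scaled = scaler.transform(train_data[:-5])
test_scaled = scaler.transform(train_data[-5:])
print(train_scaled.shape, test_scaled.shape)
```

Because the standardized columns already share a common scale, the extra l1 step mostly multiplies each column by a similar constant, so it probably changes little for SVM or k-NN beyond shrinking all values.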
The code I'm using to classify the data is:
```python
from sklearn import linear_model, svm
from sklearn.neighbors import KNeighborsClassifier

# Hold out the last 5 samples for testing and train on the rest
X_train, y_train = train_data[:-5], train_labels[:-5]
X_test, y_test = train_data[-5:], train_labels[-5:]

# SVM with polynomial kernel
svc1 = svm.SVC(kernel='poly', degree=3)
svc1.fit(X_train, y_train)
print("Poly SVM:", svc1.predict(X_test))

# SVM with RBF kernel
svc2 = svm.SVC(kernel='rbf')
svc2.fit(X_train, y_train)
print("RBF SVM:", svc2.predict(X_test))

# SVM with linear kernel
svc3 = svm.SVC(kernel='linear')
svc3.fit(X_train, y_train)
print("Linear SVM:", svc3.predict(X_test))

# k-nearest neighbours (default k=5)
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print("KNN:", knn.predict(X_test))

# Logistic regression, trained and scored on the same split as the models above
logistic = linear_model.LogisticRegression()
print('LogisticRegression score: %f' % logistic.fit(X_train, y_train).score(X_test, y_test))
```
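With only 35 samples, a single 5-sample hold-out gives a very noisy estimate: one prediction changes the accuracy by 20 percentage points. A common alternative is k-fold cross-validation, where every sample is tested exactly once. The sketch below runs the same five classifiers through scikit-learn's `cross_val_score` with `StratifiedKFold`, putting `StandardScaler` inside a pipeline so scaling is refit on the training folds of each split. The random data and the 18/17 binary labels are hypothetical stand-ins for `train_data` and `train_labels`; `max_iter=1000` for logistic regression is an assumed setting to avoid convergence warnings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data with the question's shape; replace with the real arrays
rng = np.random.default_rng(0)
train_data = rng.uniform(1.0, 3.0, size=(35, 4000))
train_labels = np.array([0] * 18 + [1] * 17)  # assumed binary labels

models = {
    "Poly SVM": SVC(kernel="poly", degree=3),
    "RBF SVM": SVC(kernel="rbf"),
    "Linear SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# 5-fold stratified CV: each sample lands in exactly one test fold, and the
# class ratio is roughly preserved in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # The scaler is part of the pipeline, so it only ever sees training folds
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, train_data, train_labels, cv=cv)
    results[name] = scores
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```

On this random stand-in data the scores should sit near chance level, which is a useful baseline to compare the real results against.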
I'm a newbie to machine learning and I'm working hard to learn the concepts. I hoped someone could point me in the right direction.
Note: I have only 35 samples and this is part of an assignment. I cannot get more data :(
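Since more data isn't an option, one commonly tried approach when features (4000) vastly outnumber samples (35) is to reduce the number of features before classifying. The sketch below is an illustration of univariate feature selection (`SelectKBest` with the ANOVA F-test `f_classif`), not a known fix for this dataset. It is evaluated with leave-one-out cross-validation, and the selection step sits inside the pipeline: choosing features on all 35 samples first and only then cross-validating would leak the test labels and inflate the score. The stand-in data, the binary labels and `k=50` are all assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data with the question's shape; replace with the real arrays
rng = np.random.default_rng(0)
train_data = rng.uniform(1.0, 3.0, size=(35, 4000))
train_labels = np.array([0] * 18 + [1] * 17)  # assumed binary labels

# k=50 is an arbitrary illustrative choice, not a tuned value
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=50),
    SVC(kernel="linear", C=1.0),
)

# Leave-one-out: 35 fits, each tested on the single held-out sample;
# scaling and feature selection are redone inside every fit
scores = cross_val_score(pipeline, train_data, train_labels, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.2f} over {len(scores)} folds")
```

Each leave-one-out score is either 0 or 1, so only the mean is meaningful; `k` could itself be tuned with a nested cross-validation if the assignment allows it.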