I am working on a point cloud project. I have about 90 3D point cloud images. After preprocessing my data, I use PCA for dimensionality reduction, and the output I get is 2D (when I plot these principal components, I get my image back).
Now I am trying to feed these principal components to an SVM for classification. The code I have so far (which works):
import os
import glob
import pandas as pd
import numpy as np
from sklearn.svm import SVC
path = "/home/me/Desktop/pca-comma-seperated-csv" #path to my pca'd files
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pd.read_csv(f) for f in all_files) # create a dataframe from each of the 90 csv files; each file has 2 columns and up to 23000 rows
X = pd.concat(df_from_each_file, ignore_index=True)
classfile = "/home/me/Desktop/1.csv" #path to my label file
Y = pd.read_csv(classfile, header=1)
clf = SVC()
clf.fit(X, Y)
Now to my question: how do I split my data into training (70%) and testing (30%) sets? Which set (training or testing) do I feed to the SVM for classification? And how do I get a confusion matrix in Python like the one R produces, as in the example below?
Example:
          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.8953
Specificity : 0.6279
Pos Pred Value : 0.8783
Neg Pred Value : 0.6667
Prevalence : 0.7500
Detection Rate : 0.6715
Detection Prevalence : 0.7645
'Positive' Class : abnormal
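For reference, here is a minimal sketch of the workflow being asked about, assuming scikit-learn's train_test_split and confusion_matrix are acceptable. The data is synthetic and only stands in for the PCA output; the class names are borrowed from the R example, and the sketch assumes there is exactly one label per row of X. It reproduces only the table, accuracy, sensitivity and specificity, not the full R summary.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the PCA output (assumption): 2 features per row,
# one label per row. Replace with the real X and Y.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(3.0, 1.0, (n, 2))])
y = np.array(["abnormal"] * n + ["normal"] * n)

# 70% training, 30% testing; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Fit on the training set only, then predict on the held-out test set.
clf = SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# scikit-learn puts the truth on rows and the predictions on columns.
labels = ["abnormal", "normal"]
cm = confusion_matrix(y_test, y_pred, labels=labels)
tp, fn = cm[0]  # true "abnormal": predicted abnormal / predicted normal
fp, tn = cm[1]  # true "normal":   predicted abnormal / predicted normal

accuracy = accuracy_score(y_test, y_pred)
sensitivity = tp / (tp + fn)  # with "abnormal" as the positive class
specificity = tn / (tn + fp)

print(cm.T)  # transposed to match the R layout (pred on rows, truth on columns)
print("Accuracy    :", round(accuracy, 4))
print("Sensitivity :", round(sensitivity, 4))
print("Specificity :", round(specificity, 4))
```

The model is fitted on the training part and evaluated on the test part, so the confusion matrix reflects data the SVM has not seen.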