testing and training sklearn

Question

I am working on a point cloud project. I have about 90 3D point cloud images. After preprocessing my data, I use PCA for dimensionality reduction, and the output I get is 2D (when I plot these principal components I get back my image).

Now I am trying to feed these principal components to an SVM for classification. The code I have so far (which works):

import os
import glob
import pandas as pd
import numpy as np
from sklearn.svm import SVC

path = "/home/me/Desktop/pca-comma-seperated-csv"  # path to my PCA'd files
all_files = glob.glob(os.path.join(path, "*.csv"))

# Create a dataframe from each of the 90 CSV files; each file has 2 columns and up to 23000 rows
df_from_each_file = (pd.read_csv(f) for f in all_files)

X = pd.concat(df_from_each_file, ignore_index=True)
classfile = "/home/me/Desktop/1.csv"  # path to my label file
Y = pd.read_csv(classfile, header=1)
clf = SVC()
clf.fit(X, Y)

Now to my question: how do I split my data into training (70%) and testing (30%) sets? Which frame (testing or training) do I feed to the SVM for classification? And how do I get a confusion matrix in Python like the one R produces in the example below?

Example:

          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54

               Accuracy : 0.8285
                 95% CI : (0.7844, 0.8668)
    No Information Rate : 0.75
    P-Value [Acc > NIR] : 0.0003097

                  Kappa : 0.5336
 Mcnemar's Test P-Value : 0.6025370

            Sensitivity : 0.8953
            Specificity : 0.6279
         Pos Pred Value : 0.8783
         Neg Pred Value : 0.6667
             Prevalence : 0.7500
         Detection Rate : 0.6715
   Detection Prevalence : 0.7645

       'Positive' Class : abnormal

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html — , Aug 19 '17 at 10:58
Use [`train_test_split()`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to split the data, feed `X_train` and `y_train` to `SVC()`, use `X_test` for prediction to get `y_pred`, and then call [`confusion_matrix`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) on `y_test` and `y_pred` to get the confusion matrix. — Vivek Kumar, Aug 19 '17 at 11:01
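The steps in the comment above could look roughly like the sketch below. The data here is a hypothetical stand-in (200 random 2D samples with one binary label per row), not the asker's CSV files; it assumes `X` and `y` have the same number of rows, which the real data would also need.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in data: 200 samples, 2 features, one binary label per row.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 70% training, 30% testing; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit on the training part only, then predict on the held-out test part.
clf = SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Note that scikit-learn puts the true classes on the rows and the predictions on the columns, which is the transpose of the R table in the question (`pred` on rows, `truth` on columns).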
Thank you for the quick reply. I will try it and come back if I don't understand anything. — Chaitanya, Aug 19 '17 at 11:22
Yes, that's correct. It should work. If not, edit the question with the code and problem details. Note that scikit-learn's `confusion_matrix` will not return all the details shown in the R output you posted. — Vivek Kumar, Aug 19 '17 at 18:07
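Since `confusion_matrix` only returns the counts, some of the extra figures from the R output can be derived by hand. The following is a minimal sketch, not a full equivalent of the R summary: it rebuilds label vectors from the four counts in the question's example (an illustrative reconstruction, not real predictions), treats `abnormal` as the positive class, and computes accuracy, kappa, sensitivity, specificity and the predictive values. The confidence interval and the p-values are not covered.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Label vectors rebuilt from the example counts in the question:
# 231 abnormal predicted abnormal, 32 normal predicted abnormal,
# 27 abnormal predicted normal, 54 normal predicted normal.
y_true = np.array(["abnormal"] * 231 + ["normal"] * 32 + ["abnormal"] * 27 + ["normal"] * 54)
y_pred = np.array(["abnormal"] * 231 + ["abnormal"] * 32 + ["normal"] * 27 + ["normal"] * 54)

# Listing "abnormal" first makes it the positive class in the unpacking below.
cm = confusion_matrix(y_true, y_pred, labels=["abnormal", "normal"])
tp, fn, fp, tn = cm.ravel()  # rows are truth, columns are predictions

accuracy = accuracy_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)
sensitivity = tp / (tp + fn)      # true positive rate
specificity = tn / (tn + fp)      # true negative rate
pos_pred_value = tp / (tp + fp)   # precision for the positive class
neg_pred_value = tn / (tn + fn)

for name, value in [
    ("Accuracy", accuracy),
    ("Kappa", kappa),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Pos Pred Value", pos_pred_value),
    ("Neg Pred Value", neg_pred_value),
]:
    print(f"{name:>15} : {value:.4f}")
```

With real data, `y_true` and `y_pred` would simply be `y_test` and the output of `clf.predict(X_test)`.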
