I'm very new to Python and I'm trying to replicate this Sign Language Glove project here with my own hardware as a first practice in Machine Learning. I can already write data to CSV files from my accelerometers, but I can't understand the process. The file named 'modeling' confuses me. Can anyone help me understand what processes are happening?

import numpy as np
from sklearn import svm
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

import pandas as pd
df= pd.read_csv("final.csv") ##This I understand. I've successfully created csv files with data


#########################################################################
## These below, I do not.

from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size = 0.2)


train_features = train[['F1','F2','F3','F4','F5','X','Y','Z','C1','C2']]
train_label = train.cl

test_features = test[['F1','F2','F3','F4','F5','X','Y','Z','C1','C2']]
test_label = test.cl

## SVM
model = svm.SVC(kernel='linear', gamma=1, C=1)
model.fit(train_features, train_label)
model.score(train_features, train_label)
predicted_svm = model.predict(test_features)
print("svm")
print(accuracy_score(test_label, predicted_svm))
cn = confusion_matrix(test_label, predicted_svm)

1 Answer

Welcome to the community. That looks like a nice way to start off.

Like @hilverts_drinking_problem suggested, I would recommend looking at sklearn documentation. But here's a quick explanation of what's going on.

The `train_test_split` function randomly splits the dataset into two datasets, one for training and one for testing. `test_size = 0.2` means 20% of the rows go into the test set and the remaining 80% into the train set.
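To see the split in isolation, here is a small sketch with a toy DataFrame standing in for `final.csv` (the column names and `random_state` value are just illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy DataFrame with 10 rows, standing in for final.csv
df = pd.DataFrame({"F1": range(10), "cl": [0, 1] * 5})

# 80/20 split; random_state fixes the shuffle so the split is reproducible
train, test = train_test_split(df, test_size=0.2, random_state=42)

print(len(train), len(test))  # 8 2
```

Passing `random_state` is optional, but it makes runs repeatable while you are experimenting.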

The next two lines are just separating out the inputs (features) and outputs (targets) for training. Same for test in the next two lines.

Finally, you create an SVM object, train the model using `model.fit`, and get its accuracy on the training data using `.score`. You then use the trained model to predict labels for the test set, and print the accuracy score for your test set along with its confusion matrix.
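The fit / score / predict / evaluate cycle above can be sketched end-to-end on a built-in dataset (iris is used here purely as a stand-in, since I don't have your `final.csv`):

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Stand-in data: 150 samples, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = svm.SVC(kernel='linear', C=1)
model.fit(X_train, y_train)                  # learn from the training set
print(model.score(X_train, y_train))         # accuracy on data the model has seen
predicted = model.predict(X_test)            # labels for held-out samples
print(accuracy_score(y_test, predicted))     # accuracy on unseen data
print(confusion_matrix(y_test, predicted))   # rows: true class, cols: predicted
```

The confusion matrix is square (one row/column per class); off-diagonal entries count misclassifications, which is often more informative than the single accuracy number.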

If you need me to clarify/detail something, let me know!

Derek Langley
  • Also, I should mention there are slightly less verbose ways of writing the same code, but this is a good starting point for a beginner. – Derek Langley Aug 16 '19 at 06:25
  • Okay, now it's clear. Thank you very much. I still can't make the GitHub project function, so I've decided to build my own from scratch. My Python knowledge is very limited. You can view my concern here: https://stackoverflow.com/questions/57543688/how-to-perform-time-series-data-pattern-detection . I really hope you can help me. – defineastronaut Aug 18 '19 at 10:33
  • Hi! I think the question you mentioned there is too broad. Please narrow it down and include a bit of your own code to show what you have tried. It's considered bad practice to ask for a solution without providing a base to start with. Also, if my answer was satisfactory, please click the tick mark for future programmers. Have a great day! – Derek Langley Aug 19 '19 at 20:45