
I'm trying to predict values in my dataset with scikit-learn's SVM. The scikit-learn website lists several SVM models.

My dataset is fully numeric (like the Iris dataset), without labels.

I tried applying the model this way:

svclassifier = SVC(kernel='linear')

and the computation takes very long (about 19 hours).

Then I changed the model to

svclassifier = SVR()

and the computation is much lighter (about 2 minutes).

I also checked the RMSE between my original values and the predicted values, and the results are very close: about 6 for SVC and 5.9 for SVR (so SVR seems slightly better).
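For reference, the RMSE mentioned above can be computed as in the minimal sketch below. The helper name `rmse` and the sample values are made up for illustration; they are not the actual data or results from this question.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values, for illustration only
actual = [90, 86, 82, 79]
predicted = [88.0, 86.0, 85.0, 80.0]
print(rmse(actual, predicted))
```

RMSE is expressed in the same units as the target, so a value near 6 means the predictions are off by roughly 6 units on average (with larger errors weighted more heavily).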

How do I find the right model for this dataset? What is the difference between the two models I used?

EDIT: This is what my dataset looks like:

     valueHR  values  WkHR  WkCal  WkSteps  sec  sugar  cal  carbs  fat  fiber  protein  sodium
823       77       0     0      0        0    0     90    0      0    0      0        0       0
824       75      49     0      0        0    0     90    0      0    0      0        0       0

and I split the DataFrame this way:

X = data.drop('sugar', axis=1)
y = data['sugar']

Then I applied a train/test split to X and y.

After that, I applied the SVM to predict the sugar values.
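The steps described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact code used: the data is a synthetic stand-in generated with NumPy, and the 80/20 split ratio and the `random_state` values are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the real data: 200 rows, 12 numeric features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
# Hypothetical numeric target, loosely in the range of the 'sugar' column
y = 100 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=200)

# Hold out 20% of the rows for testing (the ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

svr = SVR()  # default RBF kernel
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)

# RMSE on the held-out rows
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"RMSE on the held-out rows: {rmse:.2f}")
```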

EDIT 2: the unique values of the target column:

data['sugar'].unique()

array([ 90,  86,  82,  79,  78,  76,  84,  88,  92,  81,  93,  96,  95,
        94,  87,  99,  97,  89, 104, 109, 113, 116, 108,  98,  80,  72,
        73,  74,  83, 112, 107, 103,  91, 100, 102, 101, 105, 117, 110,
       106, 125, 133, 115, 111, 114,  85, 121, 119, 126, 122, 127, 132,
       136, 131, 123, 120, 118, 124, 130, 128, 129, 140, 138, 139, 145,
       154, 148, 134], dtype=int64)

To be clear, I don't want to classify; I just want to predict a numeric value. All the rows in the dataset come from the same person, so there are no separate groups to distinguish (unlike the Iris dataset, which contains different species).

theantomc

1 Answer


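The first thing to determine is whether the relationship in the data is linear or non-linear.

For a beginner, picking the best model up front is quite difficult, because it requires analysing the structure of the data (linear or non-linear).

However, we can use metrics to check how accurate the results are.

Use the code below to test the accuracy of a model on the dataset. Note that accuracy_score is a classification metric that counts exact matches, so it suits a classifier such as SVC; for a regressor such as SVR, a regression metric such as RMSE is more appropriate.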

from sklearn.metrics import accuracy_score

accuracy_score(y_test,predicted_y_test)
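Since the target here is a continuous number, a regression counterpart of the check above could look like the sketch below. It is only an illustration under stated assumptions: the data is synthetic, and the `C=10.0` setting, the scaling step, and the choice of the two kernels are assumptions rather than tuned values. It fits an SVR with a linear kernel and one with an RBF kernel, then reports RMSE and R² for each on held-out data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data with a deliberately non-linear relationship (assumption)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 2))
y = 100 + 15 * np.sin(X[:, 0]) + 5 * X[:, 1] ** 2 + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

results = {}
for kernel in ("linear", "rbf"):
    # SVMs are sensitive to feature scale, so standardise the features first
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=10.0))
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[kernel] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }
    print(kernel, results[kernel])
```

If the RBF kernel scores clearly better than the linear one on held-out data, that hints at a non-linear relationship; if both score about the same, a linear model may be sufficient.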

  • SVC is a classifier. A classifier assigns each sample to one of a fixed set of classes, based on previously labelled data. – Mohammed Tabrez Mar 25 '20 at 10:39
  • SVR is a regressor. A regressor models the relationship between a dependent variable and one or more independent variables, and then uses it to predict new values. – Mohammed Tabrez Mar 25 '20 at 10:42
  • How can I tell if the data is linear or not? I have a series of numbers in my dataset, all of which indicate body values, such as temperature, heart rate, etc. – theantomc Mar 25 '20 at 10:57
  • To find out whether the data is linear, check whether there is a constant rate of change in your dataset; if there is, the data is linear. For more information on data science, see "Machine Learning A-Z" by Kirill Eremenko on Udemy. You will learn a lot about machine learning modules and artificial intelligence. – Mohammed Tabrez Mar 25 '20 at 11:09
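The classifier/regressor distinction made in the comments can be illustrated with a small sketch. The data below is synthetic and the variable names are hypothetical. SVC treats every distinct target value as a separate class and can only return values it saw during training, whereas SVR returns continuous estimates. scikit-learn's SVC also handles multiple classes with a one-vs-one scheme, so the number of internal binary problems grows quickly with the number of distinct target values, which may partly explain a long training time on a target with many unique values.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Synthetic one-feature dataset with integer targets (assumption)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(120, 1))
y = np.round(80 + 5 * X[:, 0]).astype(int)

X_new = np.array([[2.5], [7.25]])

clf = SVC(kernel="linear").fit(X, y)  # treats each distinct y as a class
reg = SVR(kernel="linear").fit(X, y)  # treats y as a continuous quantity

class_pred = clf.predict(X_new)  # always one of the training labels
value_pred = reg.predict(X_new)  # floating-point estimates

print("SVC predictions:", class_pred)
print("SVR predictions:", value_pred)
print("Number of classes seen by SVC:", len(clf.classes_))
```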