
Here is a paste of the code: SVM sample code

I checked out a couple of the other answers to this problem, and it seems this specific iteration of the problem is a bit different.

First off, my inputs are normalized, and I have five inputs per point. The values are all reasonably sized (healthy 0.5s and 0.7s, etc.; few values near 0 or 1).

I have about 70 x samples corresponding to their 70 y values. The y values are also normalized (they are percentage changes of my function after each time-step).

I initialize my SVR (and SVC), train them, and then test them with 30 out-of-sample inputs...and get the exact same prediction for every input (and the inputs are changing by reasonable amounts--0.3, 0.6, 0.5, etc.). I would think that the classifier (at least) would have some differentiation...

Here is the code I've got:

import pandas as pd
from sklearn import svm

# train svr
my_svr = svm.SVR()
my_svr.fit(x_training, y_trainr)

# train svc
my_svc = svm.SVC()
my_svc.fit(x_training, y_trainc)

# predict regression
p_regression = my_svr.predict(x_test)
p_r_series = pd.Series(index=y_testing.index, data=p_regression)

# predict classification
p_classification = my_svc.predict(x_test)
p_c_series = pd.Series(index=y_testing_classification.index, data=p_classification)

And here are samples of my inputs:

x_training = [[1.52068627e-04, 8.66880301e-01, 5.08504362e-01, 9.48082047e-01, 7.01156322e-01],
              [6.68130520e-01, 9.07506250e-01, 5.07182647e-01, 8.11290634e-01, 6.67756208e-01],
              ... x 70 ]

y_trainr = [-0.00723209 -0.01788079  0.00741741 -0.00200805 -0.00737761  0.00202704 ...]

y_trainc = [ 0.  0.  1.  0.  0.  1.  1.  0. ...]

And the x_test matrix (5x30) is similar to the x_training matrix in terms of the magnitudes and variance of its inputs; same for y_testr and y_testc.

Currently, the predictions for all of the tests are exactly the same (0.00596 for the regression, and 1 for the classification...)

How do I get the SVR and SVC functions to spit out relevant predictions? Or at least different predictions based on the inputs...

At the very least, the classifier should be able to make choices. I mean, even if I haven't provided enough dimensions for regression...

Chris
  • You'll need to provide a self-contained, runnable example with sample data that actually demonstrates the problem. – BrenBarn Dec 26 '15 at 21:12
  • Alright. One sec (or like 10 min =) – Chris Dec 26 '15 at 21:13
  • @BrenBarn there is a link to a pastebin of the code. I included the full data... – Chris Dec 26 '15 at 21:35

3 Answers

9

Try increasing your C from the default. It seems you are underfitting.

my_svc = svm.SVC(probability=True, C=1000)
my_svc.fit(x_training,y_trainc)

p_classification = my_svc.predict(x_test)

p_classification then becomes:

array([ 1.,  0.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,
        1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.])

For the SVR case you will also want to reduce your epsilon.

my_svr = svm.SVR(C=1000, epsilon=0.0001)
my_svr.fit(x_training,y_trainr)

p_regression = my_svr.predict(x_test)

p_regression then becomes:

array([-0.00430622,  0.00022762,  0.00595002, -0.02037147, -0.0003767 ,
        0.00212401,  0.00018503, -0.00245148, -0.00109994, -0.00728342,
       -0.00603862, -0.00321413, -0.00922082, -0.00129351,  0.00086844,
        0.00380351, -0.0209799 ,  0.00495681,  0.0070937 ,  0.00525708,
       -0.00777854,  0.00346639,  0.0070703 , -0.00082952,  0.00246366,
        0.03007465,  0.01172834,  0.0135077 ,  0.00883518,  0.00399232])

You should tune your C parameter using cross-validation so the model performs best on whichever metric matters most to you. You may want to look at GridSearchCV to help you do this.
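A minimal sketch of that tuning, using random data with the same shape as the question's (70 samples, 5 features) since the real data lives in the pastebin; the parameter ranges are illustrative, not recommendations:

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Stand-in data shaped like the question's: 70 samples x 5 normalized features.
rng = np.random.default_rng(0)
x_training = rng.random((70, 5))
y_trainr = rng.normal(scale=0.01, size=70)  # small percentage changes, like y_trainr

# Search C, epsilon, and gamma jointly; cv=5 keeps fold sizes reasonable for 70 samples.
param_grid = {
    "C": [1, 10, 100, 1000],
    "epsilon": [1e-4, 1e-3, 1e-2],
    "gamma": ["scale", 0.1, 1.0],
}
search = GridSearchCV(svm.SVR(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(x_training, y_trainr)

best_svr = search.best_estimator_  # refit on the full training set by default
print(search.best_params_)
```

For the SVC case the same pattern applies, with `epsilon` dropped and a classification metric such as `"accuracy"` in `scoring`.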

David Maust
  • Ok...awesome, thanks--got the classification working. The SVR is still acting up...But it looks like I'm not doing anything wrong, so this should set me on the right track. Do you think scipy's minimize will do the trick? Anyhow, do you know PCA? Will running that improve the situation? (I guess I would need twice as much training data though...and that might push me back too far in time...) – Chris Dec 26 '15 at 21:45
  • Just added an edit for the SVR case. PCA probably won't help you. First try tuning parameters using `GridSearchCV`, then you can decide if you need more data. – David Maust Dec 26 '15 at 21:46
  • Actually, a good way to see if more data will help is to plot a learning curve where you vary the amount of data and measure both training and CV loss. – David Maust Dec 26 '15 at 21:48
  • 1
    Oh. Also since you are using a kernel, you might also want to tune `gamma`. That effect can be pretty dramatic. – David Maust Dec 26 '15 at 22:05
3

I had the same issue, but a completely different cause, and therefore a completely different place to look for a solution.

If your prediction inputs are scaled incorrectly for any reason, you can see the same symptoms described here. This can happen if you forget (or miscode) the scaling of the input values at prediction time, or if the inputs are passed in the wrong order.
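A sketch of the failure mode and the fix, with made-up variable names and random stand-in data: the scaler fitted on the training set must be reused, unchanged, at prediction time.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train = rng.random((70, 5))
y_train = rng.integers(0, 2, size=70)
X_test = rng.random((30, 5))

# Fit the scaler on training data only.
scaler = StandardScaler().fit(X_train)

clf = svm.SVC(C=1000)
clf.fit(scaler.transform(X_train), y_train)

# The critical step: transform the test set with the SAME fitted scaler.
# Calling StandardScaler().fit_transform(X_test) here would rescale the test
# set on its own statistics and can reproduce the constant-prediction symptom.
# The column order of X_test must also match X_train exactly.
predictions = clf.predict(scaler.transform(X_test))
```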

0

In my case I needed to scale my data using the StandardScaler from the sklearn package.

I also had to scale each set of features independently: in my case, two types of distances, each scaled on its own.

from sklearn.preprocessing import StandardScaler

# One scaler per block of features
ss = StandardScaler()
X[:, 0:10] = ss.fit_transform(X[:, 0:10])

ss = StandardScaler()
X[:, 10:20] = ss.fit_transform(X[:, 10:20])
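To carry this per-block scaling over to a held-out set, each fitted scaler has to be kept and reused. A sketch with random stand-in data and hypothetical names (`ss_a`, `ss_b`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_train = rng.random((70, 20))
X_test = rng.random((30, 20))

# One scaler per feature block, fitted on the training data only and kept
# around so the test set is transformed with the same statistics.
ss_a = StandardScaler().fit(X_train[:, 0:10])
ss_b = StandardScaler().fit(X_train[:, 10:20])

X_train[:, 0:10] = ss_a.transform(X_train[:, 0:10])
X_train[:, 10:20] = ss_b.transform(X_train[:, 10:20])

X_test[:, 0:10] = ss_a.transform(X_test[:, 0:10])
X_test[:, 10:20] = ss_b.transform(X_test[:, 10:20])
```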
John