
I'm facing the following problem: I'm running an SVR from the scikit-learn library on a training set of about 46,500 observations, and it has been running for more than six hours so far.

I'm using the linear kernel.

from sklearn.svm import SVR

def build_linear(self):
    # Linear-kernel SVR; this is the fit that takes hours
    model = SVR(kernel='linear', C=1)
    return model

I already tried varying the C value between 1e-3 and 1000; nothing changes.

The poly kernel runs in about 5 minutes, but I need the linear-kernel values for an evaluation and can't skip this part...

Does anyone have an idea how to speed this up?

Thanks a lot!

Tobias Schäfer

1 Answer


Kernelized SVMs are known to scale badly with the number of samples: training is roughly quadratic to cubic in the number of observations, so 46,500 samples is already painful.

Instead of SVR with a linear kernel, use LinearSVR, or, for huge data, SGDRegressor.

LinearSVR is more restricted in what it can compute (no non-linear kernels), and more restricted algorithms usually make more assumptions, which they exploit to speed things up (or save memory).

SVR is based on libsvm, while LinearSVR is based on liblinear. Both are well-tested high-quality implementations.
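A minimal sketch of the suggested swap, using synthetic data from make_regression as a stand-in for the real 46,500-row set (the sample size and parameters here are illustrative, not the asker's):

```python
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# Synthetic regression data standing in for the real training set
X, y = make_regression(n_samples=5000, n_features=10, noise=0.1, random_state=0)

# LinearSVR (liblinear) scales far better with n_samples than the
# kernelized SVR (libsvm), at the cost of supporting only a linear model
model = LinearSVR(C=1, max_iter=10000, random_state=0)
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```

Since LinearSVR uses a different formulation and default loss than SVR(kernel='linear'), the fitted coefficients will not match exactly; re-tune C after switching.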

(It might be worth adding: in cases like this, don't waste time waiting 6 hours. Sub-sample your data, try small, then larger, subsets, and extrapolate the runtime or diagnose problems from that. Edit: it seems you did that already, good!)
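The sub-sampling advice can be sketched like this: time the fit on growing subsets and extrapolate before committing to the full data (sizes and data here are illustrative):

```python
import time
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=4000, n_features=10, noise=0.1, random_state=0)

# Time fits on growing subsamples; superlinear growth in these timings
# predicts an impractical runtime on the full dataset
timings = {}
for n in (500, 1000, 2000):
    t0 = time.perf_counter()
    SVR(kernel='linear', C=1).fit(X[:n], y[:n])
    timings[n] = time.perf_counter() - t0
    print(n, "samples:", round(timings[n], 2), "s")
```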

sascha
  • It works! A smaller data set on which SVR ran > 30 minutes finished in 1 minute. – Tobias Schäfer Nov 23 '17 at 17:01
  • Thanks for the hint! – Tobias Schäfer Nov 23 '17 at 17:03
  • 1
    @sascha Im using `SVR(kernel = "poly", C = 1e3, degree = 2)` , `SVR(kernel = "linear", C = 1e3)` and `SVR(kernel = "rbf", C = 1e3, gamma = 0.1)` for csv file with 6 columns and 30 rows, and im waiting more than 15 minutes. Why does this take this long? – taga May 18 '19 at 00:36