
I see that in scikit-learn I can build an SVM classifier with a linear kernel in at least 3 different ways:
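The three constructions themselves are not shown in this copy of the question; based on the discussion below (liblinear vs. libsvm, and SGD with hinge loss), they are presumably along these lines. The toy data here is illustrative, not from the original post:

```python
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data: two well-separated classes in 2D.
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 0, 1, 1]

clf1 = LinearSVC().fit(X, y)                  # liblinear
clf2 = SVC(kernel='linear').fit(X, y)         # libsvm
clf3 = SGDClassifier(loss='hinge').fit(X, y)  # stochastic gradient descent
```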

Now, I see that the difference between the first two classifiers is that the former is implemented in terms of liblinear and the latter in terms of libsvm.

How do the first two classifiers differ from the third one?

JackNova

1 Answer


The first two always use the full data set and solve a convex optimization problem with respect to those data points.

The third one can treat the data in batches and performs gradient descent aiming to minimize the expected loss with respect to the sample distribution, assuming that the examples are iid samples of that distribution.

The third one is typically used when the number of samples is very large, or when the data arrive as an unbounded stream. Observe that you can call its partial_fit method and feed it chunks of data.
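The incremental usage described above can be sketched like this; the synthetic stream of chunks is an assumption for illustration, not part of the original answer:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(loss='hinge')
classes = np.array([0, 1])

# Feed the model one chunk at a time. All possible class labels must be
# declared on the first call, since the model never sees the full data set.
for _ in range(10):
    X_chunk = rng.randn(50, 2)
    y_chunk = (X_chunk[:, 0] + X_chunk[:, 1] > 0).astype(int)
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```

Each `partial_fit` call updates the weights with a few SGD steps, so memory usage stays constant no matter how long the stream runs.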

Hope this helps?

eickenberg
  • One problem is that SGD with hinge loss only matches the baseline performance of SVC with a linear kernel, which in turn performs similarly to logistic regression with log loss. But logistic regression scales a little better than LinearSVC, so there is no point in using SVC. – Mayukh Sarkar Oct 09 '18 at 10:36
  • @MayukhSarkar This might be true for your data but isn't always the case. Better to be on the safe side and test the different classifiers. – Philipp Oct 31 '19 at 08:22