
I need to decide between an SVM and a neural network for an image processing application. The classifier must be fast enough for near-real-time use, and accuracy is important too. Since this is a medical application, it is important that the classifier has a low failure rate.

Which one is the better choice?

Lily
  • Umm. Neither is fast compared to basic classifiers like KNN. How big are your feature vectors, what language are you using, how much training data do you have? Also, you're not very clear on whether you want online learning or not. – Junuxx May 20 '12 at 16:06
  • KNN is not faster. It has no training phase, that's right. But it's a lazy classifier, which means that its prediction phase is very slow. – alfa May 20 '12 at 18:21
  • Take a look at [ELM](http://stats.stackexchange.com/questions/81499/are-bias-weights-essential-in-the-output-layer-if-one-wants-a-universal-functio) –  Jan 07 '14 at 14:05

1 Answer


A couple of provisos:

Performance of an ML classifier can refer to either (i) prediction performance of the classifier itself, or (ii) performance of the predicate step, i.e., the execution speed of the model-building (training) algorithm. In this case the answer is quite different depending on which of the two is intended in the OP, so I'll address each separately.

Second, by neural network (NN), I'll assume you're referring to the most common implementation, i.e., a feed-forward, back-propagated single-hidden-layer perceptron.

Training Time (execution speed of the model builder)

Comparing SVM to NN: SVMs are much slower to train. There is a straightforward reason for this: SVM training requires solving the associated Lagrangian dual (rather than primal) problem. This is a quadratic optimization problem in which the number of variables is very large, i.e., equal to the number of training instances (the 'length' of your data matrix).
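To make that scaling concrete, here is a minimal NumPy sketch (sizes and names are hypothetical, not from the question) of why the dual problem grows with the training set: the kernel (Gram) matrix the QP solver works over is n × n in the number of training instances, regardless of feature dimension:

```python
import numpy as np

# The dual QP has one variable (a Lagrange multiplier) per training
# instance, and the kernel (Gram) matrix it optimizes over is n x n.
# Sizes below are hypothetical.
n, d = 500, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))

# Linear-kernel Gram matrix: K[i, j] = <x_i, x_j>
K = X @ X.T
print(K.shape)  # (500, 500): quadratic in instances, regardless of d
```

Doubling the number of training instances quadruples the size of this matrix, which is why SVM training cost climbs so quickly with data-set size.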

In practice, two factors, if present in your scenario, could change this comparison:

  • NN training is trivial to parallelize (e.g., via map-reduce); parallelizing SVM training is not trivial, but it's also not impossible: within the past eight or so years, several implementations have been published and proven to work (https://bibliographie.uni-tuebingen.de/xmlui/bitstream/handle/10900/49015/pdf/tech_21.pdf)

  • multi-class classification: SVMs are two-class classifiers. They can be adapted for multi-class problems, but this is never straightforward because SVMs use direct decision functions. (An excellent source on adapting SVMs to multi-class problems is S. Abe, Support Vector Machines for Pattern Classification, Springer, 2005.) This modification could wipe out any performance advantage SVMs have over NNs. For instance, if your data has more than two classes and you configure the SVM using successive classification (aka one-against-many classification), data is fed to a first SVM classifier which classifies each point as either class I or other; if the result is other, the point is fed to a second classifier which classifies it as class II or other, and so on.
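The successive (one-against-many) scheme above can be sketched as follows. This is a toy illustration: the "binary classifier" here is a stand-in centroid test rather than a real two-class SVM, and all names and the threshold are hypothetical:

```python
import numpy as np

def fit_binary_models(X, y):
    # One binary model per class: here, just the class centroid.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_cascade(models, x, threshold=2.0):
    # Ask each binary model in turn: "class c, or other?"
    # On "other", fall through to the next classifier in the chain.
    classes = sorted(models)
    for c in classes[:-1]:
        if np.linalg.norm(x - models[c]) < threshold:
            return c
    return classes[-1]  # whatever is left over

X = np.array([[0.0, 0.0], [0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0],
              [0.0, 9.0], [0.0, 9.2]])
y = np.array([0, 0, 1, 1, 2, 2])
models = fit_binary_models(X, y)
print(predict_cascade(models, np.array([4.9, 5.2])))  # 1
```

Note the cost: a k-class problem requires training k - 1 separate binary models, and prediction may have to run through all of them in the worst case.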

Prediction Performance (execution speed of the model)

Prediction speed of an SVM is substantially higher than that of an NN. For a three-layer (one hidden-layer) NN, prediction requires successive multiplication of an input vector by two 2D matrices (the weight matrices). For a (linear) SVM, classification involves determining on which side of the decision boundary a given point lies, in other words a single inner (dot) product.
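A minimal sketch of the two prediction paths (sizes and names are hypothetical; the SVM side assumes a linear kernel, which is the cheapest case):

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 64, 128                    # input / hidden sizes (hypothetical)
x = rng.standard_normal(d)

# Single-hidden-layer NN prediction: two weight-matrix multiplications
# plus a nonlinearity -- roughly O(d*h) work per input.
W1 = rng.standard_normal((h, d))  # input -> hidden weights
W2 = rng.standard_normal(h)       # hidden -> output weights
nn_score = W2 @ np.tanh(W1 @ x)

# Linear-SVM prediction: which side of the hyperplane the point lies
# on -- a single inner (dot) product, O(d) work per input.
w, b = rng.standard_normal(d), 0.5
svm_label = 1 if w @ x + b >= 0 else -1
print(svm_label)
```

With a nonlinear kernel the SVM instead evaluates one kernel term per support vector, so the comparison depends on how many support vectors the trained model retains.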

Prediction Accuracy

By "failure rate" I assume you mean error rate rather than failure of the classifier in production use. If the latter, then there is very little if any difference between SVM and NN; both models are generally numerically stable.

Comparing prediction accuracy of the two models, and assuming both are competently configured and trained, the SVM will outperform the NN.

The superior resolution of SVM versus NN is well documented in the scientific literature. It is true that such a comparison depends on the data, the configuration, and the parameter choices of the two models. In fact, this comparison has been so widely studied--over perhaps all conceivable parameter space--and the results so consistent, that even the existence of a few exceptions (though I'm not aware of any) under impractical circumstances shouldn't interfere with the conclusion that SVMs outperform NNs.

Why does SVM outperform NN?

These two models are based on fundamentally different learning strategies.

In NN, network weights (the NN's fitting parameters, adjusted during training) are adjusted such that the sum-of-square error between the network output and the actual value (target) is minimized.
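In symbols (one common formulation; the notation here is mine, not from the question), the sum-of-squares error minimized during training over N examples is:

```latex
E(\mathbf{w}) \;=\; \frac{1}{2} \sum_{n=1}^{N} \big( y(\mathbf{x}_n; \mathbf{w}) - t_n \big)^2
```

where $y(\mathbf{x}_n; \mathbf{w})$ is the network output for input $\mathbf{x}_n$ under weights $\mathbf{w}$, and $t_n$ is the target.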

Training an SVM, by contrast, means an explicit determination of the decision boundaries directly from the training data. This is of course required as the predicate step to the optimization problem required to build an SVM model: maximizing the margin, i.e., the distance between the separating hyperplane and the nearest training points (the support vectors).
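The corresponding optimization problem, in its standard hard-margin form (a textbook formulation, not specific to this thread), is:

```latex
\min_{\mathbf{w},\, b} \;\; \frac{1}{2}\,\|\mathbf{w}\|^2
\quad \text{subject to} \quad
y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \,\ge\, 1, \qquad i = 1, \dots, N
```

Minimizing $\|\mathbf{w}\|$ is equivalent to maximizing the margin $2/\|\mathbf{w}\|$; the training points for which the constraint holds with equality are the support vectors.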

In practice, though, it is harder to configure the training algorithm for an SVM. The reason is the larger (compared to NN) number of parameters required for configuration:

  • choice of kernel

  • selection of kernel parameters

  • selection of the value of the margin parameter
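Those three choices multiply quickly. A minimal sketch of the resulting search space (all values are hypothetical; in practice each combination would be scored by cross-validation):

```python
from itertools import product

# The SVM configuration space sketched above (values hypothetical):
kernels = ["linear", "rbf", "poly"]   # choice of kernel
gammas  = [0.01, 0.1, 1.0]            # kernel parameter (unused by "linear")
Cs      = [0.1, 1.0, 10.0]            # margin / penalty parameter

grid = [{"kernel": k, "gamma": g, "C": c}
        for k, g, c in product(kernels, gammas, Cs)]
print(len(grid))  # 27 candidate configurations
```

Even this small grid means training 27 models (times the number of cross-validation folds), whereas a basic NN configuration mainly involves choosing the hidden-layer size and learning rate.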

doug
  • How is NN training a MapReduce problem? – Fred Foo Jul 07 '14 at 08:40
  • @FredFoo features map to the hidden layer nodes, and are reduced to output (or deeper nodes) – Dodgie Apr 15 '17 at 00:32
  • This is a very interesting post to read in 2017, when deep learning is at a peak. SVMs do not currently outperform NNs in many tasks, especially when the dimensionality is high. – dev_nut Jun 28 '17 at 15:38
  • @dev_nut it is indeed & I agree with your further characterization; that being said, I think doug had his head screwed on straight w/r/t the delta in ease of parallelization; it remained for the reader to understand the benefit this would later provide. I also think calling out the "explicit determination of the decision boundaries" is a profound description of the difference between these two approaches; worthy of an upvote on its own. And lastly, I think his prediction performance section may still be valid, though I think the prediction resolution part is not giving the ANNs a fair shake – anthropic android Oct 14 '19 at 21:21
  • The ANNs are not being given a fair shake in that I would guess--before looking at the evidence, it must be said--the ANNs and SVMs in the linked analysis are both tasked with classifying data with substantial dimensional reduction, which would eliminate the ground where ANNs excel. – anthropic android Oct 14 '19 at 21:30