
I wrote the following code and tested it on a small dataset:

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

classif = OneVsRestClassifier(svm.SVC(kernel='rbf'))
classif.fit(X, y)

where X and y are numpy arrays (X is a 30000x784 matrix, y is 30000x1). On small data the algorithm works well and gives me correct results.

But I started running my program on the full data about 10 hours ago, and it is still in progress.

I want to know how long it will take, or whether it is stuck in some way. (Laptop specs: 4 GB memory, Core i5-480M.)
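One way to check whether training is actually progressing rather than hung is scikit-learn's verbose flag on SVC, which makes the underlying libsvm print its optimization output; a minimal sketch, assuming the same X and y as above:

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

# verbose=True makes libsvm print progress output as the optimizer
# iterates, so a frozen process is distinguishable from a slow one
classif = OneVsRestClassifier(svm.SVC(kernel='rbf', verbose=True))
classif.fit(X, y)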

Il'ya Zhenin
  • So uh... 30000 points, each with 784 dimensions... I have not really worked too long with machine learning, but that is a pretty big, high-dimensional dataset... I do not think it's too surprising it's taking that long... you could try reducing the dimensions to speed it up. – Roy Aug 10 '13 at 20:37
  • @Roy Reducing the number of training instances would be *much* more effective than dimensionality reduction for kernel methods. – Marc Claesen Aug 11 '13 at 15:56
  • @MarcClaesen I'd have to take your word on it; I'm not much more than a novice myself. – Roy Aug 12 '13 at 14:17

1 Answer


SVM training can take arbitrarily long; this depends on many factors:

  • C parameter - the greater the misclassification penalty, the slower the process
  • kernel - the more complicated the kernel, the slower the process (rbf is the most complex of the predefined ones)
  • data size/dimensionality - again, the same rule applies (a rough timing sketch follows this list)
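A rough way to see these trends empirically is to time fits on a small synthetic problem. A sketch: the data below is hypothetical random noise at the question's dimensionality, so only the relative timings are meaningful.

import time

import numpy as np
from sklearn import svm

# hypothetical toy problem: 1000 random points, 784 features, 10 classes
rng = np.random.RandomState(0)
X_small = rng.rand(1000, 784)
y_small = rng.randint(0, 10, size=1000)

# time each kernel/C combination; larger C and more complex kernels
# should generally take longer on this non-separable data
for kernel in ('linear', 'poly', 'rbf'):
    for C in (0.1, 1.0, 100.0):
        t0 = time.time()
        svm.SVC(kernel=kernel, C=C).fit(X_small, y_small)
        print('kernel=%-6s C=%-5g %.2fs' % (kernel, C, time.time() - t0))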

In general, the basic SMO algorithm is O(n^3), so in the case of 30,000 data points it has to run a number of operations proportional to 30,000^3 ≈ 2.7 * 10^13, which is a really huge number. What are your options?

  • change the kernel to a linear one; with 784 features rbf may be redundant
  • reduce the features' dimensionality (PCA?)
  • lower the C parameter
  • train the model on a subset of your data to find good parameters, then train on the whole set on some cluster/supercomputer (a combined sketch of these options follows this list)
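For illustration, a minimal sketch combining the first, second, and fourth options with scikit-learn's LinearSVC and PCA. The component count, subset size, and C grid below are hypothetical placeholders, not tuned values.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

yr = np.ravel(y)  # LinearSVC wants a 1-d label array, not 30000x1

# option 1: linear kernel - LinearSVC (liblinear) does one-vs-rest
# multiclass internally and scales much better than kernelized SVC
linear_clf = LinearSVC(C=1.0).fit(X, yr)

# option 2: reduce the 784 features (hypothetically to 50) with PCA
X_reduced = PCA(n_components=50).fit_transform(X)

# option 4: pick C on a random 3000-sample subset before committing
# to a full training run
idx = np.random.RandomState(0).choice(len(X), 3000, replace=False)
Xtr, Xval, ytr, yval = train_test_split(
    X_reduced[idx], yr[idx], test_size=0.2, random_state=0)
for C in (0.01, 0.1, 1.0):
    clf = LinearSVC(C=C).fit(Xtr, ytr)
    print('C=%g validation accuracy %.3f' % (C, clf.score(Xval, yval)))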
lejlot
  • Kernel computation time is usually a non-issue when truly large problems are being considered. The difference between RBF and, say, polynomial is irrelevant. The only aspect of kernel complexity that matters is linear vs. others. Additionally, training complexity ranges from `O(n^2)` (small `C`) to `O(n^3)` (large `C`). Third, input dimensionality doesn't matter much in overall complexity (which is a function of the number of training instances, not dimensionality). – Marc Claesen Aug 11 '13 at 10:45
  • Thank you. I hadn't thought about the C parameter making the algorithm slower. And I didn't know that rbf is the most complicated kernel - but it is true: when I changed the kernel to 'poly' it gave a result in 2 hours. – Il'ya Zhenin Aug 31 '13 at 09:02
  • @Marc - Thanks for the comments. There is a huge difference between RBF and polynomial - not because the kernel function itself is complex, but because the induced RKHS is, and that is where the optimization takes place. Second, O(n^3) is the upper bound; obviously for small C it is faster. Third, dimensionality matters, as it is part of the cost of each kernel evaluation (less important) and it contributes to the complexity of the induced RKHS (more important). – lejlot Dec 25 '15 at 17:43