SVM rank works only on tiny datasets

Question

I am using svm-rank.

When running svm_rank_learn on a tiny dataset:

Training set properties: 3 features, 12 rankings, 596 examples

The run finishes in a few seconds and I get a valid model. But when I use a slightly larger dataset:

Training set properties: 3 features, 30 rankings, 1580 examples

The run is stuck for hours on iteration 29. This is very strange since the documentation states that svm-rank "scales linearly in the number of rankings (i.e. queries)".

What is wrong with my dataset or format?

I can reproduce this problem. But on my machine it stops on iteration 24. You could try to run the svm-light code with the parameters that are supposed to give the same output. See whether it gets stuck there. — Unapiedra, Oct 08 '14 at 14:24
It would be good to include the exact commands you ran, including any configuration parameters. — dmh, Aug 14 '15 at 09:47

Unapiedra · Answer 1 · 2014-10-08T14:20:43.603

"However, since I did not want to spend more than an afternoon on coding SVMrank, I only implemented a simple separation oracle that is quadratic in the number of items in each ranking (not the O[k*log k] separation oracle described in [Joachims, 2006])." (Source: http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html)

You are increasing the number of examples by a factor of roughly 3. With a quadratic separation oracle, you'd expect the time to increase by a factor of about 9.

[S]ince the documentation states that svm-rank "scales linearly in the number of rankings (i.e. queries)"

You also scale the number of rankings by a factor of a bit more than 2. Combining both effects, you'd expect training to take around 20 times longer.

This doesn't explain why it would go from a few seconds to multiple hours.
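The estimate above can be reproduced with a quick calculation. This is only a sketch of the arithmetic, using the dataset sizes from the question and assuming, as this answer does, a cost that is quadratic in the number of examples and linear in the number of rankings. The variable names are mine and not part of svm-rank.

```python
# Dataset sizes reported in the question.
examples_small, examples_large = 596, 1580
rankings_small, rankings_large = 12, 30

# Growth in the number of examples, and its square (quadratic oracle).
example_ratio = examples_large / examples_small
quadratic_factor = example_ratio ** 2

# Growth in the number of rankings (documented as linear).
ranking_ratio = rankings_large / rankings_small

# Combined slowdown under the assumptions stated above.
combined = quadratic_factor * ranking_ratio

print(f"examples grow by   {example_ratio:.2f}x")
print(f"quadratic term     {quadratic_factor:.2f}x")
print(f"rankings grow by   {ranking_ratio:.2f}x")
print(f"combined estimate  {combined:.1f}x")
```

With the unrounded ratios the combined factor comes out somewhat below the rounded figure of 20, but it is of the same order: a few seconds should become a minute or two, not hours.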

score 0 · Answer 2 · answered May 15 '16 at 13:31

Your feature values fall into different ranges. Try scaling your features across samples to have zero mean and unit variance for every feature. It also helps to normalize features within every single sample. These two steps speed up calculations immensely.

Scikit-learn has a nice introduction to data preprocessing and provides methods that make this easy; see http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing.
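To make the two steps concrete, here is a minimal sketch in plain Python, without scikit-learn. The helper names `standardize_columns` and `normalize_rows` and the feature values are made up for illustration. The first helper uses the population standard deviation, which is what scikit-learn's `StandardScaler` does as well.

```python
import math
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Scale each feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def normalize_rows(rows):
    """Scale each sample (row) to unit Euclidean length."""
    result = []
    for row in rows:
        # Leave all-zero rows unchanged instead of dividing by zero.
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        result.append([x / norm for x in row])
    return result


# Hypothetical feature values on very different scales.
features = [
    [0.5, 1200.0, 3.0],
    [0.7, 800.0, 1.0],
    [0.1, 1500.0, 2.0],
]

scaled = standardize_columns(features)  # step 1: per-feature scaling
unit = normalize_rows(scaled)           # step 2: per-sample normalization
```

The scaled values would then be written back into the training file in place of the raw feature values before running svm_rank_learn again.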
