I am trying to train an LS-SVM classifier on a dataset of the following size:

Training dataset: TS = 48000x12 (double)
Groups: G = 48000x1 (double)

The Matlab training code is:

class = svmtrain(TS,G,'method','LS',...
                 'kernel_function','rbf','boxconstraint',C,'rbf_sigma',sigma);

I then get this error message:

Error using svmtrain (line 516)
Error evaluating kernel function 'rbf_kernel'.

Caused by:
Error using repmat
Out of memory. Type HELP MEMORY for your options.

Note that the machine has 4 GB of physical memory, and training works when I reduce the size of the training set. Is there any solution that keeps the same data size and, of course, does not require adding physical memory?

Sofiane
  • How many classes do you have in your training set? – lejlot Jan 06 '14 at 12:42
  • @lejlot: The Matlab `svmtrain` function only supports binary classification, so I have just two classes. – Sofiane Jan 06 '14 at 13:06
  • You might be able to get away with using Breeze in Scala for SVM. I don't like unscalable solutions to anything, and Matlab is always going to fail to scale. I suggest you get into http://spark.incubator.apache.org/docs/latest/mllib-guide.html – samthebest Jan 06 '14 at 13:52
  • Not sure if it is possible, but perhaps you could use a smaller data type; I would try `single` or `int8`, for instance. And of course make sure you don't have unnecessary stuff in memory. Also, please confirm that you are running 64-bit Matlab. See [here](http://www.mathworks.com/matlabcentral/answers/91711) why. – Dennis Jaheruddin Jan 06 '14 at 14:27
  • @DennisJaheruddin: Thanks for the comment. Yes, I'm running 64-bit Matlab, and the problem is the same even when I try a `single` data type instead of `double`. – Sofiane Jan 07 '14 at 15:56
  • possible duplicate of [Out of memory using svmtrain in Matlab](http://stackoverflow.com/questions/15994222/out-of-memory-using-svmtrain-in-matlab) – Dennis Jaheruddin Jan 07 '14 at 16:13
  • @DennisJaheruddin: No, because the problem should be fixed within LS-SVM and not SMO-SVM. – Sofiane Jan 07 '14 at 16:31
  • Not sure if it will turn up something, but try running the code with `dbstop if error`. At the time the error occurs, try to determine how big the matrix is that Matlab attempts to create (and of which type). Also look at the output of `memory`. Perhaps it is possible to remove some variables before this line and load them back in afterwards, but if the variable size itself is the problem, the only thing I can think of is using a different variable type. Also consider doing it in batches, as suggested by @Amro in the linked question. – Dennis Jaheruddin Jan 07 '14 at 16:52
  • @DennisJaheruddin: How can I train an LS-SVM classifier in batches using Matlab? – Sofiane Jan 07 '14 at 21:11

1 Answer

It seems that the implementation computes the whole Gram matrix, which has size N x N (where N is the number of samples). In your case that is 2,304,000,000 entries. Even if each entry were stored as a 32-bit float (4 bytes), that would take 9,216,000,000 bytes, roughly 9 GB, just for the Gram (kernel) matrix. With double-precision data as in your question (8 bytes per entry), it is roughly 18 GB.
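The estimate can be reproduced in a few lines of Matlab. This is only back-of-the-envelope arithmetic, assuming a dense N x N matrix and ignoring any temporaries (such as the `repmat` copies in the error message), which would add to the total:

```matlab
N = 48000;                 % number of training samples (from the question)
nEntries    = N^2;         % entries in a dense N-by-N Gram matrix
bytesSingle = nEntries * 4;   % 4 bytes per single-precision entry
bytesDouble = nEntries * 8;   % 8 bytes per double-precision entry
fprintf('entries: %d\n', nEntries);
fprintf('single: %.2f GB, double: %.2f GB\n', bytesSingle / 1e9, bytesDouble / 1e9);
% prints: entries: 2304000000
%         single: 9.22 GB, double: 18.43 GB
```

Either figure is well beyond 4 GB of physical memory, which is consistent with training succeeding only on a smaller subset.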

There are two options:

  • Find an implementation that, for the RBF kernel, does not compute the full kernel (Gram) matrix but instead uses a callable to compute kernel values on demand
  • Try some kind of LS-SVM approximation, such as Fast Sparse Approximation of Least Squares Support Vector Machine: http://homes.cs.washington.edu/~lfb/software/FSALS-SVM.htm
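
To illustrate the first option, here is a minimal sketch of computing the product of the RBF Gram matrix with a vector in row blocks, so that only a `blockSize` x N slice is ever held in memory. The function name `rbf_times_vector` and the `blockSize` parameter are my own inventions for illustration, not part of any toolbox, and the kernel is assumed to be exp(-||u - v||^2 / (2*sigma^2)):

```matlab
% Save as rbf_times_vector.m (hypothetical helper, not a toolbox function).
% Computes K*v for the RBF Gram matrix K of the rows of X without storing K.
%   X         : N-by-d training matrix
%   v         : N-by-1 vector
%   sigma     : RBF width
%   blockSize : rows of K computed at once (assumed tuning knob, e.g. 500)
function Kv = rbf_times_vector(X, v, sigma, blockSize)
    N = size(X, 1);
    Kv = zeros(N, 1);
    sqNorms = sum(X.^2, 2);                    % squared norm of every row
    for first = 1:blockSize:N
        last = min(first + blockSize - 1, N);
        rows = first:last;
        % Squared distances between the block rows and all N samples
        D2 = bsxfun(@plus, sqNorms(rows), sqNorms') - 2 * (X(rows, :) * X');
        % Clamp small negative values caused by rounding before exponentiating
        Kblock = exp(-max(D2, 0) / (2 * sigma^2));
        Kv(rows) = Kblock * v;                 % this slice of the product
    end
end
```

With N = 48000 and `blockSize = 500`, each slice is 500 x 48000 doubles, about 192 MB, instead of the full matrix. A matrix-free product like this is the building block that iterative solvers such as `pcg` or `minres` need, since both accept a function handle in place of an explicit matrix. Wiring it into a complete LS-SVM solve is not shown here, and as far as I know `svmtrain` itself offers no hook for it, so this would mean leaving the built-in training function.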
lejlot
  • Thanks for responding. But how did you compute the number N? (knowing that I have 48000 samples for training and 12 features) – Sofiane Jan 06 '14 at 13:11
  • N = 48000, so N*N = 2,304,000,000 – lejlot Jan 06 '14 at 13:57
  • The second option in your answer is useful, but it does not fix the problem: when I try to integrate the provided C functions (after `mex` compilation), Matlab encounters an internal problem and needs to close. In addition, I would prefer a solution that keeps using Matlab functions (in particular the Matlab SVM functions for training and testing). Thanks for the useful proposal. – Sofiane Jan 07 '14 at 16:06