
I want to perform cross-validation to select the best parameters gamma and C for the RBF kernel of an SVR (Support Vector Regression). I'm using LIBSVM. I have a database that contains 4 groups of 3D meshes. My question is: is the approach I am using correct for 4-fold cross-validation? I think that, to select the parameters C and gamma of the RBF kernel, I must minimize the error between the predicted values and the ground-truth values.

I also have another problem: during the cross-validation I get a NaN value (Squared correlation coefficient = nan (regression)).

Here is the code I wrote:

[C,gamma] = meshgrid(-5:2:15, -15:2:3); % grid of exponents for C and gamma

%# grid search, and cross-validation

for m=1:numel(C)

    for k=1:4 
        fid1 = fopen(sprintf('list_learning_%d.txt',k), 'rt'); 
        i=1;

        while feof(fid1) == 0 
            tline = fgetl(fid1); 
            v= load(tline);
            v=normalize(v);
            matrix_feature_tmp(i,:)=v;
            i=i+1;
        end  

        fclose(fid1);

        % I fill matrix_feature_train of size m by n via matrix_feature_tmp

        %%construction of the test matrix
       fid2 = fopen(sprintf('liste_features_test%d.txt',k), 'rt'); 
       i=1;

       while feof(fid2) == 0 
           tline = fgetl(fid2); 
           v= load(tline);
           v=normalize(v);
           matrix_feature_test_tmp(i,:)=v;
           i=i+1;
       end  

       fclose(fid2);

       %I fill matrix_feature_test of size m by k via matrix_feature_test_tmp

       mos_learning=load(sprintf('mos_learning_%d.txt',k));
       mos_wanted=load(sprintf('mos_test%d.txt',k));

       model = svmtrain(mos_learning, matrix_feature_train', ...
               sprintf('-s %d -t %d -c %f -g %f -p %f', 3, 2, 2^C(m), 2^gamma(m), 1));

       [y_hat, Acc, projection] = svmpredict(mos_wanted, ...
                                             matrix_feature_test', model);
       MSE_Test = mean((y_hat - mos_wanted).^2);
       vecc_error(k) = MSE_Test;
    end

    mean_vec_error_fold(m) = mean(vecc_error);
end

%select the best gamma and C 
[~,idx]=min(mean_vec_error_fold);

best_C = 2^C(idx);
best_gamma = 2^gamma(idx);

%training with best parameters
%for example
model = svmtrain(mos_learning1, matrice_feature_train1', ...
        sprintf('-s %d -t %d -c %f -g %f -p %f', 3, 2, best_C, best_gamma, 1));

[y_hat_final, Acc, projection] = svmpredict(mos_test1, matrice_feature_test1', ...
                                            model);
– Anass

1 Answer


Based on your description, and without reading your code, it sounds like you are NOT doing cross-validation. Cross-validation requires you to pick a parameter set (i.e. a value for C and gamma) and, holding those parameters constant, train on k-1 folds and test on the remaining fold, doing this k times so that each fold is used as the test set exactly once. You then aggregate the error / accuracy measure over those k tests, and that aggregate is the measure you use to rank the parameter set for a model trained on ALL the data. Call this your cross-validation error for the parameter set you used. You then repeat this process for a range of different parameters and choose the parameter set with the best accuracy / lowest CV error. Your final model is trained on all your data.
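
A rough sketch of that procedure, assuming the LIBSVM MATLAB interface (svmtrain / svmpredict) and using hypothetical placeholder variables features (one sample per row) and mos (the target values) in place of your data:

% Hypothetical data: `features` is n-by-d (one sample per row), `mos` is n-by-1
[C, gamma] = meshgrid(-5:2:15, -15:2:3);        % exponent grid: every (C, gamma) pair
folds      = 4;
n          = numel(mos);
fold_id    = mod(randperm(n)', folds) + 1;      % assign each sample to one of 4 folds

cv_mse = zeros(numel(C), 1);
for m = 1:numel(C)                              % one parameter set at a time
    fold_err = zeros(folds, 1);
    for k = 1:folds                             % rotate the held-out fold
        test  = (fold_id == k);
        train = ~test;
        opts  = sprintf('-s 3 -t 2 -c %f -g %f -p 1', 2^C(m), 2^gamma(m));
        model = svmtrain(mos(train), features(train, :), opts);
        y_hat = svmpredict(mos(test), features(test, :), model);
        fold_err(k) = mean((y_hat - mos(test)).^2);   % test MSE on the held-out fold
    end
    cv_mse(m) = mean(fold_err);                 % cross-validation error for this (C, gamma) pair
end

[~, idx]    = min(cv_mse);                      % best parameters = lowest CV error
best_C      = 2^C(idx);
best_gamma  = 2^gamma(idx);
final_model = svmtrain(mos, features, ...       % final model trained on ALL the data
              sprintf('-s 3 -t 2 -c %f -g %f -p 1', best_C, best_gamma));

The fold assignment here is random; any fixed split, such as your four list files, works the same way as long as the same folds are reused for every (C, gamma) pair.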

Your code doesn't really make sense to me. Looking at this snippet:

folds = 4; 
for i=1:numel(C)
    cv_acc(i) = svmtrain(ground_truth, matrice_feature_train', ...
                sprintf('-s %d -t %d -c %f -g %f -p %d -v %d', 3, 2, ...
                2^C(i), 2^gamma(i), 1, 4)); % RBF kernel
end

What is it that cv_acc contains? To me it contains the actual SVM model (an SVMStruct if you use the MATLAB toolbox, something else if you use LIBSVM). This would be OK if you were using your loop to change which folds are used as the training set; however, you have used it to change the values of your gamma and C parameters, which is incorrect. You later call min(cv_acc), so I'm now guessing that you think the call to svmtrain actually returned the training error? I don't see how you can meaningfully call min on an array of structures like that, but I could be wrong. Even so, you aren't actually interested in minimising your training error; you want to minimise your cross-validation error, which is the aggregate of the test errors from your k runs and has nothing to do with your training error.
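
For what it's worth, LIBSVM's MATLAB interface can also do the fold rotation for you: when the -v flag is passed, svmtrain performs the k-fold split internally and, for regression (-s 3), returns a single scalar equal to the cross-validation mean squared error rather than a model, so a vector of those scalars can legitimately be passed to min. A minimal sketch, again with hypothetical features / mos placeholders:

cv_mse = zeros(numel(C), 1);
for m = 1:numel(C)
    % -v 4 makes svmtrain return the 4-fold cross-validation MSE (a scalar), not a model
    cv_mse(m) = svmtrain(mos, features, ...
                sprintf('-s 3 -t 2 -c %f -g %f -p 1 -v 4', 2^C(m), 2^gamma(m)));
end
[~, idx] = min(cv_mse);   % lowest CV error selects the (C, gamma) pair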

Now it's impossible to know for sure whether you've done this bit wrong, since you don't show us the vectors of gamma and C, but it's strange to have only one loop rather than a nested loop to iterate through them (unless you have arranged them like a truth table, but I doubt that). You need to test each potential value of C paired with each value of gamma. Currently it looks like you're only trying one value of gamma for each value of C.
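
If it helps, here is one way (just a sketch, with placeholder exponent ranges) to make sure every candidate C is paired with every candidate gamma, either with an explicit nested loop or by expanding the full grid up front with meshgrid:

c_exp = -5:2:15;                      % candidate exponents for C (placeholder range)
g_exp = -15:2:3;                      % candidate exponents for gamma (placeholder range)

% Option 1: nested loops, one iteration per (C, gamma) combination
for i = 1:numel(c_exp)
    for j = 1:numel(g_exp)
        % ... run the k-fold procedure with 2^c_exp(i) and 2^g_exp(j) ...
    end
end

% Option 2: expand the full grid up front, then a single loop covers every pair
[Cg, Gg] = meshgrid(c_exp, g_exp);    % Cg(m) and Gg(m) together enumerate all pairs
for m = 1:numel(Cg)
    % ... run the k-fold procedure with 2^Cg(m) and 2^Gg(m) ...
end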

Have a look at this answer to see an example of cross-validation used with SVM.

– Dan
  • Hi. 1) I have specified in my post that the call to svmtrain returns the mean squared error and not the model structure (notice that I call svmtrain with the -v option). So I think the use of min(cv_acc) is correct, because the vector cv_acc contains the error associated with each (C, gamma) pair. 2) I have to use the nested loop to change the folds; I will edit. 3) Am I right that I have to minimize the cross-validation error in order to later select the best parameters? – Anass Jun 01 '16 at 15:04
  • @Anass If it's the training error it is still wrong: you don't care which parameter set gives you the best training error, you care about the cross-validation error. So you train on 3 folds, return the SVM model, use that model to predict on the 4th fold, calculate the MSE, do that 4 times, aggregate the errors, and that aggregate is your *cross-validation* error for that parameter set. You then repeat it for all the parameter combinations you want to test, and only *then* do you care about the minimum of the cross-validation error across parameter combinations. – Dan Jun 01 '16 at 15:08
  • OK, thanks. I will try this and edit my post to show the results. – Anass Jun 01 '16 at 15:12