I am currently working on a very small dataset of about 25 samples (200 features), and I need to perform model selection and also obtain a reliable estimate of classification accuracy. I was planning to split the dataset into a training set (for 4-fold CV) and a test set (for testing on unseen data). The main problem is that the accuracy obtained from such a small test set is not reliable enough.
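For concreteness, here is a minimal sketch of the single-run setup I have in mind (scikit-learn; the synthetic 25x200 data, the SVC, and the small C grid are just placeholders for my real data and candidate models):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 200))       # stand-in for my 25 samples with 200 features
y = rng.integers(0, 2, size=25)      # stand-in binary labels

# hold out a small test set, run 4-fold CV on the rest for model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=4)
search.fit(X_train, y_train)
print("best model:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```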
So, could performing the cross-validation and testing multiple times solve the problem?
I was planning to repeat this process multiple times in order to have more confidence in the classification accuracy. For instance: one run of cross-validation plus testing would output one "best" model together with its accuracy on the test set. The next run would repeat the same process, but the "best" model may not be the same. By repeating this many times, I would eventually end up with one predominant model, and the reported accuracy would be the average of the accuracies obtained with that model.
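A rough sketch of the repeated procedure I am proposing, again with placeholder data and a placeholder model grid (only the repetition logic is the point here):

```python
from collections import Counter, defaultdict
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 200))       # stand-in data, as before
y = rng.integers(0, 2, size=25)

n_repeats = 50
winners = Counter()                  # how often each model wins a run
accs = defaultdict(list)             # test accuracies collected per model

for seed in range(n_repeats):
    # new random train/test split each repetition
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=4)
    search.fit(X_tr, y_tr)
    best = tuple(sorted(search.best_params_.items()))   # this run's "best" model
    winners[best] += 1
    accs[best].append(search.score(X_te, y_te))

predominant, count = winners.most_common(1)[0]
print(f"predominant model: {dict(predominant)} (won {count}/{n_repeats} runs)")
print("mean test accuracy of that model:",
      sum(accs[predominant]) / len(accs[predominant]))
```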
Since I have never heard of an evaluation framework like this, does anyone have suggestions or criticisms of the proposed algorithm?
Thanks in advance.