Feature Extraction and Cross-Validation of an image dataset

Question

I have a dataset consisting of fMRI images. Each image belongs to one class. The dataset is as follows:

Class 1: 9 images 
Class 2: 10 images 
Class 3: 6 images 
Class 4: 12 images

Each image is 4D (time series), i.e. 90x60x10x350 where 350 is the time dimension (i.e. 350 3D volumes). I want to train a classifier on this data.

I want to first extract features and then apply feature selection, e.g. PCA followed by clustering, as described in the paper "Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data" (http://www.hindawi.com/journals/cmmm/2013/645921/). For feature extraction I see the following possibilities:

1. Each voxel is a feature, and the average of each voxel's time series is taken. Each image has exactly one feature vector of dimension 90*60*10 = 54'000.
2. Each voxel is a feature, and each time point (i.e. each 3D volume) is a data point. Each image has 350 feature vectors of dimension 90*60*10 = 54'000 each.
3. Put all voxels of the whole time series of an image into one feature vector of size 90*60*10*350 = 18'900'000. Each image has only one feature vector.
4. Take the correlation values between the voxels as feature values. But this does not seem computationally feasible.
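The first three options above differ only in how the 4D array is reshaped. A minimal sketch, assuming each scan is already loaded as a NumPy array of shape (x, y, z, time); the toy dimensions and the variable names are illustrative stand-ins, not part of the original data:

```python
import numpy as np

# Toy stand-in for one scan with axes (x, y, z, time).
# The real scans are 90x60x10x350; small sizes keep the example cheap.
rng = np.random.default_rng(0)
img = rng.standard_normal((9, 6, 2, 35)).astype(np.float32)

n_voxels = img.shape[0] * img.shape[1] * img.shape[2]
n_time = img.shape[3]

# Averaging each voxel's time series gives one vector of length n_voxels per image.
feat_mean = img.mean(axis=3).reshape(n_voxels)

# Treating every 3D volume as a data point gives n_time vectors of length n_voxels.
feat_volumes = img.reshape(n_voxels, n_time).T

# Flattening everything gives one vector of length n_voxels * n_time per image.
feat_flat = img.reshape(-1)

print(feat_mean.shape, feat_volumes.shape, feat_flat.shape)
```

With the real dimensions these shapes would be (54000,), (350, 54000) and (18900000,) respectively.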

I prefer option 2, but I'm not sure if this is a good idea.

How would you do the feature extraction? And how could a correlation-based approach be made computationally feasible?
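To put the cost of the correlation idea in numbers: 54'000 voxels give 54'000 * 53'999 / 2, roughly 1.46 billion, voxel pairs per image. One common way to shrink this is to average voxels into a small number of regions first and correlate the region time series instead. The sketch below assumes a hypothetical parcellation into fixed 2x2x2 blocks, which is only a placeholder for a real anatomical atlas, and uses toy dimensions:

```python
import numpy as np

# Number of voxel pairs at the full resolution mentioned in the question.
n_vox_full = 90 * 60 * 10
n_pairs_full = n_vox_full * (n_vox_full - 1) // 2

# Toy stand-in for one scan with axes (x, y, z, time).
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 6, 2, 35))

# Hypothetical parcellation: average over non-overlapping 2x2x2 blocks.
# A real analysis would more likely use an anatomical atlas.
bx, by, bz = 2, 2, 2
X, Y, Z, T = img.shape
blocks = img.reshape(X // bx, bx, Y // by, by, Z // bz, bz, T).mean(axis=(1, 3, 5))
region_ts = blocks.reshape(-1, T)  # one row per region, one column per time point

# Pearson correlation between all pairs of region time series.
corr = np.corrcoef(region_ts)

# The matrix is symmetric, so keep only the upper triangle as features.
upper = np.triu_indices_from(corr, k=1)
feat_corr = corr[upper]

print(n_pairs_full, region_ts.shape, feat_corr.shape)
```

The number of features then grows with the square of the number of regions rather than the number of voxels, so the block size (or atlas) controls the cost.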

Last but not least, how would you do cross-validation on the dataset? The problem is that the different classes are imbalanced.
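One possible approach to the imbalance is stratified k-fold splitting at the image level, with the number of folds no larger than the smallest class (6 here). The helper below is a hand-rolled, standard-library-only sketch; `stratified_folds` is a hypothetical name, and a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead:

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Deal the indices of each class round-robin into n_folds test folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_folds)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)  # randomize which image of the class lands in which fold
        for pos, idx in enumerate(idxs):
            folds[pos % n_folds].append(idx)
    return folds

# Class sizes from the question: 9, 10, 6 and 12 images.
labels = [1] * 9 + [2] * 10 + [3] * 6 + [4] * 12

folds = stratified_folds(labels, n_folds=6)
for k, test_idx in enumerate(folds):
    classes_in_fold = sorted({labels[i] for i in test_idx})
    print(k, len(test_idx), classes_in_fold)
```

With 6 folds every test fold contains at least one image of each class; with 10 folds some test folds would contain no image of class 3. If each 3D volume is treated as a separate data point, the split should probably still be made per image, so that volumes from one scan never appear in both the training and the test set.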

Thank you so much for the answers beforehand.

Spearman, Chi-square and most correlation computations are `O(n)`. If you can work with problem instances of size 19 million (I'm assuming you can since you suggested 2 and 3), what exactly is computationally not doable about a Chi-square feature selection for example? As for how to do cross validation, one general method is to use a stratified k-fold CV, which preserves the class percentages. But you have so few images and they are so complex that I'm not sure how well anything will generalize from it. — IVlad, May 23 '15 at 21:57
@IVlad Thank you for your answer. Regarding correlation, I meant calculating the correlation value between all pairs of voxels and taking each value as a feature. Would this be a good way, and is it not computationally very expensive? Do you think 2) and 3) are also good ways to go? Or what other feature extraction methods (not feature selection) would you propose? Regarding CV, if I do stratified 10-fold CV, will this work when class 3 only has 6 images? Then some folds will have no images of class 3... — machinery, May 24 '15 at 00:01
I'm concerned that you have very few labeled examples for a very high-dimensional dataset, so it's not clear how well the classifier will generalize. The other thing is that, in order to give you advice on feature extraction, we would first need to understand what the data represents (feature extraction is not a black-box process). So you should elaborate on the experiments that generated that data. — cfh, May 24 '15 at 08:55
@cfh The data represent fMRI images. An fMRI image is a brain scan which shows the activation in the brain. Each fMRI image is 4D (a time series), so each voxel represents an activation (i.e. has some value). In fMRI image analysis it is common that the number of samples is very low while the images are very high-dimensional. In the end I want to train a classifier (e.g. SVM) which predicts the class label for an image. — machinery, May 24 '15 at 09:20
I guessed as much, but how much variation is there in time? Are these activations from staring at a static image, or is there a lot of movement going on? Can images from two different classes potentially have single time slices which are very similar, or are the activations at any fixed time slice already enough to discriminate the classes? — cfh, May 24 '15 at 09:27
@cfh There is some variation in time, but it is continuous, i.e. there are no jumps. The images are from mice that were given four different drug doses (thus the 4 different classes). I assume that two time slices from different classes can be similar, so of course the whole time course is more discriminative between the classes. How would you extract the features when it is better to take the whole time course into account? — machinery, May 24 '15 at 11:31
