Using SVM for gene expression analysis

Question

My problem:

a) I got a dataset for the expression of 1000 genes at 6 time points.

b) Some genes (testing set) belong to a certain class characterized by the distribution of gene expression over these time points.

c) I also have a data set of known genes for this class (training set).

d) Additionally I would like to generate a false dataset by randomly reorganizing my testing set and also include that one in my SVM model.

I think I know how to do (a)-(c) using R and the e1071 package, but I am not sure how to implement (d). Should I just run my false data through the fitted model and then compare the results on this dataset with those on the test set?

And what distributions should I use for comparison? (Pareto or maybe universal gamma supplying my calculated probabilities?)

In the end it would be perfect to get something like a score comparing false data and testdata!! ; ) — hendrik, May 23 '13 at 22:04

score 0 · Accepted Answer · edited May 23 '17 at 12:25

I would consider two approaches:

1. As you are suggesting, run your false set (or rather, multiple permutations, i.e. multiple false sets) as additional test sets in the SVM and compare the scores with those of the real test set. Essentially, you would want to show that your real test set performs significantly better than most of your false sets. This would be in the spirit of a statistical test described, for example, in this paper for more complex data. Also, this paper may be useful for converting SVM scores into calibrated probabilities using a binning approach.

2. Build a two-class SVM using a subset of the false set as the second training set. The classification task will then be to ascertain to which class your gene expression pattern is more likely to belong: the "positive" class or the "false" one. This paper, this thread and this thread, as well as general SVM textbooks, may be helpful in deciding how best to design this two-class classifier.
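
To make the first approach more concrete, here is a minimal sketch of the permutation comparison. It is written in Python using only the standard library so that it runs on its own; the same logic should carry over to R. Everything in it is an assumption for illustration: the function names are made up, the template and the three test profiles are invented toy numbers, and `stand_in_score` is a placeholder (negative squared distance to a template profile), not an SVM. In practice you would replace it with the decision values or probabilities from your fitted model.

```python
import random
import statistics


def shuffle_time_points(profiles, rng):
    """Build one false set by permuting each gene's time-point values independently."""
    false_set = []
    for profile in profiles:
        shuffled = list(profile)  # copy so the real data is left untouched
        rng.shuffle(shuffled)
        false_set.append(shuffled)
    return false_set


def stand_in_score(profile, template):
    """Placeholder for an SVM score (assumption): higher means closer to the template."""
    return -sum((p - t) ** 2 for p, t in zip(profile, template))


def set_statistic(profiles, template):
    """Summarize a whole set by the mean score of its genes."""
    return statistics.mean([stand_in_score(p, template) for p in profiles])


def empirical_p_value(real_stat, null_stats):
    """Fraction of false sets scoring at least as well as the real set (+1 correction)."""
    at_least_as_good = sum(1 for s in null_stats if s >= real_stat)
    return (at_least_as_good + 1) / (len(null_stats) + 1)


def permutation_test(test_profiles, template, n_perm=999, seed=0):
    """Score the real test set, then score n_perm false sets derived from it."""
    rng = random.Random(seed)
    real_stat = set_statistic(test_profiles, template)
    null_stats = [
        set_statistic(shuffle_time_points(test_profiles, rng), template)
        for _ in range(n_perm)
    ]
    return real_stat, empirical_p_value(real_stat, null_stats)


# Toy data (invented): a class-typical profile over 6 time points and 3 test genes.
template = [0.0, 1.0, 3.0, 5.0, 4.0, 2.0]
test_profiles = [
    [0.1, 1.2, 2.9, 5.1, 4.2, 1.8],
    [0.0, 0.9, 3.2, 4.8, 3.7, 2.1],
    [0.2, 1.1, 3.1, 5.2, 4.3, 1.9],
]

real_stat, p_value = permutation_test(test_profiles, template)
print("real set statistic:", real_stat)
print("empirical p-value:", p_value)
```

The empirical p-value is one candidate for the score you are asking about: it is the fraction of false sets that do at least as well as the real test set, and it does not require assuming any particular distribution for the scores.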

Hope it helps.

thank you, i will try to get the svm running and may open another thread. I'll check the paper and threads. — hendrik, May 24 '13 at 19:46
