My problem:
a) I have a dataset with the expression of 1000 genes at 6 time points.
b) Some genes (testing set) belong to a certain class characterized by the distribution of gene expression over these time points.
c) I also have a dataset of known genes for this class (training set).
d) Additionally, I would like to generate a false dataset by randomly reorganizing my testing set and also include that one in my SVM model.
I think I know how to do (a)-(c) using R and the e1071 package, but I am not sure how to implement (d). Should I just run my false data through the fitted model and afterwards compare the results on this dataset with those on the test set?
And what distributions should I use for the comparison? (Pareto, or maybe universal gamma, supplied with my calculated probabilities?)
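
To make (d) concrete, here is a minimal sketch of what I have in mind, in base R only. Everything in it is an assumption for illustration: the toy data, the names `make_decoy` and `score_fun`, and in particular the stand-in scorer (correlation with the mean training profile), which takes the place of the class probability I would actually get from `predict(model, x, probability = TRUE)` in e1071. The false dataset is built by permuting the time points within each gene of the testing set, and the comparison is done with empirical p-values rather than a fitted distribution, which may or may not be the right choice.

```r
set.seed(1)  # reproducible shuffles

# Toy stand-in data (assumption): rows are genes, columns are 6 time points.
n_time   <- 6
template <- c(0, 1, 3, 4, 2, 1)  # hypothetical expression profile of the class
train <- t(replicate(50, template + rnorm(n_time, sd = 0.5)))
test  <- rbind(
  t(replicate(20, template + rnorm(n_time, sd = 0.5))),  # class-like genes
  t(replicate(80, rnorm(n_time, mean = 2, sd = 1.5)))    # other genes
)

# (d) False dataset: permute the time points within each gene of the test set.
make_decoy <- function(x) t(apply(x, 1, sample))

# Stand-in scorer (assumption): correlation with the mean training profile.
# With e1071 this would be replaced by the predicted class probability.
score_fun <- function(x, ref) apply(x, 1, cor, y = ref)

ref          <- colMeans(train)
decoy        <- make_decoy(test)
test_scores  <- score_fun(test, ref)
decoy_scores <- score_fun(decoy, ref)

# Empirical p-value per test gene: share of decoy scores at least as high.
# The +1 terms keep the estimate from being exactly zero.
emp_p <- sapply(test_scores, function(s) {
  (sum(decoy_scores >= s) + 1) / (length(decoy_scores) + 1)
})

summary(emp_p)
```

Done this way, no Pareto or gamma fit is needed, because the scores on the false dataset serve directly as the reference distribution. Whether that is statistically sound here is part of what I am asking.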