I have this dataset:
dbppre dbppost per1pre per1post per2pre per2post
0.544331824055634 0.426482748529805 1.10388140870983 1.14622255457398 1.007302668 1.489675646
0.44544008292805 0.300746382647025 0.891104906479033 0.876840408251785 0.919450773 0.892276804
0.734783578764543 0.489971007532308 1.02796075709944 0.79655130374748 0.610340504 0.936092006
1.04113077142586 0.386513119551008 0.965359488375859 1.04314173155816 1.122001994 0.638452078
0.333368637355291 0.525460160226716 NA 0.633435747 1.196988457 0.396543005
1.76769244892893 0.726077921840058 1.08060419667991 0.974269083108835 1.245643507 1.292857474
1.41486783 NA 0.910710353033318 1.03435985624106 0.959985314 1.244732938
1.01932795229362 0.624195252685448 1.27809687379565 1.59656046306852 1.076534265 0.848544508
1.3919315726037 0.728230610741795 0.817900465495852 1.24505216554384 0.796182044 1.47318564
1.48912544220417 0.897585509143984 0.878534099910696 1.12148645028777 1.096723799 1.312244217
1.56801709691326 0.816474814896344 1.13655475536592 1.01299018097117 1.226607978 0.863016615
1.34144721808244 0.596169010679233 1.889775937 NA 1.094095173 1.515202105
1.17409999971024 0.626873517936125 0.912837009713984 0.814632450682884 0.898149331 0.887216585
1.06862027138743 0.427855128881696 0.727537839417515 1.15967069522768 0.98168375 1.407271061
1.50406121956726 0.507362673558659 1.780752715 0.658835953 2.008229626 1.231869338
1.44980944220763 0.620658801480513 0.885827192590202 0.651268425772394 1.067548223 0.994736445
1.27975202574336 0.877955236879164 0.595981804265367 0.56002696152466 0.770642278 0.519875921
0.675518080750329 0.38478948746306 0.822745530980815 0.796051785239611 1.16899539 1.16658889
0.839686262472682 0.481534573379965 0.632380676760052 0.656052506855686 0.796504954 1.035781891
.
.
.
As you can see, there are several quantitative gene-expression variables. Each gene was measured twice, before (pre) and after (post) treatment, and some of the variables have missing values (NA).
Each row corresponds to one individual, and the individuals are also divided into two groups (0 = control, 1 = actually treated).
I would like to compute correlations (Spearman or Pearson, depending on normality) separately for each group, obtaining both the correlation coefficient and its p-value while ignoring the NAs.
Is it possible?
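One possible way to encode "Spearman or Pearson depending on normality" is to run a Shapiro-Wilk test (shapiro.test()) on each variable and use Pearson only when neither variable rejects normality. This is a minimal sketch under stated assumptions: the threshold alpha = 0.05 is assumed, normality is checked only on the complete (x, y) pairs, choose_method() is a hypothetical helper, and the data are simulated rather than taken from the table above.

```r
# Hypothetical helper: choose Pearson if both variables pass a Shapiro-Wilk
# normality test at level alpha (assumed 0.05), otherwise Spearman.
choose_method <- function(x, y, alpha = 0.05) {
  ok <- complete.cases(x, y)   # keep only pairs where both values are present
  x <- x[ok]
  y <- y[ok]
  normal_x <- shapiro.test(x)$p.value > alpha
  normal_y <- shapiro.test(y)$p.value > alpha
  if (normal_x && normal_y) "pearson" else "spearman"
}

# Simulated placeholder data (not the values from the question).
set.seed(2)
x <- rnorm(30)
y <- x + rnorm(30)
y[5] <- NA                     # one missing value

m  <- choose_method(x, y)
ct <- cor.test(x, y, method = m)   # cor.test() also drops the incomplete pair
cat("method:", m, " estimate:", unname(ct$estimate), " p-value:", ct$p.value, "\n")
```

The same helper can be called once per group, so each group may end up with a different method.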
I know how to use the cor.test() function to compare two variables, but I could not find any argument in it that takes groups into account.
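cor.test() indeed has no grouping argument, but it can be applied to each group after split(). It also drops incomplete (x, y) pairs by itself, so NAs need no special handling. A minimal sketch, assuming the group indicator is stored in a column named group (that column is not shown in the table above); cor_by_group() is a hypothetical helper and the data are simulated placeholders.

```r
# Simulated placeholder data: values are random, and the column name
# 'group' (0 = control, 1 = treated) is an assumption.
set.seed(1)
n  <- 40
df <- data.frame(
  group   = rep(c(0, 1), each = n / 2),
  dbppre  = rnorm(n, mean = 1.0, sd = 0.3),
  dbppost = rnorm(n, mean = 0.6, sd = 0.2)
)
df$dbppost[c(3, 25)] <- NA   # one missing value in each group

# Hypothetical helper: run cor.test() once per group and collect
# the estimate and p-value. cor.test() drops incomplete (x, y) pairs itself.
cor_by_group <- function(data, x, y, method = "pearson") {
  parts <- split(data, data$group)
  rows <- lapply(names(parts), function(g) {
    d  <- parts[[g]]
    ct <- cor.test(d[[x]], d[[y]], method = method)
    data.frame(group    = g,
               n        = sum(complete.cases(d[[x]], d[[y]])),
               estimate = unname(ct$estimate),
               p.value  = ct$p.value)
  })
  do.call(rbind, rows)
}

out <- cor_by_group(df, "dbppre", "dbppost", method = "spearman")
print(out)
```

The n column reports how many complete pairs each test actually used, which makes the effect of the NAs visible.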
I also found the plyr and data.table packages, which can do this by group, but they return only the correlation value without a p-value, and I haven't been able to make them work for variables with NAs.
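A likely reason for the NA results is that cor() defaults to use = "everything", which returns NA whenever a value is missing. Setting use = "complete.obs" avoids that, but cor() still reports no p-value, whereas calling cor.test() inside the grouping step returns both. The sketch below loops over the three pre/post pairs in base R (the same cor.test() call could also be placed in a data.table j expression with by = group). Assumptions: a group column named group, simulated values, and Spearman throughout purely for illustration.

```r
# Simulated placeholder data using the question's column names;
# the 'group' column and all values are assumptions.
set.seed(3)
n  <- 30
df <- data.frame(group = rep(0:1, length.out = n))
for (v in c("dbppre", "dbppost", "per1pre", "per1post", "per2pre", "per2post")) {
  df[[v]] <- rnorm(n, mean = 1, sd = 0.3)
}
df$per1pre[7] <- NA
df$dbppost[9] <- NA

genes <- c("dbp", "per1", "per2")

# One row per (gene, group): pre vs post correlation with estimate and p-value.
results <- do.call(rbind, lapply(genes, function(gene) {
  pre  <- paste0(gene, "pre")
  post <- paste0(gene, "post")
  do.call(rbind, lapply(split(df, df$group), function(d) {
    ct <- cor.test(d[[pre]], d[[post]], method = "spearman")  # drops NA pairs
    data.frame(gene = gene, group = d$group[1],
               rho = unname(ct$estimate), p.value = ct$p.value)
  }))
}))
rownames(results) <- NULL
print(results)
```

Because six tests are run, some form of multiple-testing adjustment (for example p.adjust() on results$p.value) may be worth considering.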
Suggestions?