
I have this dataset:

    dbppre               dbppost              per1pre           per1post          per2pre       per2post
0.544331824055634   0.426482748529805   1.10388140870983    1.14622255457398    1.007302668 1.489675646
0.44544008292805    0.300746382647025   0.891104906479033   0.876840408251785   0.919450773 0.892276804
0.734783578764543   0.489971007532308   1.02796075709944    0.79655130374748    0.610340504 0.936092006
1.04113077142586    0.386513119551008   0.965359488375859   1.04314173155816    1.122001994 0.638452078
0.333368637355291   0.525460160226716           NA          0.633435747         1.196988457 0.396543005
1.76769244892893    0.726077921840058   1.08060419667991    0.974269083108835   1.245643507 1.292857474
1.41486783                  NA          0.910710353033318   1.03435985624106    0.959985314 1.244732938
1.01932795229362    0.624195252685448   1.27809687379565    1.59656046306852    1.076534265 0.848544508
1.3919315726037     0.728230610741795   0.817900465495852   1.24505216554384    0.796182044 1.47318564
1.48912544220417    0.897585509143984   0.878534099910696   1.12148645028777    1.096723799 1.312244217
1.56801709691326    0.816474814896344   1.13655475536592    1.01299018097117    1.226607978 0.863016615
1.34144721808244    0.596169010679233   1.889775937                 NA          1.094095173 1.515202105
1.17409999971024    0.626873517936125   0.912837009713984   0.814632450682884   0.898149331 0.887216585
1.06862027138743    0.427855128881696   0.727537839417515   1.15967069522768    0.98168375  1.407271061
1.50406121956726    0.507362673558659   1.780752715         0.658835953         2.008229626 1.231869338
1.44980944220763    0.620658801480513   0.885827192590202   0.651268425772394   1.067548223 0.994736445
1.27975202574336    0.877955236879164   0.595981804265367   0.56002696152466    0.770642278 0.519875921
0.675518080750329   0.38478948746306    0.822745530980815   0.796051785239611   1.16899539  1.16658889
0.839686262472682   0.481534573379965   0.632380676760052   0.656052506855686   0.796504954 1.035781891
.
.
.

As you can see, there are multiple quantitative variables for gene expression data, each gene measured twice (pre and post treatment), with missing values in some of the variables.

Each row corresponds to one individual, and the individuals are also divided into two groups (0 = control, 1 = treated).

I would like to compute a correlation (Spearman or Pearson, depending on normality) by group, obtaining both the correlation value and its p-value while skipping the NAs.

Is it possible?

I know how to use the cor.test() function to compare two variables, but I could not find any argument of this function that takes groups into account.

I also found the plyr and data.table libraries, which can do this by group, but they return just the correlation value without the p-value, and I haven't been able to make them work for variables with NAs.

Suggestions?

Sotos
  • Please provide reproducible examples along with expected output and the code you have tried from the packages you mentioned – Sotos Jan 17 '17 at 11:28

1 Answer


You could use the Hmisc package.

library(Hmisc)
set.seed(10)
dt <- matrix(rnorm(25), 5, 5) # create a 5 x 5 matrix

dt[1,1] <- NA # introduce NAs
dt[2,4] <- NA # introduce NAs

cors <- rcorr(dt, type="spearman") # Spearman correlation
corp <- rcorr(dt, type="pearson")  # Pearson correlation

> cors
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.4  0.2  0.5 -0.4
[2,]  0.4  1.0  0.1 -0.4  0.8
[3,]  0.2  0.1  1.0  0.4  0.1
[4,]  0.5 -0.4  0.4  1.0 -0.8
[5,] -0.4  0.8  0.1 -0.8  1.0

n
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    4    4    3    4
[2,]    4    5    5    4    5
[3,]    4    5    5    4    5
[4,]    3    4    4    4    4
[5,]    4    5    5    4    5

P
     [,1]   [,2]   [,3]   [,4]   [,5]  
[1,]        0.6000 0.8000 0.6667 0.6000
[2,] 0.6000        0.8729 0.6000 0.1041
[3,] 0.8000 0.8729        0.6000 0.8729
[4,] 0.6667 0.6000 0.6000        0.2000
[5,] 0.6000 0.1041 0.8729 0.2000       

For further details see the help section: ?rcorr

rcorr returns a list with elements r (the matrix of correlations), n (the matrix of the number of observations used in analyzing each pair of variables), and P (the asymptotic P-values). Pairs with fewer than 2 non-missing values have the r values set to NA.
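
To get the correlations by group, one option is to split the data on the group variable and call rcorr on each piece. The following is only a sketch on simulated data: the data frame df and the column name group are assumptions (the question does not show the group column), and the gene columns just mimic the layout in the question. As far as I recall, rcorr also stops with an error when a matrix has fewer than 5 rows, so each group needs at least that many individuals.

```r
library(Hmisc)

set.seed(1)
# Simulated stand-in for the real data. 'df' and 'group' are hypothetical
# names (0 = control, 1 = treated); replace them with your own.
n <- 20
df <- data.frame(
  group   = rep(c(0, 1), each = n / 2),
  dbppre  = rnorm(n, mean = 1,   sd = 0.4),
  dbppost = rnorm(n, mean = 0.6, sd = 0.2),
  per1pre = rnorm(n, mean = 1,   sd = 0.3)
)
df$dbppost[3]  <- NA # introduce an NA in the control group
df$per1pre[15] <- NA # introduce an NA in the treated group

vars <- c("dbppre", "dbppost", "per1pre")

# One rcorr() result per group. rcorr() expects a numeric matrix and
# deletes missing values pairwise, so NAs do not need to be removed first.
by_group <- lapply(split(df[vars], df$group), function(d) {
  rcorr(as.matrix(d), type = "spearman")
})

by_group[["0"]]$r["dbppre", "dbppost"] # correlation in the control group
by_group[["0"]]$P["dbppre", "dbppost"] # its p-value
by_group[["1"]]$n                      # observations used per pair, treated group
```

Switching type to "pearson" gives the Pearson version. If only a single pair of variables is of interest, the same split() pattern should also work with cor.test(), which drops incomplete pairs on its own.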

nadizan