
I'm working in healthcare and I need help using R. Here is the situation: I have a data set like this:

S1      S2      S3      S4      S5
0.498   1.48    1.43    0.536   0.548
2.03    1.7     3.74    2.13    2.02
0.272   0.242   0.989   0.534   0.787
0.986   2.03    2.53    1.65    2.31
0.307   0.934   0.633   0.36    0.281
0.78    0.76    0.706   0.81    1.11
0.829   2.03    0.667   1.48    1.42
0.497   1.27    0.952   1.23    1.73
0.553   0.286   0.513   0.422   0.573
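To make the example reproducible without the CSV file, the sample data above can be typed straight into R with `read.table(text = ...)`. This is a sketch: the object name `f` is chosen to match the code below, and the values are exactly the ones shown in the question.

```r
# Sample data from the question, read from an inline string instead of
# Workbook8nm.csv so the example runs anywhere.
f <- read.table(header = TRUE, text = "
S1    S2    S3    S4    S5
0.498 1.48  1.43  0.536 0.548
2.03  1.7   3.74  2.13  2.02
0.272 0.242 0.989 0.534 0.787
0.986 2.03  2.53  1.65  2.31
0.307 0.934 0.633 0.36  0.281
0.78  0.76  0.706 0.81  1.11
0.829 2.03  0.667 1.48  1.42
0.497 1.27  0.952 1.23  1.73
0.553 0.286 0.513 0.422 0.573
")
str(f)  # should describe a data frame with 9 rows and 5 numeric columns, S1 to S5
```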

Here are my objectives:

Compute the correlation between every pair of columns
Calculate the p-values
Calculate R-squared
Only show the pairs where R² > 0.5 and p < 0.05

Here is my code so far (it's not the most efficient, but it works):

e <- read.table('Workbook8nm.csv', header=TRUE, sep=",", dec=".", na.strings="NA")
f <- data.frame(e)                 # redundant: read.table already returns a data frame
M <- cor(f, use="complete.obs")    # Pearson correlation between every pair of columns
library('psych')
N <- corr.test(f)                  # returns a list; the p-values are in N$p

So far I have the correlations in M and the p-values in N (in N$p). How can I get R²?
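For a simple linear relationship between two variables, R² is the square of Pearson's r, so one sketch is simply to square the correlation matrix element by element. The data are re-entered inline so the block runs on its own; the name `R2` is chosen here for illustration.

```r
f <- read.table(header = TRUE, text = "
S1    S2    S3    S4    S5
0.498 1.48  1.43  0.536 0.548
2.03  1.7   3.74  2.13  2.02
0.272 0.242 0.989 0.534 0.787
0.986 2.03  2.53  1.65  2.31
0.307 0.934 0.633 0.36  0.281
0.78  0.76  0.706 0.81  1.11
0.829 2.03  0.667 1.48  1.42
0.497 1.27  0.952 1.23  1.73
0.553 0.286 0.513 0.422 0.573
")

M  <- cor(f, use = "complete.obs")  # Pearson r for every pair of columns
R2 <- M^2                           # element-wise square: R-squared for each pair
round(R2, 3)
```

`R2` keeps the row and column names of `M`, so `R2["S1", "S3"]` looks up a single pair.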

Second, how can I make R show only the pairs where, for example, R² > 0.5 and p < 0.05? I used this line:

P <- M[which(M > 0.9)]

My aim was to show only the Pearson coefficients above 0.9, as practice. But it just gives me a vector of every value greater than 0.9, so I can't tell which pair of columns each coefficient comes from. Ideally, R would show the significant values in a table together with the column names, so I can easily identify them afterwards. I need this because my table is 570 by 570, so I can't go through every p-value by hand to keep only the significant ones.
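A base-R sketch of how to keep the column names: with `arr.ind = TRUE`, `which()` returns (row, column) index pairs instead of positions in the flattened matrix, and those indices can be looked up in `rownames(M)` and `colnames(M)`. The 0.9 threshold is the one from the question; restricting to `upper.tri()` is an assumption added here so each pair appears once and the diagonal of 1s is dropped.

```r
f <- read.table(header = TRUE, text = "
S1    S2    S3    S4    S5
0.498 1.48  1.43  0.536 0.548
2.03  1.7   3.74  2.13  2.02
0.272 0.242 0.989 0.534 0.787
0.986 2.03  2.53  1.65  2.31
0.307 0.934 0.633 0.36  0.281
0.78  0.76  0.706 0.81  1.11
0.829 2.03  0.667 1.48  1.42
0.497 1.27  0.952 1.23  1.73
0.553 0.286 0.513 0.422 0.573
")
M <- cor(f, use = "complete.obs")

# TRUE where r > 0.9, restricted to the upper triangle so each pair of
# columns is reported once and the diagonal is excluded.
sel <- M > 0.9 & upper.tri(M)

# arr.ind = TRUE returns a two-column matrix of (row, col) indices.
idx <- which(sel, arr.ind = TRUE)

hits <- data.frame(var1 = rownames(M)[idx[, "row"]],
                   var2 = colnames(M)[idx[, "col"]],
                   r    = M[idx],          # matrix indexing by (row, col) pairs
                   stringsAsFactors = FALSE)
hits
```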

I hope this is clear! It's my first post here, so let me know if I made any mistakes.

Thanks for your help!

3273

1 Answer


I'm sure there is a function somewhere in the R ecosystem that does this faster, but I wrote a quick function that expands a matrix into a data.frame with "row" and "col" columns and the value as a third column.

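matrixToFrame <- function(m, name) {
    # One row per cell: every combination of row name and column name.
    # expand.grid varies the first factor fastest, which matches the
    # column-major order of as.vector(m).
    e <- expand.grid(row=rownames(m), col=colnames(m))
    e[name] <- as.vector(m)   # cell values go in a column called `name`
    e
}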
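For what it's worth, base R has a built-in route to the same long format: `as.table()` turns a named matrix into a two-way table, and `as.data.frame()` expands it to one row per cell with the first dimension varying fastest, so the row order should match `matrixToFrame`. The toy matrix and the renaming to `row`/`col` below are for illustration only.

```r
# A small named matrix standing in for cor(f); non-square so the
# row/column ordering is easy to see.
m <- matrix(1:6 / 10, nrow = 2,
            dimnames = list(c("A", "B"), c("X", "Y", "Z")))

# One row per cell; responseName names the value column.
long <- as.data.frame(as.table(m), responseName = "cor")
names(long)[1:2] <- c("row", "col")   # the default names are Var1 and Var2
long
```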

We can transform the correlation matrix into a data frame like so:

> matrixToFrame(cor(f), "cor")
   row col       cor
1   S1  S1 1.0000000
2   S2  S1 0.5322052
3   S3  S1 0.8573687
4   S4  S1 0.8542438
5   S5  S1 0.6820144
6   S1  S2 0.5322052
....

Because the row and col columns match up, we can merge the results of corr.test and cor:

> b <- merge(matrixToFrame(corr.test(f)$p, "p"), matrixToFrame(cor(f), "cor"))
> head(b)
   row col            p       cor
1   S1  S1 0.0000000000 1.0000000
2   S1  S2 0.2743683745 0.5322052
3   S1  S3 0.0281656707 0.8573687
4   S1  S4 0.0281656707 0.8542438
5   S1  S5 0.2134783039 0.6820144
6   S2  S1 0.1402243214 0.5322052

Then we can just filter for the rows we want, for example:

> b[b$cor > .5 & b$p > .2,]
   row col         p       cor
2   S1  S2 0.2743684 0.5322052
5   S1  S5 0.2134783 0.6820144
8   S2  S3 0.2743684 0.5356585
10  S2  S5 0.2134783 0.6724486
15  S3  S5 0.2134783 0.6827349
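To apply the criteria from the question instead (R² > 0.5 and p < 0.05), one base-R sketch that avoids looping over a 570 × 570 matrix is to compute the Pearson p-values directly from r, using the same t statistic that `cor.test()` uses: t = r · sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. This swaps psych out for base R, assumes the same n for every pair (rows with any NA are dropped first, as `use = "complete"` does), and gives unadjusted p-values.

```r
f <- read.table(header = TRUE, text = "
S1    S2    S3    S4    S5
0.498 1.48  1.43  0.536 0.548
2.03  1.7   3.74  2.13  2.02
0.272 0.242 0.989 0.534 0.787
0.986 2.03  2.53  1.65  2.31
0.307 0.934 0.633 0.36  0.281
0.78  0.76  0.706 0.81  1.11
0.829 2.03  0.667 1.48  1.42
0.497 1.27  0.952 1.23  1.73
0.553 0.286 0.513 0.422 0.573
")

# Drop incomplete rows so every pair uses the same n (like use = "complete.obs").
f <- f[complete.cases(f), ]
n <- nrow(f)

M <- cor(f)                              # Pearson r for every pair of columns
tstat <- M * sqrt((n - 2) / (1 - M^2))   # t statistic for H0: rho = 0
P <- 2 * pt(-abs(tstat), df = n - 2)     # two-sided, unadjusted p-values

# Question's criteria, upper triangle only (each pair once, no diagonal).
keep <- upper.tri(M) & M^2 > 0.5 & P < 0.05
idx  <- which(keep, arr.ind = TRUE)

hits <- data.frame(row = rownames(M)[idx[, "row"]],
                   col = colnames(M)[idx[, "col"]],
                   cor = M[idx],
                   r2  = M[idx]^2,
                   p   = P[idx],
                   stringsAsFactors = FALSE)
hits
```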

EDIT: I found the question "R matrix to rownames colnames values", which offers a couple of other attempts at matrixToFrame, though none is particularly more elegant than this one.

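EDIT2: Make sure to read the corr.test documentation carefully: different information is encoded in the upper and lower triangles of its p matrix (according to the psych documentation, p-values above the diagonal are adjusted for multiple tests, Holm by default, while those below are not), so the results here may be deceptive. You may want to do some filtering with lower.tri or upper.tri before the final filtering step.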
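A sketch of the triangle-based filtering suggested above, in base R rather than psych: take each pair once from the upper triangle, then adjust the p-values for multiple comparisons with `p.adjust(method = "holm")`. Holm is chosen to mirror the default documented for psych's `corr.test`, but this block does not call psych and is not guaranteed to reproduce its numbers exactly. With 570 columns there are 570 × 569 / 2 = 162,165 pairs, so some correction for multiple testing is worth considering.

```r
f <- read.table(header = TRUE, text = "
S1    S2    S3    S4    S5
0.498 1.48  1.43  0.536 0.548
2.03  1.7   3.74  2.13  2.02
0.272 0.242 0.989 0.534 0.787
0.986 2.03  2.53  1.65  2.31
0.307 0.934 0.633 0.36  0.281
0.78  0.76  0.706 0.81  1.11
0.829 2.03  0.667 1.48  1.42
0.497 1.27  0.952 1.23  1.73
0.553 0.286 0.513 0.422 0.573
")
f <- f[complete.cases(f), ]
n <- nrow(f)

M <- cor(f)
P <- 2 * pt(-abs(M * sqrt((n - 2) / (1 - M^2))), df = n - 2)  # unadjusted p

# Each pair of columns once: indices of the upper triangle (diagonal excluded).
ut <- which(upper.tri(M), arr.ind = TRUE)

pairs <- data.frame(row = rownames(M)[ut[, "row"]],
                    col = colnames(M)[ut[, "col"]],
                    cor = M[ut],
                    p   = P[ut],
                    stringsAsFactors = FALSE)

# Holm adjustment across all pairs at once.
pairs$p_holm <- p.adjust(pairs$p, method = "holm")

# Final filter on R-squared and the adjusted p-value.
pairs[pairs$cor^2 > 0.5 & pairs$p_holm < 0.05, ]
```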

user295691
  • That's amazing! That's exactly what I was looking for! I've been struggling with this for two weeks now. It's very elegant and simple when presented like that. Many thanks!! Do you know of any function in R that would also let me calculate R-squared? – 3273 Aug 13 '15 at 16:35
  • As I understand it, `cor(f)^2` will give the r-squared values. You could do `b$r2 <- b$cor^2` to add a column explicitly to the data frame. – user295691 Aug 13 '15 at 16:45
  • I'm completely stupid, I forgot that R² is simply the Pearson coefficient squared... Thanks a lot! It works very well! I checked corr.test to make sure it gives the right values. Huge thanks for your help! – 3273 Aug 13 '15 at 20:03