I am trying to create a set of correlation matrices by different levels of a factor variable.

This question has previously been answered (spearman correlation by group in R) but not for a matrix and the vector result doesn't seem to generalize as far as I can see.

The code below works, but the result can't be written to a csv because by() returns a list-like object of class "by". The error is: cannot coerce class "by" to a data.frame

cor1<- by(data, INDICES=data$factor0, FUN = function(x) cor(x[,c("x","y","z","a",
    "b","c")],method="spearman",use="pairwise"))

So I am looking for a way either to coerce the above into a data.frame so I can write it to a csv, or to produce the same result by an alternative method that outputs a data frame.

Any help greatly appreciated

Impossible9

3 Answers

The reason you get a list is that if x is a matrix, then cor(x) is a matrix as well, not a scalar. In this case it is a 6x6 matrix, so the result is a list of 6x6 matrices, one for each factor level.
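As a rough illustration of that structure, the sketch below rebuilds the question's call on made-up data and inspects the result. The data frame `dat`, its three group labels, and the helper vector `vars` are assumptions for the example, not the asker's real data.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data using the column names from the question
dat <- data.frame(
  factor0 = rep(c("g1", "g2", "g3"), each = 20),
  x = rnorm(60), y = rnorm(60), z = rnorm(60),
  a = rnorm(60), b = rnorm(60), c = rnorm(60)
)
vars <- c("x", "y", "z", "a", "b", "c")

cor1 <- by(dat, INDICES = dat$factor0,
           FUN = function(d) cor(d[, vars], method = "spearman", use = "pairwise"))

class(cor1)           # "by"
names(cor1)           # one entry per factor level
is.matrix(cor1[[1]])  # each element is an ordinary matrix
dim(cor1[[1]])        # 6 6
```

Each element can be pulled out with `[[` and handled like any other matrix.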

It seems to me that this is the natural way to represent the result. You can make it into a single data frame if you want, though I'm not sure exactly what you want the rows and columns to represent. Here is one option.

data <- matrix(rnorm(500), 100, 5)
colnames(data) <- letters[1:5]
factors <- sample(LETTERS[1:3], 100, TRUE)
cors <- by(data, factors, cor)
cors[[1]]
#             a           b           c           d           e
# a  1.00000000  0.05389618 -0.16944040  0.25747174  0.21660217
# b  0.05389618  1.00000000  0.22735796 -0.06002965 -0.30115444
# c -0.16944040  0.22735796  1.00000000 -0.06625523 -0.01120225
# d  0.25747174 -0.06002965 -0.06625523  1.00000000  0.10402791
# e  0.21660217 -0.30115444 -0.01120225  0.10402791  1.00000000

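# One row per factor level, one column per unique pair of variables
corsMatrix <- do.call(rbind, lapply(cors, function(x) x[upper.tri(x)]))
# Build "aXb"-style labels; pairNames avoids masking base::names()
pairNames <- outer(colnames(data), colnames(data), paste, sep = "X")
colnames(corsMatrix) <- pairNames[upper.tri(pairNames)]
corsMatrix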

#           aXb         aXc        bXc         aXd         bXd         cXd
# A  0.05389618 -0.16944040 0.22735796  0.25747174 -0.06002965 -0.06625523
# B -0.34231682 -0.14225269 0.20881053 -0.14237661  0.25970138  0.27254840
# C  0.27199944 -0.01333377 0.06402734  0.02583126 -0.03336077 -0.02207024
#           aXe        bXe         cXe         dXe
# A 0.216602173 -0.3011544 -0.01120225  0.10402791
# B 0.347006942 -0.2207421  0.33123175 -0.05290809
# C 0.007748369 -0.1257357  0.23048709  0.16037247

I'm not sure if this is what you are looking for. Another option is to export each correlation matrix to its own csv file.
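A minimal sketch of the one-file-per-level option follows. The output directory (`tempdir()`) and the `cor_<level>.csv` naming scheme are assumptions chosen for the example; matrices can be passed to `write.csv` directly, so no conversion to a data frame is needed.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

data <- matrix(rnorm(500), 100, 5)
colnames(data) <- letters[1:5]
factors <- rep(LETTERS[1:3], length.out = 100)
cors <- by(data, factors, cor)

out_dir <- tempdir()  # hypothetical output location; replace with a real path

# Write one csv per factor level, keeping the variable names as row names
for (i in seq_along(cors)) {
  level <- names(cors)[i]
  out_file <- file.path(out_dir, paste0("cor_", level, ".csv"))
  write.csv(cors[[i]], out_file)
}
```

Each file opens in Excel as a labelled square table, with the variable names in the first column and the header row.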

mrip
  • This is a good point, it is best as a list of matrices. The issue is I want to get these matrices into Excel. Is it possible to coerce an element of the list into a data frame? I would then have (for 3 levels of the factor) three data frames and can write each one to csv – Impossible9 Dec 11 '14 at 15:39
  • You don't need to coerce them to data frames. You can write matrices directly to csv. As in `write.csv(cors[[1]],"temp.csv")`. Just loop over the output of `by`. – mrip Dec 11 '14 at 15:43

Your question is not that clear, at least to me. If I understood it correctly, you may need a pairwise matrix before computing the correlations. You may want to try the following function from the SciencesPo package.

require(SciencesPo)

m<-rprob(mtcars, df = nrow(mtcars) - 2)

The following will stack your matrix, so it becomes easier to check r and the related p-values.

rstack(m)

daniel

You can use ddply from the plyr package:

library(plyr)
n <- 1e2
mdat <- data.frame(factor0 = factor(LETTERS[sample(26, n, TRUE)]),
                   x = rnorm(n), y = rnorm(n), z = rnorm(n),
                   a = rnorm(n), b = rnorm(n), c = rnorm(n))
ddply(mdat, .(factor0), function(d) {
  # letters[c(1:3, 24:26)] is c("a", "b", "c", "x", "y", "z")
  ret <- as.data.frame(cor(d[, letters[c(1:3, 24:26)]], method = "spearman", use = "pairwise"))
  ret$col <- letters[c(1:3, 24:26)]
  ret[, c(7, 1:6)]
})
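For comparison, here is a base-R sketch (not part of the plyr approach above) that builds the same kind of stacked data frame with split() and lapply(), then writes it to a single csv. The four-level factor, the `vars` vector, and the output file name are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n <- 100
mdat <- data.frame(factor0 = factor(rep(LETTERS[1:4], length.out = n)),
                   x = rnorm(n), y = rnorm(n), z = rnorm(n),
                   a = rnorm(n), b = rnorm(n), c = rnorm(n))
vars <- c("x", "y", "z", "a", "b", "c")

# One 6-row data frame per factor level: level, row variable, correlations
pieces <- lapply(split(mdat, mdat$factor0), function(d) {
  m <- cor(d[, vars], method = "spearman", use = "pairwise")
  data.frame(factor0 = d$factor0[1], col = rownames(m), m)
})

stacked <- do.call(rbind, pieces)
rownames(stacked) <- NULL

out_file <- file.path(tempdir(), "cor_by_factor0.csv")  # hypothetical file name
write.csv(stacked, out_file, row.names = FALSE)
```

The `factor0` and `col` columns identify which level and which row of the correlation matrix each line belongs to, so the blocks can be filtered apart again in Excel.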
thothal