
I have 82 .csv files, each of them a zoo object, with the following format:
"Index", "code", "pp"
1951-01-01, 2030, 22.9
1951-01-02, 2030, 0.5
1951-01-03, 2030, 0.0

I want to do a correlation matrix between the pp of all of my files. I found out how to do it "manually" between two files:
zz<-merge(x,y, all = FALSE)
z<-cbind(zz[,2], zz[,4])
cor(z,use= "complete.obs")

but I can't come up with a loop to do it for all the files... a few things to consider: each file starts and ends at different dates and I would like the matrix to show the codes so I can identify who is who.

Can anyone help?

sbg

1 Answer

I think you have the bones of a perfectly good solution here, actually. If you start with list.files() to generate a list of your csv files:

fileList <- list.files(path="path/to/csv/files", pattern="\\.csv$", full.names=TRUE)

then read in all the files using lapply():

datList <- lapply(fileList,read.csv)

then merge the first two files, dropping the code column (this assumes each file contains a single code, repeated on every row):

dat <- merge(datList[[1]][,-2],datList[[2]][,-2],by="Index",
        suffixes=c(datList[[1]]$code[1],datList[[2]]$code[1]))

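The suffixes argument names the two clashing pp columns by code, for future reference. Note that merge() only applies suffixes when column names clash, and it expects two of them, so when looping over the rest of datList it is safer to rename each pp column before merging it into dat:

for (i in 3:length(datList)){
    nxt <- datList[[i]][,-2]
    # label this file's pp column with its code before merging
    names(nxt)[2] <- paste0("pp", datList[[i]]$code[1])
    dat <- merge(dat, nxt, by="Index")
}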
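One detail of merge() worth checking on a small case: suffixes are only appended to non-key columns whose names clash between the two inputs. A minimal sketch with made-up values (the data frames a and b are purely illustrative):

```r
# Toy data frames (made-up values, for illustration only)
a <- data.frame(Index = c("1951-01-01", "1951-01-02"), pp = c(22.9, 0.5))
b <- data.frame(Index = c("1951-01-01", "1951-01-02"), pp = c(1.0, 0.0))

# Both inputs have a "pp" column, so the suffixes are appended
ab <- merge(a, b, by = "Index", suffixes = c("2030", "2059"))
names(ab)

# Now only the second input has a column called "pp":
# there is no clash, so no suffix is applied to it
abc <- merge(ab, a, by = "Index", suffixes = c("x", "y"))
names(abc)
```

So a column only picks up its code through suffixes if something else in the merge has the same name at that moment.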

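and then you should be able to run cor on dat minus the first column, e.g. cor(dat[,-1], use="complete.obs"). Keep in mind that merge() by default keeps only the dates present in every file; pass all=TRUE if you want to keep all dates. You might have to tweak this code a bit, but the general idea ought to work.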
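For reference, the whole idea can also be sketched as a single function. This is only one way to do it, under a few assumptions: corByCode is a hypothetical name, each data frame has the columns Index, code and pp with one code per file, and I've swapped in an outer join (all=TRUE) plus use="pairwise.complete.obs" so that each pair of files is correlated over the dates they share rather than over the dates common to all files.

```r
# Hypothetical helper: takes a list of data frames with columns
# Index, code, pp and returns the correlation matrix of pp,
# with rows and columns labelled "pp<code>".
corByCode <- function(datList) {
    # Keep Index and pp, and rename pp after the file's (single) code
    renamed <- lapply(datList, function(d) {
        out <- d[, c("Index", "pp")]
        names(out)[2] <- paste0("pp", d$code[1])
        out
    })
    # Outer join on the date, so non-overlapping periods become NA
    wide <- Reduce(function(x, y) merge(x, y, by = "Index", all = TRUE),
                   renamed)
    # Pairwise deletion: each pair uses only the dates both files have
    cor(wide[, -1], use = "pairwise.complete.obs")
}

# Usage, assuming datList was read in with lapply() as above:
# corMat <- corByCode(datList)
```

If you prefer the stricter behaviour of the original approach (only dates present in every file), drop all=TRUE and use use="complete.obs" instead.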

joran
  • thanks for your code, but it's not working and I can't figure out why. The data.frame dat looks like this: `Index pp3195 pp2030 pp2059 ppNA pp2113 ppNA pp2234.1 ppNA pp2290 ppNA pp2422 ppNA 18592 2010-12-26 0.0 NA NA 0 0 0.0 0.0 NA ` So it's not reading some of the suffixes and they don't seem to be in order either, so I can't figure out who is who. Any idea why? – sbg Jun 06 '11 at 15:13
  • Hard to say w/out a direct look at your full data. The ppNA's suggest that you have some files with NAs in the code column, and the pp2234.1 suggests that either you have a non-integer code, or duplicate codes (two files for code 2234). My code will put the columns in the order they appear in list.files; you could reorder them by coercing the column names to character and using `order()`. – joran Jun 06 '11 at 15:51
  • thanks I really had some NAs in the code that were causing problems – sbg Jun 08 '11 at 16:28
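The checks discussed in the comments (missing codes, duplicate codes, column order) could be sketched roughly as follows. checkCodes and reorderByName are hypothetical helper names, and the sketch assumes datList is a list of data frames with a code column and that the merged data frame has Index as its first column.

```r
# Hypothetical helper: flag files whose code column would give
# ambiguous column labels after merging.
checkCodes <- function(datList) {
    # First code of each file, as used to label the merged columns
    firstCode <- sapply(datList, function(d) d$code[1])
    # TRUE for files with at least one missing code
    anyMissing <- sapply(datList, function(d) any(is.na(d$code)))
    list(
        missing    = which(anyMissing),
        duplicated = which(duplicated(firstCode) & !is.na(firstCode))
    )
}

# Hypothetical helper: reorder merged columns by name, Index first
reorderByName <- function(dat) {
    dat[, c(1, 1 + order(names(dat)[-1])), drop = FALSE]
}
```

Both elements of the result are positions in datList, so they can be matched back to fileList to find the offending files.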