I would like to use cbind on a list of files. However, each file is split by chromosome (chr, k in 1:29) and by sample (i in 1:777). The files are named like:

sample1chr1.txt, sample1chr2.txt ... sample1chr29.txt, sample2chr1.txt ... sample777chr29.txt

All files have exactly the same row names (the first 3 columns act as my row names). I would like to get one final file per chr that merges all the sample files, without repeating the row names (the first 3 columns) in the final file.

I tried this:

# Create a file with the row names (first 3 columns) for each chr
for(k in 1:29){
  infile <- paste0("sample1chr",k,".txt")
  outfile <- paste0("LRRrawallchr",k,".txt")
  rows <- read.table(infile, header=TRUE, sep="\t")
  rows <- rows[, -grep("Log.R.Ratio", colnames(rows))]
  write.table(rows, outfile, sep=";")
}

#Cbind in one file per Chr
{  for(i in 1:777)
  for(k in 1:29){
    base <- paste0("LRRrawallchr",k,".txt")
    chr <- read.table(base, header=TRUE, sep=";")
    infile <- paste0("sample",i,"chr",k,".txt")
    chr2 <- read.table(infile, header=TRUE, sep="\t")
    outfile <- paste0("LRRrawallchr",k,".txt")
    chr2 <- chr2[, -grep("Name", colnames(chr2))]
    chr2 <- chr2[, -grep("Chr", colnames(chr2))]
    chr2 <- chr2[, -grep("Position", colnames(chr2))]
    chr <- cbind(chr, chr2)
    write.table(chr, outfile, sep=";", row.names=FALSE, col.names=FALSE)}
}

Input example (sample1chr1.txt):

 Name      Chr  Position    sample1value
BAC-11034   1   128            0.302
BAC-11044   1   129            -0.56
BAC-11057   1   134            0.0840

Input example (sample2chr1.txt):

Name       Chr  Position      sample2value
BAC-11034   1   128            0.25
BAC-11044   1   129            0.41
BAC-11057   1   134           -0.14

Expected output (LRRrawallchr1):

Name       Chr  Position    sample1value   sample2value
BAC-11034   1   128         0.302          0.25
BAC-11044   1   129         -0.56          0.41
BAC-11057   1   134         0.0840         -0.14

I have 22533 different .txt files (29 files, one per chr, for each of 777 samples). All 22533 files (sample1chr1.txt, sample1chr2.txt ... sample1chr29.txt, sample2chr1.txt ... sample777chr29.txt) look like the examples above.

I want 29 files like LRRrawallchr1, one per chr. Each "LRRrawallchr",k file must have 777+3 (780) columns: the 3 row-name columns plus one column per sample.

Cheers!

user3091668

3 Answers

Try:

# Build one merged table per chromosome
for (k in 1:29) {
  # Start from sample 1, which supplies the Name, Chr and Position columns
  a <- read.table(paste0("sample1chr", k, ".txt"), header = TRUE, sep = "\t")
  for (i in 2:777) {
    infile <- read.table(paste0("sample", i, "chr", k, ".txt"), header = TRUE, sep = "\t")
    # Add this sample's value column, matching rows on the three key columns
    a <- merge(a, infile, by = c("Name", "Chr", "Position"))
  }
  write.table(a, paste0("LRRrawallchr", k, ".csv"), sep = ",", row.names = FALSE)
}
FFI

You want to merge the sets, not cbind them. merge combines rows based on common or specified column names. After reading the first two samples into data frames, the command below produces the merge. I pass the common column names to merge (with by) because you are filtering by these names in your code.

> merge(sample1chr1, sample2chr1, by=c('Name', 'Chr', 'Position'))
       Name Chr Position sample1value sample2value
1 BAC-11034   1      128        0.302         0.25
2 BAC-11044   1      129       -0.560         0.41
3 BAC-11057   1      134        0.084        -0.14

Then continue merging in the remaining samples.
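One way to "continue merging" without writing each call by hand is to fold merge over a list of data frames with Reduce. The sketch below is only an illustration on toy data: the three data frames stand in for files that would normally be read with read.table, and the third sample's values are made up.

```r
# Toy data frames standing in for sample1chr1.txt, sample2chr1.txt, ...
keys <- data.frame(Name = c("BAC-11034", "BAC-11044", "BAC-11057"),
                   Chr = 1L,
                   Position = c(128L, 129L, 134L))
samples <- list(
  cbind(keys, sample1value = c(0.302, -0.56, 0.084)),
  cbind(keys, sample2value = c(0.25, 0.41, -0.14)),
  cbind(keys, sample3value = c(0.10, 0.20, 0.30))  # hypothetical third sample
)

# Fold merge() over the list: merge(merge(s1, s2), s3), and so on
merged <- Reduce(
  function(x, y) merge(x, y, by = c("Name", "Chr", "Position")),
  samples
)
print(merged)
```

Each step still builds a full intermediate data frame, so this is a convenience rather than a way to save memory.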

Matthew Lundberg
  • I can't merge. The data is too big. I am looking for a way to bind one sample at a time: cbind sample1 and write, cbind sample2 and write, cbind sample3 and write... so that in the end each chr has all samples, separately. – user3091668 Apr 19 '14 at 14:58

If the order of the rows is always identical in all files and only the last column's values change, then you can cbind() only the last column of each file (starting from the second sample, i = 2):

infile <- cbind(infile, chr[, 4])

Here infile is the data frame in which your data accumulates, and chr is the newly loaded file inside the loop. If your rows are not in the same order, see @Matthew's solution.

PS: This will result in a file with more than 22 thousand columns. That's not a good format for most procedures in R.
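As a rough sketch of the cbind idea for a single chromosome: read each sample file once, keep only its value column, and bind the columns next to the three key columns. The helper name combine_chr and its key_cols argument are hypothetical, and the demo writes two toy files to a temporary directory instead of using the real data. It assumes every file has the same rows in the same order and stops if that is not the case.

```r
# Hypothetical helper: column-bind the value column of every sample file
# for one chromosome, after checking that the key columns match.
combine_chr <- function(files, key_cols = c("Name", "Chr", "Position")) {
  first <- read.table(files[1], header = TRUE, sep = "\t")
  keys <- first[, key_cols]
  values <- lapply(files, function(f) {
    d <- read.table(f, header = TRUE, sep = "\t")
    # Stop if this file's rows differ from the first file's rows
    stopifnot(identical(d[, key_cols], keys))
    d[, setdiff(names(d), key_cols), drop = FALSE]
  })
  cbind(keys, do.call(cbind, values))
}

# Demo with two toy files written to a temporary directory
s1 <- data.frame(Name = c("BAC-11034", "BAC-11044", "BAC-11057"),
                 Chr = 1L,
                 Position = c(128L, 129L, 134L),
                 sample1value = c(0.302, -0.56, 0.084))
s2 <- data.frame(s1[, 1:3], sample2value = c(0.25, 0.41, -0.14))
files <- file.path(tempdir(), c("sample1chr1.txt", "sample2chr1.txt"))
write.table(s1, files[1], sep = "\t", row.names = FALSE, quote = FALSE)
write.table(s2, files[2], sep = "\t", row.names = FALSE, quote = FALSE)

out <- combine_chr(files)
print(out)
```

With the real files, the vector for chromosome k could be built as paste0("sample", 1:777, "chr", k, ".txt") and the result written once per chromosome with write.table. Whether 777 value columns for one chromosome fit in memory depends on the number of rows per file.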

ilir
  • My problem is that my final result has just one "sample,i,value" column. I am wondering if it is possible to write my output and read it again 777 times (once for each sample), resulting in a file with all columns (3 + 777) in the end. Your procedure just saves me the `grep` steps. Am I wrong? – user3091668 Apr 19 '14 at 15:17
  • No, you are right; that's why I put the PS at the end. I suggest you first put them in a long dataset (using `rbind`) and then look into `dcast()` from the package `reshape2`. – ilir Apr 19 '14 at 15:20
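To illustrate the long-then-wide idea from the last comment, here is a small sketch on toy data. It uses base R's reshape() in place of dcast() so that it runs without extra packages; the column names sample and value are assumptions introduced for the example.

```r
# Toy long-format data: one row per (marker, sample) pair
keys <- data.frame(Name = c("BAC-11034", "BAC-11044", "BAC-11057"),
                   Chr = 1L,
                   Position = c(128L, 129L, 134L))
s1 <- cbind(keys, sample = "sample1", value = c(0.302, -0.56, 0.084))
s2 <- cbind(keys, sample = "sample2", value = c(0.25, 0.41, -0.14))
long <- rbind(s1, s2)

# Spread the samples into columns: one row per marker, one column per sample
wide <- reshape(long,
                idvar = c("Name", "Chr", "Position"),
                timevar = "sample",
                v.names = "value",
                direction = "wide")
print(wide)
```

With reshape2 installed, the equivalent call should be dcast(long, Name + Chr + Position ~ sample, value.var = "value"), which names the new columns after the sample labels instead of prefixing them with "value.".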