I currently have hundreds of files containing unique IDs and unnormalized read counts. I want to take the read counts from each file and match them all to the unique IDs in the first column. However, each file has a different number of counts and different IDs that may or may not overlap with those in the other files. (Basically I'm looking to make a counts matrix for DESeq2.)
I was using the code below to combine these files but the counts don't match up with the original IDs.
My overall goal is to take the unnormalized read counts from every file and match them to a data frame keyed by the full list of unique IDs -- if a file has no count for a particular ID, that entry should just be filled with 0.
```r
library(gdata)  # for cbindX

DF = do.call(cbindX,
             lapply(list.files(pattern = "\\.txt$"),
                    FUN = function(x) {
                      aColumn = read.delim(x, header = TRUE)[, c("MINTbase.Unique.ID",
                                                                 "Unnormalized.read.counts")]
                      colnames(aColumn)[2] = x
                      aColumn
                    }))
DF = DF[, !duplicated(colnames(DF))]
```
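For what it's worth, `cbindX` pads columns to equal length but binds them positionally, so rows are never aligned by ID. A sketch of the ID-keyed alignment described above, using base R's `merge()` with `all = TRUE` to keep the union of IDs and then zero-filling the gaps (the `sample1`/`sample2` data frames here are stand-ins for the real files read with `read.delim`):

```r
# Two toy tables standing in for two input files; column names match
# the ones in the question.
sample1 <- data.frame(MINTbase.Unique.ID = c("tRF-1", "tRF-2"),
                      Unnormalized.read.counts = c(10, 5))
sample2 <- data.frame(MINTbase.Unique.ID = c("tRF-2", "tRF-3"),
                      Unnormalized.read.counts = c(7, 3))

columns <- list(sample1 = sample1, sample2 = sample2)
# Rename each counts column after its file/sample, as in the original code
columns <- Map(function(df, nm) { colnames(df)[2] <- nm; df },
               columns, names(columns))

# Full outer join on the ID column: rows are matched by ID, and IDs
# missing from a file come through as NA.
DF <- Reduce(function(a, b) merge(a, b, by = "MINTbase.Unique.ID", all = TRUE),
             columns)
DF[is.na(DF)] <- 0
DF
```

For DESeq2 you would then move the ID column into the rownames (e.g. `rownames(DF) <- DF$MINTbase.Unique.ID; DF$MINTbase.Unique.ID <- NULL`) before passing the matrix to `DESeqDataSetFromMatrix`.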