
I have a set of 270 RNA-seq samples, and I have already subsetted out their expected counts using the following code:

for (i in seq_along(sample_ID_list)) {
  assign(sample_ID_list[i], subset(get(sample_file_list[i]), select = expected_count))
}

Here sample_ID_list is a character vector of sample IDs (e.g., 313312664), and sample_file_list is a character vector naming the per-sample objects already loaded in my environment that are to be subsetted (e.g., s313312664).

Now, the head of one of those subsetted samples looks like this:

> head(`308087571`)
# A tibble: 6 x 1
  expected_count
           <dbl>
1           129 
2             8 
3           137 
4          6230.
5          1165.
6             0 

The problem is that I want to bind all of these single-column tibbles together into one counts data frame, but every column is named expected_count, so I will not be able to tell the samples apart unless each column is named with its sample ID instead.

Does anyone know of a good way to go about this? Please let me know if you need any more details!

aynber
avery

3 Answers


If we want to name the elements, fetch the objects named in sample_file_list with mget, extract the first element of 'expected_count' from each ('nm1'), and use those values as the names

nm1 <- sapply(mget(sample_file_list), function(x) x$expected_count[1])
names(sample_file_list) <- nm1

Or, using sample_ID_list, add the sample ID as a 'name' column to each sample and stack the samples by row

do.call(rbind, Map(cbind, mget(sample_ID_list), name = sample_ID_list))
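To make the shape of that result concrete, here is a minimal sketch on toy data. The two data frames and their counts are invented for illustration, and a plain named list stands in for what mget(sample_ID_list) would return in the asker's session.

```r
# Toy stand-ins for two samples; the counts are made up for illustration.
sample_ID_list <- c("313312664", "308087571")
samples <- list(
  data.frame(expected_count = c(129, 8, 137)),
  data.frame(expected_count = c(0, 12, 6230.5))
)
names(samples) <- sample_ID_list  # mimics the named list mget() would return

# Add the sample ID as a 'name' column to each piece, then stack by row.
long_counts <- do.call(rbind, Map(cbind, samples, name = sample_ID_list))
rownames(long_counts) <- NULL     # drop the ID-prefixed row names from rbind

long_counts
```

Note that this gives a long table (one row per gene per sample, with the ID in the 'name' column), not one column per sample.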

Update

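Based on the comments, we can loop over the objects named in 'sample_file_list' together with 'sample_ID_list' using Map, and rename the 'expected_count' column of each with the corresponding value from 'sample_ID_list'. Because 'sample_file_list' holds object names rather than the data itself, the objects are fetched with mget first

sample_file_list2 <- Map(function(dat, nm) {
          # rename only the 'expected_count' column to the sample ID
          names(dat)[match('expected_count', names(dat))] <- nm
          dat
        }, mget(sample_file_list), sample_ID_list)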
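Once each column carries its sample ID, the pieces can be bound side by side. The following is a rough sketch on invented toy data: 'sample_list' is a hypothetical stand-in for the fetched sample objects, and it assumes every sample has the same number of rows in the same gene order.

```r
# Toy data; IDs are from the question, counts are invented.
sample_ID_list <- c("313312664", "308087571")
sample_list <- list(
  data.frame(expected_count = c(129, 8, 137)),
  data.frame(expected_count = c(0, 12, 6230.5))
)

# Rename 'expected_count' in each data frame to the matching sample ID.
renamed <- Map(function(dat, nm) {
  names(dat)[match("expected_count", names(dat))] <- nm
  dat
}, sample_list, sample_ID_list)

# Bind the single-column data frames side by side (wide format).
# unname() drops any list names so only the column names are used.
counts <- do.call(cbind, unname(renamed))

counts
```

cbind on data frames does not mangle column names, so the numeric-looking IDs are kept as they are; they need backticks or [[ ]] to access, e.g. counts[["308087571"]].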

Or if we need a package solution,

library(data.table)
rbindlist(mget(sample_ID_list), idcol = "name")
akrun
  • Sorry if I wasn't clear enough, I want to _replace_ the header 'expected_count' for each sample with its sample ID. The sample ID is currently correctly named for each individual sample, but I need to also replace its column header 'expected_count' with the ID so I can splice the lists together. – avery May 12 '21 at 18:24
  • @avery so you want to change the column name in each of the `file_list` with the corresponding value from sample_ID_list? – akrun May 12 '21 at 18:25
  • @avery do you want the output as in the update? – akrun May 12 '21 at 18:36

You can use:

dplyr::bind_rows(mget(sample_ID_list), .id = "name")
Onyambu

Update: Thank you all so much for your help. I had to update my for loop as follows:

for (i in seq_along(sample_ID_list)) {
  # keep only the expected_count column of the i-th sample
  data <- subset(get(sample_file_list[i]), select = expected_count)
  # rename that column to the sample ID
  colnames(data) <- sample_ID_list[i]
  assign(sample_ID_list[i], data)
}

and was able to successfully reassign the names!
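For anyone landing here later, a possible alternative that avoids assign/get is to keep the samples in a named list and build the counts matrix in one step. This is only a sketch: 'sample_tables', the 'gene_id' and 'TPM' columns, and all values are hypothetical, and it assumes every sample lists the same genes in the same order.

```r
# Hypothetical per-sample tables keyed by sample ID; all values are invented.
sample_tables <- list(
  "313312664" = data.frame(gene_id = c("g1", "g2", "g3"),
                           expected_count = c(129, 8, 137),
                           TPM = c(1.2, 0.1, 1.4)),
  "308087571" = data.frame(gene_id = c("g1", "g2", "g3"),
                           expected_count = c(0, 12, 6230.5),
                           TPM = c(0, 0.2, 55.1))
)

gene_ids <- as.character(sample_tables[[1]]$gene_id)

# Assumption check: every sample has the same genes in the same order.
same_order <- vapply(sample_tables,
                     function(d) identical(as.character(d$gene_id), gene_ids),
                     logical(1))
stopifnot(all(same_order))

# One column per sample; column names come from the list names (sample IDs).
counts <- vapply(sample_tables,
                 function(d) d$expected_count,
                 numeric(length(gene_ids)))
rownames(counts) <- gene_ids

counts
```

The result is a numeric matrix with genes as rows and sample IDs as column names, so no renaming step is needed afterwards.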

avery