I have a list of 11 data frames, and the name of each data frame describes its source. Essentially, I want to add a "source" column to each data frame in the list, containing that data frame's name in every row.

This is all so the data can be passed downstream to a CRAN package which doesn't play well with lists.

I've tried using `lapply` and looked through some other SO answers, but nothing seems to fit.

Any help is greatly appreciated, thanks.

## Some toy data 

p1 <- c("A", "B", "C", "D", "E")  
p2 <- c(rnorm(5, 1.25, 1))  
p3 <- c(rnorm(5, 1.25, 1))
source_name_1 <- data.frame(p1, p2, p3)  

p2 <- c(rnorm(5, 1.25, 1))  
p3 <- c(rnorm(5, 1.25, 1))  
source_name_2 <- data.frame(p1, p2, p3) 
 
p2 <- c(rnorm(5, 1.25, 1))  
p3 <- c(rnorm(5, 1.25, 1))  
source_name_3 <- data.frame(p1, p2, p3)  

df_list <- list(source_name_1,
                source_name_2,
                source_name_3)

names(df_list) = paste0("source_name_", 1:length(df_list))

## Previous attempt based on other SO answers
df_list_2 <- lapply(names(df_list),
                 function(x) cbind(df_list),
                 source = names(df_list),
                 SIMPLIFY = TRUE)

# Essentially I'm aiming for a 'p4' column in each df whose values match `^source_name_[1-9]`
  • Your list doesn't have names, so `names(df_list)` would return `NULL`. How are you getting `source_name`? Is it something that you have to define manually for each data frame? – monte Sep 23 '20 at 14:54
  • Ah, thanks for picking that up, I'll edit the question. Yes, I'm amending the source names on import to R; the original names are just "sheet1", "sheet2", etc. – DrBalticYaldie Sep 23 '20 at 15:59
  • No worries. Did my solution work for you? – Chuck P Sep 23 '20 at 16:40
  • Yes, thank you for your help, I have it implemented now. I was wondering, can I use a similar approach to change the column names? – DrBalticYaldie Sep 24 '20 at 12:14
  • Yes. Do you want to rename them all once you have the list built? You can just add to the last command, for example, ` %>% purrr::map(~ rename(.x, new_name = p1, another_name = p2))` – Chuck P Sep 24 '20 at 15:01
  • That would be good. My current approach uses `lapply(names(df_list), function(x) setNames(df_list[[x]], c("c1", "c2", "c3")))`, but it removes the df names, which is a pain. – DrBalticYaldie Sep 25 '20 at 12:45

1 Answer

As noted by @monte in the comments, you have to name the list elements first. Assuming the names all follow the "source_name_" pattern, you can then add the column with dplyr and purrr. Using your toy data:

df_list <- list(source_name_1,
            source_name_2,
            source_name_3)

names(df_list) = paste0("source_name_", 1:length(df_list))

library(dplyr)
library(purrr)

purrr::map2(df_list, names(df_list), ~ mutate(.x, p4 = .y))
#> $source_name_1
#>   p1        p2        p3            p4
#> 1  A 0.1531752 1.5198717 source_name_1
#> 2  B 0.8299500 1.4534902 source_name_1
#> 3  C 2.1038329 0.3968661 source_name_1
#> 4  D 2.3939380 1.0487960 source_name_1
#> 5  E 1.5773872 1.8611408 source_name_1
#> 
#> $source_name_2
#>   p1         p2        p3            p4
#> 1  A  0.8662918 -1.014854 source_name_2
#> 2  B -1.8042179  1.339152 source_name_2
#> 3  C  1.4786439 -1.940525 source_name_2
#> 4  D  1.8360023  1.439776 source_name_2
#> 5  E  0.9648816  2.051714 source_name_2
#> 
#> $source_name_3
#>   p1       p2        p3            p4
#> 1  A 1.268633 1.7334884 source_name_3
#> 2  B 1.615704 1.0503553 source_name_3
#> 3  C 2.056368 1.4954794 source_name_3
#> 4  D 2.335987 1.6293595 source_name_3
#> 5  E 1.236283 0.4498371 source_name_3
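
If you would rather avoid the extra packages, the same idea can be sketched in base R with `Map()`, which walks the data frames and their names in parallel. This is only a sketch under the same assumption that the list is already named; `make_df()` and `combined` are hypothetical names introduced here for illustration, and the values will differ from the output above.

```r
# Self-contained toy list; make_df() is a hypothetical helper for this sketch
set.seed(1)
make_df <- function() {
  data.frame(p1 = c("A", "B", "C", "D", "E"),
             p2 = rnorm(5, 1.25, 1),
             p3 = rnorm(5, 1.25, 1))
}
df_list <- list(make_df(), make_df(), make_df())
names(df_list) <- paste0("source_name_", seq_along(df_list))

# Map() passes each data frame together with its name;
# the length-one name is recycled down the new p4 column
df_list_2 <- Map(function(df, nm) {
  df$p4 <- nm
  df
}, df_list, names(df_list))

# Optional: stack everything into one data frame if the downstream
# package prefers a single table over a list
combined <- do.call(rbind, df_list_2)
```

`Map()` keeps the list names, so `df_list_2$source_name_2` still works. If you are using dplyr anyway, `dplyr::bind_rows(df_list, .id = "p4")` should give a similar stacked table in one step, with the list names filling the `p4` column.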

Toy data

## Some toy data 

p1 <- c("A", "B", "C", "D", "E")  
p2 <- c(rnorm(5, 1.25, 1))  
p3 <- c(rnorm(5, 1.25, 1))
source_name_1 <- data.frame(p1, p2, p3)  

p2 <- c(rnorm(5, 1.25, 1))  
p3 <- c(rnorm(5, 1.25, 1))  
source_name_2 <- data.frame(p1, p2, p3) 

p2 <- c(rnorm(5, 1.25, 1))  
p3 <- c(rnorm(5, 1.25, 1))  
source_name_3 <- data.frame(p1, p2, p3)  
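
On the follow-up in the comments about renaming columns: iterating over `names(df_list)` returns an unnamed list, which is probably why the names disappear. A possible base R alternative is to iterate over the list itself, since `lapply()` carries the element names through. The small data frames and the name `renamed` below are made up for illustration.

```r
# Small named list standing in for the real data (values are made up)
df_list <- list(
  source_name_1 = data.frame(p1 = c("A", "B"), p2 = c(1, 2), p3 = c(3, 4)),
  source_name_2 = data.frame(p1 = c("C", "D"), p2 = c(5, 6), p3 = c(7, 8))
)

# lapply() over the list itself (not over its names) preserves the
# element names; the extra argument is passed on to setNames()
renamed <- lapply(df_list, setNames, c("c1", "c2", "c3"))

names(renamed)
```

This assumes every data frame has exactly three columns in the same order; with varying layouts, renaming by old name is likely safer than renaming by position.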


Chuck P