This is a simplified version of a problem involving a large list containing complex tables. I want to extract the tables from the list and apply a function to each one. Here we can create a simple list containing small named data frames:

library(tidyverse)

table_names <- c('dfA', 'dfB', 'dfC')

dfA <- tibble(a = 1:3, b = 4:6, c = 7:9)
dfB <- tibble(a = 10:12, b = 13:15, c = 16:18)
dfC <- tibble(a = 19:21, b = 22:24, c = 25:27)

df_list <- list(dfA, dfB, dfC) %>% setNames(table_names)

Here is a simplified example of the kind of operation I would like to apply:

dfA_mod <- df_list$dfA %>% 
  mutate(name = 'dfA') %>%
  select(name, everything()) 

In this example, I extract one of the three tables in the list (df_list$dfA), create a new column with the same value in each row (mutate(name = 'dfA')), and re-order the columns so that the new column appears in the left-most position (select(name, everything())). The resulting object is assigned to dfA_mod.

To solve the larger problem, I want to use one of the purrr::map() variants to apply the function over the character vector table_names, which was created in the first block of code above. The elements of table_names serve two purposes: 1) naming the tables held in the list; and 2) supplying values for the name column in the modified table.

I could write a function such as:

fun <- function(x) {
df_list$x %>% 
  mutate(name = x) %>%
  select(name, everything()) %>%
  assign(paste0(x, '_mod'), ., envir = .GlobalEnv)
}

And then use map() to create a new list of modified tables:

new_list <- df_list %>% map(table_name, fun(x))

But of course this code does not work, with the main obstacle being (for me at least) figuring out how to quote and unquote the right terms within the function. I'm a beginner at tidy evaluation, and I could use some help in specifying the function and using map properly.

Here is the desired output (for one modified table):

# A tibble: 3 x 4
  name      a     b     c
  <chr> <int> <int> <int>
1 dfA       1     4     7
2 dfA       2     5     8
3 dfA       3     6     9

Thanks in advance for any help!

DSH
  • In R, "table" is a term that applies to a matrix-like object, typically a contingency table. Data frames, on the other hand, are a specific type of list. The two types differ in their extraction-function semantics. You might presume that “tibbles” would be quite similar to tables, but that’s not actually the case. – IRTFM Jan 04 '20 at 06:14
  • Thank you - I recognize now that my use of 'table' and 'data frame' was imprecise. – DSH Jan 04 '20 at 21:01

2 Answers

We can use purrr::imap(), which passes each list element (.x) together with its name (.y) to the function:

library(dplyr)
library(purrr)

df_out <- imap(df_list, ~.x %>% mutate(name = .y) %>% select(name, everything()))
df_out

#$dfA
# A tibble: 3 x 4
#  name      a     b     c
#  <chr> <int> <int> <int>
#1 dfA       1     4     7
#2 dfA       2     5     8
#3 dfA       3     6     9

#$dfB
# A tibble: 3 x 4
#  name      a     b     c
#  <chr> <int> <int> <int>
#1 dfB      10    13    16
#....
#....

This gives a list of the desired data frames. If you want them as separate data frames in the global environment, you can do:

names(df_out) <- paste0(names(df_out), "_mod")
list2env(df_out, .GlobalEnv)

We can also do it using base R's Map():

df_out <- Map(function(x, y) transform(x, name = y)[c('name', names(x))],
              df_list, names(df_list))

Then rename the list elements and call list2env() in the same way as above.
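
For completeness, here is a minimal sketch of how the function-based approach from the question could be made to work with map() over table_names. It is only one possible fix, not necessarily the best fit for the larger problem. The helper name add_name is hypothetical. The sketch assumes the key issue is that df_list$x looks for an element literally named "x", whereas df_list[[x]] looks up the name stored in x, so no tidy-evaluation quoting is needed.

```r
library(dplyr)
library(purrr)

# Same example data as in the question
df_list <- list(
  dfA = tibble(a = 1:3,   b = 4:6,   c = 7:9),
  dfB = tibble(a = 10:12, b = 13:15, c = 16:18),
  dfC = tibble(a = 19:21, b = 22:24, c = 25:27)
)
table_names <- names(df_list)

# add_name() is a hypothetical helper, not part of any package.
# df_list[[nm]] retrieves the element whose name is stored in nm;
# df_list$nm would instead look for an element literally called "nm".
add_name <- function(nm) {
  df_list[[nm]] %>%
    mutate(name = nm) %>%
    select(name, everything())
}

# set_names() names the character vector by its own values,
# so map() returns a list named dfA, dfB, dfC
new_list <- table_names %>%
  set_names() %>%
  map(add_name)

new_list$dfA
```

Returning a named list and leaving any assignment into the global environment to a separate list2env() step keeps the function free of side effects.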

Ronak Shah
  • Elegant solution, and it works very well in the more complex application. Thank you! – DSH Jan 04 '20 at 21:00

We can convert it to a single data.frame with map_dfr(), passing .id so that the list names are stored in a column:

library(purrr)
map_dfr(df_list,  I, .id = 'name')

Or with bind_rows

library(dplyr)
bind_rows(df_list, .id = 'name')
# A tibble: 9 x 4
#  name      a     b     c
#  <chr> <int> <int> <int>
#1 dfA       1     4     7
#2 dfA       2     5     8
#3 dfA       3     6     9
#4 dfB      10    13    16
#5 dfB      11    14    17
#6 dfB      12    15    18
#7 dfC      19    22    25
#8 dfC      20    23    26
#9 dfC      21    24    27
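
As a hedged illustration of the map_dfr() call above: the function argument there seems to act only as a pass-through, so the sketch below swaps in base R's identity() (a substitution, not the original code) and checks that the result matches bind_rows(). The object names via_map and via_bind are made up for this example.

```r
library(dplyr)
library(purrr)

# Same example data as in the question
df_list <- list(
  dfA = tibble(a = 1:3,   b = 4:6,   c = 7:9),
  dfB = tibble(a = 10:12, b = 13:15, c = 16:18),
  dfC = tibble(a = 19:21, b = 22:24, c = 25:27)
)

# identity() returns its argument unchanged, so each tibble is passed
# through as-is and map_dfr() only does the row-binding and .id labelling
via_map  <- map_dfr(df_list, identity, .id = "name")
via_bind <- bind_rows(df_list, .id = "name")

# Compare the two results
all.equal(via_map, via_bind)
```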
akrun
  • Thank you - it's helpful to know about these other methods as well. – DSH Jan 05 '20 at 21:54
  • What is the purpose of the `I` argument in the statement `map_dfr(df_list, I, .id = 'name')`? – DSH Jan 05 '20 at 23:00