I am trying to rename a set of columns in nested dataframes based on the values of an unnested column. Here is a simplified example of the dataset:

library(tidyverse)

df_pre <- tribble(
  ~year, ~data,
  1970, tibble(GEOID_1970 = 1, TOTPOP_1970 = 2),
  1980, tibble(GEOID_1980 = 3, TOTPOP_1980 = 4)
)  

Using purrr, I would like to rename the nested columns so that I have the following:

df_post <- tribble(
  ~year, ~data,
  1970, tibble(GEOID = 1, TOTPOP = 2),
  1980, tibble(GEOID = 3, TOTPOP = 4)
)  

I've tried a variety of approaches, all of which throw some kind of error, e.g.:

library(purrr)
df_post <- df_pre %>% map2(.x = data, .y = year, 
                           ~ rename_with(str_replace,  
                                        pattern = paste0("_", .y), 
                                        replacement = ""))
#> Error: Can't convert a `tbl_df/tbl/data.frame` object to function

How can I use map2 plus rename_with to modify the nested column names? In addition to solving this particular problem, I am also trying to gain more insight about how to pass arguments such as year to map2 anonymous functions.

nicholas

1 Answer

We loop over the list column 'data' with map and call rename_with on every column (everything()), using str_remove to strip the trailing "_<digits>" suffix from each column name:

library(dplyr)
library(purrr)
library(stringr)
df_new <- df_pre  %>%
    mutate(data = map(data, ~ .x %>%
          rename_with(~ str_remove(., "_\\d+$"), everything())))

Checking:

identical(df_new, df_post)
#[1] TRUE

If we want to make use of the 'year' column, we can use map2 instead:

df_new <- df_pre %>%
      mutate(data = map2(data, year, ~  {
              yr <- .y
              .x %>% rename_with(~ str_remove(., str_c("_", yr)), everything())
              }))

Checking:

identical(df_new, df_post)
#[1] TRUE
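
On the temporary `yr`: as far as I understand it, a nested `~` lambda defines its own `.x` and `.y`, so a `.y` written inside the inner `rename_with(~ ...)` would refer to the inner lambda's (missing) second argument and not to the year. A minimal sketch of an alternative, assuming the same `df_pre` as in the question, is to name the arguments explicitly with `function()`. The names `df`, `yr`, `nm` and `df_named` are only illustrative, and the added `"$"` anchor is my own choice so that only a trailing suffix is removed.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Same example data as in the question
df_pre <- tribble(
  ~year, ~data,
  1970, tibble(GEOID_1970 = 1, TOTPOP_1970 = 2),
  1980, tibble(GEOID_1980 = 3, TOTPOP_1980 = 4)
)

# Named arguments (df, yr) cannot be masked by an inner lambda's .x/.y,
# so no temporary assignment is needed.
df_named <- df_pre %>%
  mutate(data = map2(data, year, function(df, yr) {
    # Build a pattern such as "_1970$" and strip it from every column name
    rename_with(df, function(nm) str_remove(nm, str_c("_", yr, "$")))
  }))

names(df_named$data[[1]])
names(df_named$data[[2]])
```

Both nested tibbles should end up with the columns GEOID and TOTPOP. With explicit argument names there is also only one expression in the outer function body, so the braces are optional here.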
akrun
  • Thanks so much. I'm trying to understand the general principle that requires assigning .y to a variable within the anonymous function. Could you provide any additional insight? – nicholas Apr 18 '21 at 21:51
  • @nicholas In your code, the columns are referenced outside `mutate`/`summarise`, so you may need `df_pre %>% {map2(.x = .$data, .y = .$year, ~ .x)}` – akrun Apr 18 '21 at 21:53
  • This code (`df_pre %>% {map2(.x = .$data, .y = .$year, ~ .x)}`) creates a list of two tibbles. I don't understand how it addresses my question about assignment. Any insight would be much appreciated. – nicholas Apr 18 '21 at 22:00
  • @nicholas I just wanted to show where the code at the start of your question goes wrong: `map2(data, year)` won't work there because `data` and `year` are columns of the data.frame, not objects that exist outside it – akrun Apr 18 '21 at 22:03
  • @akrun your solution here worked for me. Would you mind explaining two elements in more detail: 1) why are curly braces `{}` needed within the call to `mutate()`, and 2) why is the assignment `yr <- .y` needed? – DSH Mar 21 '22 at 20:59
  • @DSH 1) `{}` is used when there is more than one expression. Here `yr <- .y` is one expression, and `.x %>% ..` is the second. 2) The reason is that we are using another lambda expression around `str_remove`, which may or may not get the `.y` from the enclosing environment. I haven't tested it, but in some cases it wouldn't work. – akrun Mar 22 '22 at 14:51
  • Thanks @akrun, so when you have two expressions with the curly braces, as in this example, the two appear on separate lines, but they are not separated by commas. Is that because everything within the curly braces is now its own environment and is not part of the global environment? – DSH Mar 22 '22 at 17:13
  • @DSH It could become complicated when you are doing something inside `across` within the `map`. Thus, I assign it to a temporary object to avoid any issues. Each newline starts a new expression by default; if you want to write everything on a single line, use `;` to end each expression, i.e. `x1 <- 10; y1 <- 20` etc. – akrun Mar 22 '22 at 17:14