
I want a simpler way to recode the same variable, in the same way, across multiple data frames. For example, I'm currently recoding an age variable in two state datasets, FL and GA, with a separate copy of the code for each. How can I condense this code?

FL <- FL %>% 
  mutate(
    # Create categories
    age_group = dplyr::case_when(
      age >= 18 & age <= 29 ~ "18-29",
      age >= 30 & age <= 39 ~ "30-39",
      age >= 40 & age <= 49 ~ "40-49",
      age >= 50 & age <= 64 ~ "50-64",
      age >= 65 ~ "65+"
    ),
    # Convert to factor
    age_group = factor(
      age_group,
      levels = c("18-29", "30-39", "40-49", "50-64", "65+")
    )
  )

GA <- GA %>% 
  mutate(
    # Create categories
    age_group = dplyr::case_when(
      age >= 18 & age <= 29 ~ "18-29",
      age >= 30 & age <= 39 ~ "30-39",
      age >= 40 & age <= 49 ~ "40-49",
      age >= 50 & age <= 64 ~ "50-64",
      age >= 65 ~ "65+"
    ),
    # Convert to factor
    age_group = factor(
      age_group,
      levels = c("18-29", "30-39", "40-49", "50-64", "65+")
    )
  )
joran
  • You can use `cut(age, breaks = c(-Inf, 18, 29, 39, 49, 64, Inf))`. Better would be to create a function i.e. `age_grp_fn <- function(age) cut(age, breaks = c(-Inf, 18, 29, 39, 49, 64, Inf))` and reuse the function on each dataset – akrun Feb 27 '23 at 17:05
  • To avoid rewriting the code, you can put it into a function, then call it or apply it to all your data.frames – divibisan Feb 27 '23 at 17:08
  • Do you have sample code? I'm struggling – Still learning Feb 27 '23 at 17:25
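
The `cut()` suggestion in the comments can be sketched as follows. This is a minimal illustration, not the commenter's exact code: the breaks are adjusted here to left-closed bins with explicit labels so that they line up with the groups in the question, and the `ages` vector is made up. Ages below 18 come out as `NA`, as they do with the `case_when()` version.

```r
# Helper named after the comment's suggestion; breaks and labels are
# assumptions chosen to mirror the question's age groups.
age_grp_fn <- function(age) {
  cut(
    age,
    breaks = c(18, 30, 40, 50, 65, Inf),  # bins: [18,30), [30,40), [40,50), [50,65), [65,Inf)
    labels = c("18-29", "30-39", "40-49", "50-64", "65+"),
    right  = FALSE                        # left-closed, so 30 lands in "30-39"
  )
}

# Toy ages, made up for illustration
ages <- c(17, 18, 29, 30, 64, 65, 90)
groups <- age_grp_fn(ages)
```

Because `cut()` already returns a factor with the levels in order, no separate `factor()` step is needed; on a data frame it could be used as `FL$age_group <- age_grp_fn(FL$age)`. One difference to keep in mind: a non-integer age such as 29.5 falls in "18-29" here, whereas the `case_when()` conditions in the question leave it as `NA`.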

1 Answer


We can wrap the recoding in a function and pass that function to a looping function such as `lapply`.

First, put all your data.frames in a list (there are several ways to do that; it's hard to tell which is best without a proper reproducible example). An example:

my_dfs <- list(FL, GA)
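
One optional variation is to name the list elements, so that each result can be retrieved by state afterwards. The sketch below uses toy stand-ins for `FL` and `GA` (the values are made up), since the real data are not shown in the question.

```r
# Toy stand-ins for the real FL and GA data frames (values are made up)
FL <- data.frame(age = c(22, 47, 70))
GA <- data.frame(age = c(35, 58))

# Name each element so it can be looked up by state later
my_dfs <- list(FL = FL, GA = GA)

# Elements are now reachable as my_dfs$FL and my_dfs$GA
names(my_dfs)
```

When there are many data frames, `mget(c("FL", "GA"))` is another way to build the same named list from object names.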

Then define your function:

# Takes a data frame with an `age` column and adds `age_group`
my_function <- function(x) x %>% 
  mutate(
    # Create categories (ages under 18 become NA)
    age_group = dplyr::case_when(
      age >= 18 & age <= 29 ~ "18-29",
      age >= 30 & age <= 39 ~ "30-39",
      age >= 40 & age <= 49 ~ "40-49",
      age >= 50 & age <= 64 ~ "50-64",
      age >= 65 ~ "65+"
    ),
    # Convert to factor
    age_group = factor(
      age_group,
      levels = c("18-29", "30-39", "40-49", "50-64", "65+")
    )
  )

And finally apply it to every data frame in the list, assigning the result so the recoded data frames are kept:

my_dfs <- lapply(my_dfs, my_function)
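
Putting the steps together, here is a small end-to-end sketch. The toy data and the name `recode_age` are assumptions made for illustration; the recoding logic is the same as in the question. `lapply` returns a new list rather than modifying `FL` and `GA`, so the last lines write the results back.

```r
library(dplyr)

# Toy stand-ins for the real state data (values are made up)
FL <- data.frame(age = c(18, 29, 45, 70))
GA <- data.frame(age = c(16, 33, 64))

# Hypothetical name for the recoding function
recode_age <- function(df) {
  df %>%
    mutate(
      age_group = case_when(
        age >= 18 & age <= 29 ~ "18-29",
        age >= 30 & age <= 39 ~ "30-39",
        age >= 40 & age <= 49 ~ "40-49",
        age >= 50 & age <= 64 ~ "50-64",
        age >= 65 ~ "65+"
      ),
      age_group = factor(
        age_group,
        levels = c("18-29", "30-39", "40-49", "50-64", "65+")
      )
    )
}

# Apply the function to every data frame and keep the results
my_dfs <- list(FL = FL, GA = GA)
my_dfs <- lapply(my_dfs, recode_age)

# Optional: copy the recoded data frames back to the original names
FL <- my_dfs$FL
GA <- my_dfs$GA
```

With a named list, `list2env(my_dfs, envir = globalenv())` is an alternative to the last two assignments, though keeping the data frames in the list is often more convenient for further steps.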

GuedesBF