3

I'm trying to conditionally replace values in multiple columns based on a string match in a different column. I'd like to do this in a single line of code using the across() function, but I keep getting errors that don't quite make sense to me. I feel like there is probably a simple solution, so if anyone could point me in the right direction, that would be fantastic!

library(dplyr)   # mutate(), across(), %>%
library(stringr) # str_detect()

df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
               "total" = c(34, 56, 75, 89, 21, 56),
               "group_a" = c(30, 26, 45, 60, 3, 46),
               "group_b" = c(4, 30, 30, 29, 18, 10))

# working but not concise
df %>%
  mutate(total = ifelse(str_detect(type, "Park"), NA, total),
         group_a = ifelse(str_detect(type, "Park"), NA, group_a),
         group_b = ifelse(str_detect(type, "Park"), NA, group_b))

  
# concise but not working
df %>% mutate(across(total, group_a, group_b), ifelse(str_detect(type, "Park"), NA, .))

Update

We got a solution that works with my dummy dataset but not with my real data, so I am sharing a small snippet of my real data frame with the numbers changed and organization names hidden. When I run this line of code (df %>% mutate(across(c(Attempts, Canvasses, Completes)), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .))) on these data, I get the following error message:

Error: Problem with mutate() input ..2.
x Input ..2 must be a vector, not a formula object.
i Input ..2 is ~ifelse(str_detect(long_name, "park-cemetery"), NA, .).

This is a small sample of the data that produces this error:

df <- structure(list(Org = c("OrgName", "OrgName", "OrgName", "OrgName", 
"OrgName", "OrgName", "OrgName", "OrgName", "OrgName", "OrgName"
), nCode = c("M34", "R36", "R46", "X29", "M31", "K39", "Q12", 
"Q39", "X41", "K27"), Attempts = c(100, 100, 100, 100, 100, 100, 
100, 100, 100, 100), Canvasses = c(80, 80, 80, 80, 80, 80, 80, 
80, 80, 80), Completes = c(50, 50, 50, 50, 50, 50, 50, 50, 50, 
50), van_nocc_id = c(999, 999, 999, 999, 999, 999, 999, 999, 
999, 999), van_name = c("M-Upper West Side", "SI-Rosebank", "SI-Tottenville", 
"BX-park-cemetery-etc-Bronx", "M-Stuyvesant Town-Cooper Village", 
"BK-Kensington", "Q-Broad Channel", "Q-Lindenwood", "BX-Wakefield", 
"BK-East New York"), boro_short = c("M", "SI", "SI", "BX", "M", 
"BK", "Q", "Q", "BX", "BK"), long_name = c("Upper West Side", 
"Rosebank", "Tottenville", "park-cemetery-etc-Bronx", "Stuyvesant Town-Cooper Village", 
"Kensington", "Broad Channel", "Lindenwood", "Wakefield", "East New York"
)), row.names = c(NA, -10L), class = "data.frame")

Final update

The curse of the misplaced closing bracket! Thanks to everyone for your help... the correct solution was df %>% mutate(across(c(Attempts, Canvasses, Completes), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))

Natalie O'Shea

3 Answers

3

If you use the newly introduced across() function (which is the right tool for this task), you have to specify the function you want to apply inside across() itself. In this case, the ifelse(...) call has to be written as a purrr-style lambda (so starting with ~). Check out the across() documentation and look for the .cols and .fns arguments.

df %>% 
  mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))

Output

#           type total group_a group_b
# 1         Park    NA      NA      NA
# 2 Neighborhood    56      26      30
# 3      Airport    75      45      30
# 4         Park    NA      NA      NA
# 5 Neighborhood    21       3      18
# 6 Neighborhood    56      46      10
Ric S
  • Thanks for your help! Interestingly, this works for my dummy dataset but not when I apply the exact same syntax to my much larger real dataset... I keep getting this error message: Error: Problem with `mutate()` input `..2`. x Input `..2` must be a vector, not a `formula` object. i Input `..2` is `~ifelse(str_detect(long_name, "park-cemetery"), NA, .)`. Any thoughts on why it might be throwing this error with a larger dataset? – Natalie O'Shea Jun 25 '20 at 15:27
  • Are you still passing the variables you want to mutate with a `c(variable_1, variable_2, ...)`? – Ric S Jun 25 '20 at 15:33
  • Yup, exactly like that. – Natalie O'Shea Jun 25 '20 at 15:34
  • Can you share a sample of your bigger data using the `dput` function and paste the output in your question? Also, could you paste the code you used to get that error? – Ric S Jun 25 '20 at 15:43
  • Posted an update with a sample of the real (but modified) data. Thanks again for your insights! – Natalie O'Shea Jun 25 '20 at 16:39
  • The problem is that you need to specify the `~ifelse` lambda function *inside* `across` as its second argument (the `.fns` argument, see documentation), not inside `mutate`. You have to shift one bracket to do so: `df %>% mutate(across(c(Attempts, Canvasses, Completes), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))`. My original answer is in fact correct – Ric S Jun 26 '20 at 09:48
  • Ahh, the curse of the misplaced closing bracket! Thanks for your help! – Natalie O'Shea Jun 26 '20 at 17:24
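To make the bracket placement discussed in the comments above easier to see, here is a minimal sketch that names the .cols and .fns arguments explicitly, so the lambda is visibly inside across() rather than a second argument to mutate(). The three-row sample_df is a hypothetical stand-in modeled on the question's real data, not the original data frame, and .x is used in place of . inside the lambda.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row sample modeled on the question's real data
sample_df <- data.frame(
  long_name = c("Upper West Side", "park-cemetery-etc-Bronx", "Wakefield"),
  Attempts  = c(100, 100, 100),
  Canvasses = c(80, 80, 80),
  Completes = c(50, 50, 50)
)

# Both the column selection and the lambda are arguments of across();
# naming them makes a misplaced closing bracket easier to spot
result <- sample_df %>%
  mutate(across(
    .cols = c(Attempts, Canvasses, Completes),
    .fns  = ~ ifelse(str_detect(long_name, "park-cemetery"), NA, .x)
  ))

print(result)
```

With the arguments named like this, closing across() too early would leave `.fns = ...` as an argument of mutate(), which stands out more when reading the code back.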
2

Here is a data.table solution.

require(data.table)
df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
               "total" = c(34, 56, 75, 89, 21, 56),
               "group_a" = c(30, 26, 45, 60, 3, 46),
               "group_b" = c(4, 30, 30, 29, 18, 10))

setDT(df)
df[type == "Park", c("total", "group_a", "group_b") := NA]
ljwharbers
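The data.table answer above matches type exactly with ==, whereas the question uses a substring match via str_detect(). As a hedged sketch of how the same in-place assignment might look with a substring match, the example below uses base R's grepl() with fixed = TRUE. The three-row dt and the cols vector are hypothetical names introduced for illustration and are not part of the original answer.

```r
library(data.table)

# Hypothetical three-row sample modeled on the question's real data
dt <- data.table(
  long_name = c("Upper West Side", "park-cemetery-etc-Bronx", "Wakefield"),
  Attempts  = c(100, 100, 100),
  Canvasses = c(80, 80, 80),
  Completes = c(50, 50, 50)
)

# Columns to blank out for matching rows
cols <- c("Attempts", "Canvasses", "Completes")

# grepl(..., fixed = TRUE) does a literal substring match on long_name;
# (cols) := NA assigns NA by reference to every listed column in those rows
dt[grepl("park-cemetery", long_name, fixed = TRUE), (cols) := NA]

print(dt)
```

Note that := modifies dt in place, so no reassignment is needed.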
0

Update: that didn't take long to figure out! Just needed to place the columns in a vector:

# concise AND working!
df %>% mutate(across(c(total, group_a, group_b)), ifelse(str_detect(type, "Park"), NA, .))

I had tried this initially but placed the columns in quotes... don't do that :)

Natalie O'Shea
  • Actually, this answer doesn't work because you get a column named `ifelse(str_detect(type, "Park"), NA, .)` (at least in my case). Check out my answer above instead. – Ric S Jun 25 '20 at 15:19