I'm looking for a more eloquent way to write R code for a kind of case that I've encountered more than once. Here is an example of the data and some code that accomplishes the result I want:

library(tidyverse)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)

specific_counties <- c(101, 103, 202, 205)

df |> 
  mutate(target_area = 
           primary_county %in% specific_counties | secondary_county %in% specific_counties)

The result is:

    # A tibble: 5 × 4
         id primary_county secondary_county target_area
      <int>          <int>            <int> <lgl>      
    1     1            101              201 TRUE       
    2     2            102              202 TRUE       
    3     3            103              203 TRUE       
    4     4            104              204 FALSE      
    5     5            105              205 TRUE  
     

I want to know if there is a way to get the same result using code that would be more succinct and eloquent if I were dealing with more columns of the "..._county" variety. Specifically, in my code above, the expression %in% specific_counties must be repeated with an | for each extra column I want to handle. Is there a way to not have to repeat this kind of phrase multiple times?

GuedesBF
Sam

2 Answers

These logical rowwise operations are superbly well handled by dplyr::if_any() or dplyr::if_all():

library(dplyr)

df %>%
    mutate(target_area = if_any(ends_with('county'), ~. %in% specific_counties))

# A tibble: 5 × 4
     id primary_county secondary_county target_area
  <int>          <int>            <int> <lgl>      
1     1            101              201 TRUE       
2     2            102              202 TRUE       
3     3            103              203 TRUE       
4     4            104              204 FALSE      
5     5            105              205 TRUE   
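
For contrast, here is a minimal sketch of if_all() next to if_any() on the question's data. if_all() is TRUE only when every selected column matches, so with these particular values no row qualifies. The column name both_in_target is made up for illustration.

```r
library(dplyr)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

res <- df %>%
  mutate(
    # TRUE when at least one *_county column is in specific_counties
    target_area    = if_any(ends_with("county"), ~ .x %in% specific_counties),
    # TRUE only when every *_county column is in specific_counties
    # (both_in_target is a hypothetical name, not from the question)
    both_in_target = if_all(ends_with("county"), ~ .x %in% specific_counties)
  )

res
```

Because ends_with("county") does not match the two new logical columns, adding them in the same mutate() call does not change which columns are tested.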

We can also use:

  • purrr::reduce with |,
  • rowSums with as.logical
  • purrr::pmap_lgl with any(c(...) %in% x)

library(purrr)
library(dplyr)

df %>%
    mutate(target_area = reduce(across(ends_with('county'), ~.x %in% specific_counties),
                                `|`))

## OR ##

df %>%
    mutate(target_area = rowSums(across(ends_with('county'), ~.x %in% specific_counties)) %>%
               as.logical)

## OR ##

df %>%
    mutate(target_area = pmap_lgl(across(ends_with('county')),
                                  ~any(c(...) %in% specific_counties)))
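
As a quick sanity check, the three alternatives can be run side by side on the question's data. This is only a sketch: the intermediate names hits, via_reduce, via_rowsums and via_pmap are made up for illustration.

```r
library(dplyr)
library(purrr)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

# One logical column per *_county column: TRUE where the value is in the set
hits <- df %>%
  select(ends_with("county")) %>%
  mutate(across(everything(), ~ .x %in% specific_counties))

# 1. Fold the logical columns together with a vectorised OR
via_reduce <- reduce(hits, `|`)

# 2. Count the TRUEs per row; any non-zero count becomes TRUE
via_rowsums <- as.logical(rowSums(hits))

# 3. Go row by row over the raw values and ask whether any is in the set
via_pmap <- pmap_lgl(select(df, ends_with("county")),
                     ~ any(c(...) %in% specific_counties))

# All three agree element by element
stopifnot(all(via_reduce == via_rowsums), all(via_reduce == via_pmap))
```

The reduce() and rowSums() versions work on whole columns, while pmap_lgl() calls the function once per row, so it is likely to be the slowest of the three on large data.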

For reference, this other answer of mine shows similar uses of if_any and reduce(`|`) in a filter() operation: R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods?

Additional related questions/answers:

Logical function across multiple columns using "any" function

How to create a new column based on if any of a subset of columns are NA with the dplyr

GuedesBF
    Thank you, your first answer is exactly the kind of thing I hoped there would be a dplyr solution for. – Sam Sep 29 '22 at 20:21

This is only a small improvement over what you have, and I'm not sure how "eloquent" I'd call it:

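df %>%
  mutate(
    # Note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()
    target_area = rowSums(
      # sapply() returns a logical matrix, one column per *_county column
      sapply(select(cur_data(), matches("_county")),
             `%in%`, specific_counties)) > 0
  )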
# # A tibble: 5 x 4
#      id primary_county secondary_county target_area
#   <int>          <int>            <int> <lgl>      
# 1     1            101              201 TRUE       
# 2     2            102              202 TRUE       
# 3     3            103              203 TRUE       
# 4     4            104              204 FALSE      
# 5     5            105              205 TRUE       

Or you can list the columns explicitly, replacing the select(.., matches(..)) with list(primary_county, secondary_county).

Add as many columns to the list(..) as you want.
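
Spelled out on the question's data, the explicit-list variant might look like the sketch below (only the two columns from the question are listed):

```r
library(dplyr)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

res <- df %>%
  mutate(
    target_area = rowSums(
      # Each list element is one column; sapply() binds the results
      # into a logical matrix with one column per listed variable
      sapply(list(primary_county, secondary_county),
             `%in%`, specific_counties)) > 0
  )

res
```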

r2evans