New variable conditional on whether a df1 column value equals any value included in a specific df2 column

Question

I am trying to make a new variable using mutate() . In df1, I have ranges of values in col1, col2, col3, and col4. I would like to create a new binary variable in df1 that is "1" IF any of the col1-4 values are found in a specific df2 column (let's say col10).

Thanks!

This is what I have tried so far, but I don't think it is returning a value of "1" for all matching value, only some of them.

df1 %>%
  mutate(newvar = case_when(
    col1 == df2$col10 | col2 == df2$col10 | col3 == df2$col10 | col4 == df2$col10 ~ 1
  ))

score 2 · Answer 1 · answered Feb 09 '23 at 17:10

2

We could use if_any here. If the number of rows are the same, use == for elementwise comparison instead of %in%

library(dplyr)
df1 %>% 
   mutate(newvar = +(if_any(col1:col4,  ~.x %in% df2$col10)))

answered Feb 09 '23 at 17:10

akrun

874,273
37
540
662

score 1 · Answer 2 · answered Feb 09 '23 at 16:43

First, let's make some dummy data. df1 has 4 columns and df2 has one column named col10. In the dummy data, rows 1,2,3 and 5 have matches in df2$col10.

library(dplyr)
df1 <- data.frame(col1 = 1:5, col2=3:7, col3=5:9, col4=10:14)

df2 <- data.frame(col10 = c(1,2,3,14))

We can use rowwise() to do computations within each row and then c_across() to identify that variables of interest. The code identifies whether any of the values in the four columns are in df2$col10 and returns a logical value. The as.numeric() turns that logical value into 0 (FALSE) and 1 (TRUE).

df1 %>% 
  rowwise() %>% 
  mutate(newvar = as.numeric(any(c_across(col1:col4) %in% df2$col10)))
#> # A tibble: 5 × 5
#> # Rowwise: 
#>    col1  col2  col3  col4 newvar
#>   <int> <int> <int> <int>  <dbl>
#> 1     1     3     5    10      1
#> 2     2     4     6    11      1
#> 3     3     5     7    12      1
#> 4     4     6     8    13      0
#> 5     5     7     9    14      1

^{Created on 2023-02-09 by the reprex package (v2.0.1)}

New variable conditional on whether a df1 column value equals any value included in a specific df2 column

2 Answers2