Check for a match between a list of values and a column entry in R

Question

I have a dataset with a column Disease that contains string values. I also have a list of rare disease names, rare_disease.

I want to check, for each cell of the column Disease, whether it contains an element from the list rare_disease and if so, to create a new column in my dataframe and give the value 1 to that entry.

I tried using the ifelse function, like so:

FinalData$RareDisease <- ifelse(rare_disease %in% FinalData$Disease,1,0)

But I guess that checks whether the corresponding rows in both variables are the same, so it throws an error. Instead, I want every cell of Disease to be checked against every single element of rare_disease, if that makes sense.

I have also tried match() and is.element(), as suggested in "Test if a vector contains a given element", but they don't work either.

Shouldn't it be `ifelse(FinalData$Disease %in% rare_disease, 1, 0)`? – Esben Eickhardt Apr 30 '19 at 13:25
I think you need to flip it: `FinalData$RareDisease <- ifelse(FinalData$Disease %in% rare_disease, 1, 0)` – zx8754 Apr 30 '19 at 13:25
From your previous post, I think you want a regex match, not an exact match: https://stackoverflow.com/q/55907952/680068. Please provide example input and expected output. – zx8754 Apr 30 '19 at 13:27
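If the Disease cells hold longer text that merely contains a rare disease name, as the last comment suspects, `%in%` will not find it because it only tests exact equality. A minimal sketch of a substring check, assuming made-up example values (the data below is hypothetical, not the asker's):

```r
# Hypothetical example data; names and values are assumptions for illustration.
FinalData <- data.frame(
  Disease = c("Fabry disease, suspected", "Influenza",
              "history of Pompe disease", NA),
  stringsAsFactors = FALSE
)
rare_disease <- c("Fabry disease", "Pompe disease")

# For each rare disease name, test every Disease cell for a literal substring.
# fixed = TRUE treats the names as plain text rather than regular expressions.
hits <- vapply(
  rare_disease,
  function(name) grepl(name, FinalData$Disease, fixed = TRUE),
  logical(nrow(FinalData))
)

# hits is a logical matrix (rows = cells, columns = names);
# a row is flagged if any name matched it. NA cells count as no match.
FinalData$RareDisease <- as.integer(rowSums(hits) > 0)
FinalData
```

The match is case-sensitive as written. Wrapping both sides in `tolower()` would relax that, if it suits the data.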

score 2 · Answer 1 · answered Apr 30 '19 at 13:26

You are almost right, but you should flip it:

FinalData$RareDisease <- ifelse(FinalData$Disease %in% rare_disease, 1, 0)
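The order matters because `x %in% table` returns one logical value per element of `x`, the left-hand operand. A small sketch with made-up vectors (hypothetical values, not the asker's data):

```r
# Hypothetical data, assumed for illustration only.
Disease <- c("a", "b", "c", "d", "e")
rare_disease <- c("a", "e", "z")

# One result per element of the left-hand operand:
flag_per_row  <- Disease %in% rare_disease   # one value per Disease entry
flag_per_name <- rare_disease %in% Disease   # one value per rare_disease entry

length(flag_per_row)
#> [1] 5
length(flag_per_name)
#> [1] 3

as.integer(flag_per_row)
#> [1] 1 0 0 0 1
```

With the list on the left, the result has the length of the list, so assigning it to a data frame column either fails or is recycled, depending on the row count.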

Esben Eickhardt

score 2 · Accepted Answer · answered Apr 30 '19 at 13:28

Here's a reproducible example/solution -- noting you can just use as.numeric instead of ifelse:

df <- data.frame(
  idx = 1:10,
  Disease = letters[1:10]
)
rare_disease <- letters[c(1, 5, 9)]

df
#>    idx Disease
#> 1    1       a
#> 2    2       b
#> 3    3       c
#> 4    4       d
#> 5    5       e
#> 6    6       f
#> 7    7       g
#> 8    8       h
#> 9    9       i
#> 10  10       j
rare_disease
#> [1] "a" "e" "i"

df$RareDisease <- as.numeric(df$Disease %in% rare_disease)
df
#>    idx Disease RareDisease
#> 1    1       a           1
#> 2    2       b           0
#> 3    3       c           0
#> 4    4       d           0
#> 5    5       e           1
#> 6    6       f           0
#> 7    7       g           0
#> 8    8       h           0
#> 9    9       i           1
#> 10  10       j           0

Created on 2019-04-30 by the reprex package (v0.2.1)

JasonAizkalns

Thanks, that still gives all 0s in my data so I guess there's probably a problem in the formatting, I'll have a look. That should still work if `rare_disease` is a dataframe, right? – Floran Thovan Apr 30 '19 at 13:45
@FloranThovan The above should work if `rare_disease` is a vector. If `rare_disease` is a `data.frame`, you would need to do: `(df$Disease %in% df_rare_disease$rare_disease)`. This is why creating a reproducible example is preferred. – JasonAizkalns Apr 30 '19 at 13:49
Ah, got it. Thanks again for the help. – Floran Thovan Apr 30 '19 at 14:03
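To make the data frame case from the comments concrete, here is a rough sketch. The object and column names follow the comment above. The example values and the `clean()` helper are assumptions, added only to illustrate the "formatting" problem the asker suspects:

```r
# Hypothetical data; " E " stands in for a badly formatted entry.
df <- data.frame(idx = 1:4, Disease = c("a", "b", " E ", "d"),
                 stringsAsFactors = FALSE)
df_rare_disease <- data.frame(rare_disease = c("a", "e"),
                              stringsAsFactors = FALSE)

# Pull the column out as a plain vector before matching.
rare_vec <- df_rare_disease$rare_disease

# Exact matching: stray whitespace or different case prevents a match.
df$RareDisease <- as.integer(df$Disease %in% rare_vec)

# Assumed helper (not from the thread): normalise both sides first.
clean <- function(x) tolower(trimws(x))
df$RareDiseaseClean <- as.integer(clean(df$Disease) %in% clean(rare_vec))
df
```

Whether normalising is appropriate depends on the real data. Inspecting `unique(df$Disease)` first is a cheap way to spot such differences.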
