Recoding with missingness in R

Question

I am trying to create a recoded variable based on four existing variables. The recoding should work like this: if any of the four columns has the value 1, the recoded value should be 1, even if the row also contains 0s. If the row contains only 0s and NAs, the recoded value should be 0. If all four values are NA, the recoded value should be NA.

My data look like this.

a = c(1, 1, 1, 1, NA, 0, NA)
b = c(0, 1, NA, 1, 0, NA, NA)
c = c(1, NA, 1, 0, NA, 0, NA)
d = c(1, 0, NA, 1, NA, 0, NA)
df <- data.frame(a,b,c,d)

Using the ifelse function, I get the result below.

> df$recoded <- ifelse(df$a== 1 | df$b == 1 | df$c == 1| df$d == 1, 1, 0)
> df
   a  b  c  d recoded
1  1  0  1  1       1
2  1  1 NA  0       1
3  1 NA  1 NA       1
4  1  1  0  1       1
5 NA  0 NA NA      NA
6  0 NA  0  0      NA
7 NA NA NA NA      NA

The problem is that when a row contains only 0s and NAs (i.e., the 5th and 6th rows), the recoded value should be 0 rather than NA.

I would like to get the data frame as below.

> df
   a  b  c  d recoded
1  1  0  1  1       1
2  1  1 NA  0       1
3  1 NA  1 NA       1
4  1  1  0  1       1
5 NA  0 NA NA       0
6  0 NA  0  0       0
7 NA NA NA NA      NA

Any thoughts on this?

Thanks in advance.

markus · Accepted Answer · 2018-07-30T20:34:11.507

You can use apply

df$recoded <- apply(df, 1, function(x) ifelse(all(is.na(x)), NA, max(x, na.rm = TRUE)))
df
#   a  b  c  d recoded
#1  1  0  1  1       1
#2  1  1 NA  0       1
#3  1 NA  1 NA       1
#4  1  1  0  1       1
#5 NA  0 NA NA       0
#6  0 NA  0  0       0
#7 NA NA NA NA      NA

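If all elements in a row are NA, then df$recoded will be NA; otherwise it will be the maximum of the row (with NAs removed). Because the values are only 0 and 1, the row maximum is 1 exactly when at least one 1 is present.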
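For context, the original ifelse attempt returns NA in rows 5 and 6 because R uses three-valued logic: FALSE | NA evaluates to NA, while TRUE | NA evaluates to TRUE. As a possible alternative to the row-wise apply call, here is a minimal vectorized sketch using rowSums. The names vars, m, any_one and all_na are illustrative and not part of the original answer, and the sketch assumes the four columns hold only 0, 1 or NA.

```r
# Rebuild the example data from the question.
df <- data.frame(
  a = c(1, 1, 1, 1, NA, 0, NA),
  b = c(0, 1, NA, 1, 0, NA, NA),
  c = c(1, NA, 1, 0, NA, 0, NA),
  d = c(1, 0, NA, 1, NA, 0, NA)
)

# Columns to combine (hypothetical helper name; adjust to your data).
vars <- c("a", "b", "c", "d")
m <- as.matrix(df[vars])

# TRUE where the row contains at least one 1; NAs are ignored in the count.
any_one <- rowSums(m == 1, na.rm = TRUE) > 0

# TRUE where every value in the row is missing.
all_na <- rowSums(!is.na(m)) == 0

# NA for all-missing rows, otherwise 1 if any 1 was seen and 0 if not.
df$recoded <- ifelse(all_na, NA_real_, as.numeric(any_one))
df
```

Selecting the columns explicitly through vars also keeps an existing recoded column out of the calculation, which matters if the code is rerun on a data frame that already contains one.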
