2

I am using a dataset where the missing values for variables are specified with specific numbers. I am trying to create one dataframe where I replace these values with blanks and another dataframe where I replace them with NA's. For this question, I will focus on the dataframe where they are replaced with NA's.

For the variables, missing values are specified by the numbers 8 or 9. I feel like I could use mutate_at() to change all of them or possibly an apply() function, but I am open to any suggestions. The general logic I am trying to write is: for each specified column, if the value is 8 or 9, replace with blank, else keep the value the same.

The dataset is structured so that each column represents one variable. I am trying to select a subset of the variables from the dataframe since only a few columns have missing values. I have looked at this example, but it doesn't completely answer my question.

I know I could do something like this, but it would require me specifying the values of all the other values non-missing values in the dataframe. I would prefer a solution where I can specify what happens to 8's and 9's (the missing values) and can keep the others the same without listing them out.

mutate_at(vars(card, lung, diabetes), function(x) case_when (x == 8 ~ "NA", x == 9 ~ "NA", x == 6 ~ 6, x == 4 ~ 4, x == 3 ~ 3, x == 2 ~ 2, x == 1 ~ 1))
melbez
  • 960
  • 1
  • 13
  • 36

3 Answers3

3

Here, we need

library(dplyr)
df1 %>%
      mutate_at(vars(card, lung, diabetes), ~ replace(., . %in% 8:9, NA))
#   card lung diabetes val
#1   NA    1        1   1
#2   NA    3        4   2
#3    1   NA        3   3
#4    2   NA        5   4
#5    3   NA       NA   5

Or if we use case_when by default the TRUE is NA, so the condition can be

df1 %>%
      mutate_at(vars(card, lung, diabetes),  ~ case_when(! . %in% 8:9 ~ .))
#   card lung diabetes val
#1   NA    1        1   1
#2   NA    3        4   2
#3    1   NA        3   3
#4    2   NA        5   4
#5    3   NA       NA   5

Or another option is na_if

df1 %>%
    mutate_at(vars(card, lung, diabetes), ~ na_if(., 8) %>% na_if(.,9)) 
#  card lung diabetes val
#1   NA    1        1   1
#2   NA    3        4   2
#3    1   NA        3   3
#4    2   NA        5   4
#5    3   NA       NA   5

data

df1 <- data.frame(card = c(8, 9, 1, 2, 3), lung = c(1, 3, 8, 9, 8),
     diabetes = c(1, 4, 3, 5, 8), val = 1:5)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Could you please explain what the parts after replace, case_when, and na_if mean? After replace, is the x referring to the variables? What does the exclamation point mean in case_when? And what is the . in na_if? – melbez May 03 '20 at 18:20
  • 1
    @melbez sorry, that was a typo. It should be `.` to refer to the column values – akrun May 03 '20 at 18:24
  • I get the following error message when trying to use the first solution. Do you know why? Error in replace(. %in% 8:9, NA) : argument "values" is missing, with no default – melbez May 03 '20 at 18:26
  • 1
    @melbez forgot the `.` before the condition. Now, it should work. tested with a small example – akrun May 03 '20 at 18:31
0

In base R:

cols = c('card', 'lung', 'diabetes')
temp = df[, cols]
temp[temp == 8 | temp == 9] = NA
df[, cols] = temp
Thomas Rosa
  • 630
  • 10
  • 21
0

in one simple line

apply(your.data.frame, 1, function(x){ifelse(x==8| x==9, NA,x)})

thus:

your.data.frame <- matrix(c(12,3,4,5,6,78,8,11,8,9, 2,45,65.6,6,7,8,9,12 ), ncol=3)
new.data.frame <- t(apply(your.data.frame, 1, function(x){ifelse(x==8| x==9, NA,x)}))
new.data.frame     
[,1] [,2] [,3]
[1,] 12.0    3    4
[2,]  5.0    6   78
[3,]   NA   11   NA
[4,]   NA    2   45
[5,] 65.6    6    7
[6,]   NA   NA   12
efz
  • 425
  • 4
  • 9