
This question is about the haven package in R.

Suppose that I have a variable with the following labels:
1 (Agree)
2 (Neutral)
3 (Disagree)
5 (Did not respond)

However, the SPSS .sav file I imported into R with haven doesn't declare 5 as a missing value, so it is not converted to NA.

Because of this, I need to set the values labelled "Did not respond" (numeric value 5) to missing myself. But I can't just remove every 5 in the dataset, because in other variables 5 is a meaningful answer. I also can't list the affected variables by hand, because the dataset has thousands of variables.

Is there a way to treat any value labelled "Did not respond" as missing, across every variable in the dataset?

ssjjaca
  • `type.convert(data, na.strings = 'Did not respond')` should work – rawr Aug 15 '21 at 05:12
  • Isn't this for when the variable is a character? The variables in my dataset come up as integers with the labels attached to the numeric values. – ssjjaca Aug 15 '21 at 05:17
  • try it `type.convert(mtcars, na.strings = '6')` – rawr Aug 15 '21 at 05:21
  • Yes, this gets rid of all values of 6, but I need to avoid that because there are other variables in my dataset where 6 is not NA. I can't practically select the variables manually, since there are more than 1000 of them, hence my wish to set only values labelled "Did not respond" to NA, regardless of the underlying integer. – ssjjaca Aug 15 '21 at 05:26
  • that's why I first suggested using 'Did not respond' and not '5', did you try it yet? – rawr Aug 15 '21 at 05:28
  • Yes, I tried it but it doesn't get rid of the values with the labels since the values are being read as integers and not characters. – ssjjaca Aug 15 '21 at 05:29
  • can you edit your post with `dput(data[1:5, 1:5])` plus what you are doing or whatever will demonstrate this – rawr Aug 15 '21 at 05:34
  • Check the `labelled_spss()` function in haven. It lets you declare user-defined missing values via its `na_values` argument. – deschen Aug 15 '21 at 07:25
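Following up on the `labelled_spss()` suggestion: a minimal sketch, assuming the columns are numeric `haven_labelled` vectors as returned by `read_sav()`. The helper name `declare_no_resp()` and the example columns `q1`/`q2` are hypothetical. Declaring the code as user-missing keeps the original value in the data while `is.na()` reports it as missing; `zap_missing()` can later turn it into a plain `NA`. `vctrs` and `tibble` are haven dependencies, so they are installed alongside it.

```r
library(haven)

# Hypothetical helper: if a column has a value labelled `label`, rebuild it as
# labelled_spss with that code declared user-missing. Note that this replaces
# any user-missing declarations already on the column. Columns without the
# label are returned unchanged.
declare_no_resp <- function(x, label = "Did not respond") {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs) || !(label %in% names(labs))) {
    return(x)
  }
  labelled_spss(
    vctrs::vec_data(x),                       # underlying numeric values
    labels = labs,
    na_values = unname(labs[names(labs) == label]),
    label = attr(x, "label", exact = TRUE)    # keep the variable label, if any
  )
}

# Hypothetical data: 5 means "Did not respond" in q1 but is a valid answer in q2
dat <- tibble::tibble(
  q1 = labelled(c(1, 2, 5, 3),
                c(Agree = 1, Neutral = 2, Disagree = 3, `Did not respond` = 5)),
  q2 = labelled(c(5, 4, 1, 5), c(`Very low` = 1, `Very high` = 5))
)

dat_flagged <- tibble::as_tibble(lapply(dat, declare_no_resp))
is.na(dat_flagged$q1)  # only the "Did not respond" value counts as missing
is.na(dat_flagged$q2)  # q2 has no such label, so nothing is flagged

# Turn the declared user-missing values into ordinary NAs
dat_nas <- zap_missing(dat_flagged)
```

Declaring rather than deleting means the "Did not respond" code stays recoverable until you explicitly call `zap_missing()`.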

1 Answer


I was struggling with exactly this: a large SPSS .sav file read in with haven, containing numeric variables where different values were labelled "dont know", and those same values could be valid answers in other variables. A colleague wrote this loop-based function for me, and it has worked (I've replaced my "dont know" label with your "Did not respond").

It relies on `val_labels()` from the labelled package.

```r
library(labelled)

# Set values labelled "Did not respond" to NA in every column that carries
# that label; columns without it are left untouched.
replace_no_resp <- function(data) {
  for (col in names(data)) {
    labels_ls <- val_labels(data[[col]])

    if ("Did not respond" %in% names(labels_ls)) {
      no_resp_value <- labels_ls[names(labels_ls) == "Did not respond"]
      no_resp_index <- which(data[[col]] == no_resp_value)
      data[[col]][no_resp_index] <- NA
    }
  }
  return(data)
}

dat_nas <- replace_no_resp(dat)
```
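If you would rather not depend on the labelled package, the same idea can be sketched with haven alone: read the value labels from the `labels` attribute and apply a per-column helper with `lapply()`. This is a minimal sketch assuming numeric `haven_labelled` columns; the function name `na_if_label()` and the example data are hypothetical.

```r
library(haven)

# Hypothetical helper: set values carrying the given label to NA.
# Columns without that label (or without value labels at all) are returned as-is.
na_if_label <- function(x, label = "Did not respond") {
  labs <- attr(x, "labels", exact = TRUE)
  codes <- labs[names(labs) == label]   # empty if the label is absent or labs is NULL
  if (length(codes) == 0) {
    return(x)
  }
  # Compare on the underlying values rather than the labelled object itself
  x[vctrs::vec_data(x) %in% codes] <- NA
  x
}

# Hypothetical data: 5 means "Did not respond" in q1 but is a valid answer in q2
dat <- tibble::tibble(
  q1 = labelled(c(1, 2, 5, 3),
                c(Agree = 1, Neutral = 2, Disagree = 3, `Did not respond` = 5)),
  q2 = labelled(c(5, 4, 1, 5), c(`Very low` = 1, `Very high` = 5))
)

dat_nas <- tibble::as_tibble(lapply(dat, na_if_label))
```

Using `%in%` also covers the unusual case where more than one code carries the same label, and `lapply()` applies the helper to thousands of columns without listing them by name.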
megsk