I have a data frame which contains several numeric variables. I have written a sorting algorithm that sorts the rows by comparing the values in the columns containing the numeric values I'm interested in.

The values are dates in numeric YYYYMMDD format. However, some entries are 0 (zero) where they really should be NA. This means that a comparison between, for instance, 20001224 and 0 is possible even though it makes no sense, because the 0 stands for a not-applicable value.

I could convert the values to dates using strptime, which would get rid of the non-dates. However, I want to understand how to recode values in several columns of a data frame to NA, so I am posting it as a question here.

There must be an easy way (using one of the apply functions) to go column by column and recode all the 0s (zeros) into NAs.

EnrollmentBegin  EnrollmentBegin2  EnrollmentBegin3  EnrollmentEnd  EnrollmentEnd2  EnrollmentEnd3
20040129         20130107          0                 20060526       20140816        0
20050829         0                 0                 20070822       0               0
20000831         0                 0                 20020524       0               0
20080827         0                 0                 20090526       0               0

Here is the dput of an excerpt of my data:

structure(list(EnrollmentBegin = c(20040129, 20050829, 20000831, 20080827),
    EnrollmentBegin2 = c(20130107, 0, 0, 0),
    EnrollmentBegin3 = c(0, 0, 0, 0),
    EnrollmentEnd = c(20060526, 20070822, 20020524, 20090526),
    EnrollmentEnd2 = c(20140816, 0, 0, 0),
    EnrollmentEnd3 = c(0, 0, 0, 0)),
    .Names = c("EnrollmentBegin", "EnrollmentBegin2", "EnrollmentBegin3",
        "EnrollmentEnd", "EnrollmentEnd2", "EnrollmentEnd3"),
    row.names = c("3", "5", "6", "7"), class = "data.frame")
WykoW
  • How about `x[x==0]<-NA` – MrFlick Nov 30 '15 at 02:50
  • That was how I thought I could do it, but it made R crash and according to the R Documentation on ‘Not Available’ / Missing Values, they don't recommend that method: "The NA of character type is distinct from the string "NA". Programmers who need to specify an explicit missing string should use NA_character_ (rather than "NA") or set elements to NA using is.na<-." – WykoW Nov 30 '15 at 02:54
  • What do you mean it made R crash? Does it do so for the example you provided? It would be best to properly import your data to convert those to NA's right away, but that's not the question you asked. – MrFlick Nov 30 '15 at 02:56
  • No, for the example it works, but my dataset is much larger and has a mix of different vectors (numeric, factor, character), and I was trying to do the operation on only a subset of the columns (which, per your solution, I should not need to bother with). I was just curious about the R documentation. Sometimes I forget to attempt the simplest solution, which often turns out to be the most efficient and elegant method. I'll try yours. Thanks for the answer! – WykoW Nov 30 '15 at 03:08
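
The suggestion from the comments can also be limited to the date columns, which may matter when the data frame mixes numeric, factor, and character columns. What follows is only a minimal sketch, not a tested solution for the full dataset: it assumes the date columns can be selected by the name prefix "Enrollment", and the `Note` column is a hypothetical addition used to show that other columns are left alone. It shows both the logical-matrix assignment and a column-by-column `lapply` version that uses `is.na<-`, the form the R documentation mentions.

```r
# Excerpt of the data from the question, plus one hypothetical
# character column ("Note") that is NOT part of the original data.
df <- data.frame(
  EnrollmentBegin  = c(20040129, 20050829, 20000831, 20080827),
  EnrollmentBegin2 = c(20130107, 0, 0, 0),
  EnrollmentBegin3 = c(0, 0, 0, 0),
  EnrollmentEnd    = c(20060526, 20070822, 20020524, 20090526),
  EnrollmentEnd2   = c(20140816, 0, 0, 0),
  EnrollmentEnd3   = c(0, 0, 0, 0),
  Note             = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Assumption: every date column name starts with "Enrollment".
date_cols <- grep("^Enrollment", names(df), value = TRUE)

# Variant 1: logical-matrix assignment, restricted to the date columns.
out1 <- df
out1[date_cols][out1[date_cols] == 0] <- NA

# Variant 2: go column by column with lapply and is.na<-.
out2 <- df
out2[date_cols] <- lapply(out2[date_cols], function(col) {
  is.na(col) <- which(col == 0)  # which() also skips existing NAs
  col
})
```

Both variants leave `Note` untouched and keep the date columns numeric, so a later conversion with strptime or as.Date is still possible.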

0 Answers