I have an R data frame with data from multiple subjects, each tested several times. To run statistics on the set, the data are in long format: a factor identifies the subject ("id"), and each observation has its own row, with the factor "session" identifying the test session. For example:
print(allData)
id  session  measure
1   1        7.6
2   1        4.5
3   1        5.5
1   2        7.1
2   2        NA
3   2        4.9
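To make the example reproducible, here is one way to build the data frame printed above. It assumes "id" and "session" are stored as factors and "measure" as a numeric column, which matches the description but is not shown explicitly in the printout.

```r
# Rebuild the example data: id and session as factors,
# measure as numeric with one missing value (subject 2, session 2)
allData <- data.frame(
  id      = factor(c(1, 2, 3, 1, 2, 3)),
  session = factor(c(1, 1, 1, 2, 2, 2)),
  measure = c(7.6, 4.5, 5.5, 7.1, NA, 4.9)
)
str(allData)
```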
In the above example, is there a simple way to remove all rows with id==2, given that the "measure" column contains NA in one of the rows where id==2?
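For the single-column case, a minimal base-R sketch (one option among several, not necessarily the "built-in" being asked for) uses `ave()`, which applies a function within each group and spreads the result back to every row of that group. Here `any()` flags every row of a subject that has at least one NA in "measure".

```r
allData <- data.frame(
  id      = factor(c(1, 2, 3, 1, 2, 3)),
  session = factor(c(1, 1, 1, 2, 2, 2)),
  measure = c(7.6, 4.5, 5.5, 7.1, NA, 4.9)
)

# TRUE for every row whose subject has at least one NA in measure
subjectHasNA <- ave(is.na(allData$measure), allData$id, FUN = any)

# Keep only subjects without any missing measure
cleanedData <- allData[!subjectHasNA, ]
print(cleanedData)
```

Because this is plain row indexing, the original row order and row names are preserved.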
More generally, since I actually have many measures (columns) and four sessions (rows) for each subject, is there an elegant way to remove all rows with a given level of the "id" factor, given that (at least) one of the rows with this "id" level contains an NA in one of the relevant columns?
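The same idea extends to several columns by combining `complete.cases()` (row-wise NA check over the chosen columns) with `ave()` (per-subject aggregation). The helper `dropSubjectsWithNA()` below is hypothetical, not a built-in, and the four-session data set is a made-up toy example: subject 1 has an NA in measure1, subject 3 in measure4, and subject 2 only in measure2, which is not probed.

```r
# Hypothetical helper: drop every subject that has at least one NA
# in any of `cols`, in any of its rows. Defaults to all non-id columns.
dropSubjectsWithNA <- function(df, idCol, cols = setdiff(names(df), idCol)) {
  rowHasNA <- !complete.cases(df[cols])               # per-row NA flag
  subjectHasNA <- ave(rowHasNA, df[[idCol]], FUN = any)  # per-subject flag
  df[!subjectHasNA, , drop = FALSE]
}

# Toy data: 3 subjects x 4 sessions
allData <- data.frame(
  id       = factor(rep(1:3, times = 4)),
  session  = factor(rep(1:4, each = 3)),
  measure1 = c(7.6, 4.5, 5.5, 7.1, 4.8, 4.9, NA, 4.2, 5.0, 7.3, 4.6, 5.2),
  measure2 = c(1, 2, 3, 4, NA, 6, 7, 8, 9, 10, 11, 12),
  measure4 = c(0.1, 0.2, NA, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2)
)

probeColumns <- c("measure1", "measure4")
cleanedData <- dropSubjectsWithNA(allData, "id", probeColumns)
print(cleanedData)  # only subject 2 remains; its NA in measure2 is ignored
```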
I suspect there might be a built-in function that solves this problem more elegantly than my current solution:
# Columns to check for NAs
probeColumns <- c('measure1', 'measure4') # Etc...
# All levels of "id" that occur in at least one row with an NA in probeColumns
idsWithNAs <- unique(allData[!complete.cases(allData[probeColumns]), "id"])
# Keep only the rows whose id is not in idsWithNAs
cleanedData <- allData[!allData$id %in% idsWithNAs, ]
Thanks, /Jonas