I have some string data that has blanks instead of NAs, and I want to change the blanks to NA:
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c('puppies', '', 'hello', 'die'))
test
  year value    type
1 1990    50 puppies
2 1991    25
3         20   hello
4 1993     5     die
This is how I would do it in another language (iterate over all rows and cols):
for (i in 1:nrow(test)) {
  for (j in 1:ncol(test)) {
    if (test[i, j] == '') {
      test[i, j] <- NA
    }
  }
}
But R hates loops and punishes you by taking forever. If I try an ifelse() call instead, i.e.
ifelse(test == '', NA, test)
It goes completely bonkers:
ifelse(test == '', NA, test)
[[1]]
[1] 1990 1991 1993
Levels: 1990 1991 1993
[[2]]
[1] 50 25 20 5
[[3]]
[1] NA
[[4]]
[1] 1990 1991 1993
Levels: 1990 1991 1993
[[5]]
[1] 50 25 20 5
[[6]]
[1] puppies hello die
Levels: die hello puppies
[[7]]
[1] 1990 1991 1993
Levels: 1990 1991 1993
[[8]]
[1] 50 25 20 5
[[9]]
[1] puppies hello die
Levels: die hello puppies
[[10]]
[1] NA
[[11]]
[1] 50 25 20 5
[[12]]
[1] puppies hello die
Levels: die hello puppies
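Poking at the pieces gives a hint about where the twelve elements come from. This is only a diagnostic sketch: it assumes the columns are factors (as the Levels lines in the output suggest), so stringsAsFactors = TRUE is set explicitly, and `blank` and `res` are names made up for illustration.

```r
# Same data as in the question. stringsAsFactors = TRUE is an assumption,
# chosen to match the "Levels:" lines in the printed output.
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = TRUE)

blank <- test == ""   # the comparison is done column by column
dim(blank)            # 4 rows by 3 columns, so 12 logical values
length(test)          # 3, because a data frame is a list of its columns

res <- ifelse(blank, NA, test)
length(res)           # 12, one element per cell of `blank`
which(blank)          # 3 and 10, the same positions that show NA above
```

So the condition has 12 cells while `test` itself has only 3 elements to recycle, which would explain why whole columns show up in the result.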
What gives? Is there an easy way to apply it to the whole data frame like you would a vector?
For example:
ifelse(test$year == '', NA, test$year)
This puts the NA in the right place (the other values are the factor's integer codes rather than the years themselves):
[1] 2 3 NA 4
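One thing that might do the whole-data-frame version is indexing with the logical matrix that `test == ''` produces. The following is a minimal sketch, not necessarily the best way: it assumes the columns are plain character vectors (stringsAsFactors = FALSE is set explicitly for that reason), and `blank` is just an illustrative name.

```r
# Same data as in the question, but with character columns.
# stringsAsFactors = FALSE is an assumption made for this sketch.
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = FALSE)

# Logical matrix with the same shape as the data frame:
# TRUE wherever a cell is the empty string.
blank <- test == ""

# Assign NA to every flagged cell in one step, no explicit loop.
test[blank] <- NA

test$year   # "1990" "1991" NA     "1993"
test$type   # "puppies" NA "hello" "die"
```

If the columns are factors instead, the same assignment should still set the cells to NA, but the empty string would linger as an unused level; calling droplevels(test) afterwards is one way to get rid of it.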