4

I have some string data that has blanks instead of NAs, and I want to change the blanks to NA:

test <- data.frame(year=c("1990","1991","","1993"),
                   value=c(50,25,20,5),
                   type=c('puppies', '', 'hello', 'die'))

test
  year value    type
1 1990    50 puppies
2 1991    25        
3         20   hello
4 1993     5     die

Edit: sorry, the data table won't format right here, but you get the idea from the code.

This is how I would do it in another language (iterate over all rows and cols):

for (i in 1:nrow(test)){
  for (j in 1:ncol(test)){
    if (test[i,j] == ''){
      test[i,j] = NA
    }
  }
}

But R hates loops and punishes you by taking forever. And if I try an ifelse() statement, i.e.

ifelse(test == '', NA, test)

It goes completely bonkers:

ifelse(test == '', NA, test)
[[1]]
[1] 1990 1991      1993
Levels:  1990 1991 1993

[[2]]
[1] 50 25 20  5

[[3]]
[1] NA

[[4]]
[1] 1990 1991      1993
Levels:  1990 1991 1993

[[5]]
[1] 50 25 20  5

[[6]]
[1] puppies         hello   die    
Levels:  die hello puppies

[[7]]
[1] 1990 1991      1993
Levels:  1990 1991 1993

[[8]]
[1] 50 25 20  5

[[9]]
[1] puppies         hello   die    
Levels:  die hello puppies

[[10]]
[1] NA

[[11]]
[1] 50 25 20  5

[[12]]
[1] puppies         hello   die    
Levels:  die hello puppies

What gives? Is there an easy way to apply it to the whole data frame like you would a vector?

For example:

ifelse(test$year == '', NA, test$year)

Gives NA in the right place, although the other values are the factor's integer codes rather than the years:

[1] 2 3 NA 4
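The `2 3 NA 4` above comes from `year` being a factor, which was the `data.frame()` default before R 4.0. A minimal sketch of the difference, assuming factor columns (forced here with `stringsAsFactors = TRUE`; the names `codes` and `labels` are only for illustration):

```r
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = TRUE)  # assumption: mimic the pre-4.0 default

# On a factor, ifelse() hands back the underlying integer codes
codes <- ifelse(test$year == "", NA, test$year)

# Converting to character first keeps the original labels
labels <- ifelse(test$year == "", NA, as.character(test$year))

print(codes)
print(labels)
```

If the columns are already character (the default from R 4.0 onwards), the `as.character()` call is harmless and the first form returns the labels too.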

smci
Brian Jackson
  • You could use `test[test==''] <- NA;library(gdata); test <- drop.levels(test); str(test)` – akrun Aug 19 '14 at 06:50
  • @akrun, I think you misunderstood my answer. Also, why `gdata::drop.levels` and not just `droplevels` from base R? – A5C1D2H2I1M1N2O1R2T1 Aug 19 '14 at 08:46
  • @Ananda Mahto. Yes, I forgot about `droplevels`. It can be used. Also, I didn't have your "SOfun" installed. So, didn't check the results. – akrun Aug 19 '14 at 08:48
  • `ifelse(test == '', NA, test)` doesn't work because you want to compare and replace individual columns at a time, not an entire row of `test`. One way to do that is `apply/sapply` – smci Feb 01 '18 at 23:06

4 Answers

3

There are several ways to do this without a package, but I've implemented this in a function called makemeNA in my GitHub-only "SOfun" package.

## Get the package
library(devtools)
install_github("mrdwab/SOfun")

## Load the package and use the function
library(SOfun)
makemeNA(test, "")
#   year value    type
# 1 1990    50 puppies
# 2 1991    25    <NA>
# 3   NA    20   hello
# 4 1993     5     die

The function makes use of type.convert to change the column types as if you were reading in the data for the first time.

str(.Last.value)
# 'data.frame':  4 obs. of  3 variables:
#  $ year : int  1990 1991 NA 1993
#  $ value: int  50 25 20 5
#  $ type : Factor w/ 3 levels "die","hello",..: 3 NA 2 1

Essentially, the function boils down to the following:

lapply(test, function(x) type.convert(as.character(x), na.strings = ""))
# $year
# [1] 1990 1991   NA 1993
# 
# $value
# [1] 50 25 20  5
# 
# $type
# [1] puppies <NA>    hello   die    
# Levels: die hello puppies

Thus, you would get the same result if you did:

test[] <- lapply(test, function(x) 
    type.convert(as.character(x), na.strings = ""))

(But the makemeNA function has a few more tricks up its sleeves.)
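As a rough sketch of the same idea without installing anything, the `lapply`/`type.convert` step can be wrapped in a small helper. The name `blank_to_na` is hypothetical and this is not the SOfun implementation; `as.is = TRUE` is an assumption here, chosen so that strings stay character instead of becoming factors.

```r
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: re-run type detection on every column,
# treating each string in `na.strings` as missing
blank_to_na <- function(df, na.strings = "") {
  df[] <- lapply(df, function(x)
    type.convert(as.character(x), na.strings = na.strings, as.is = TRUE))
  df
}

cleaned <- blank_to_na(test)
str(cleaned)

# na.strings accepts several values at once
cleaned2 <- blank_to_na(test, na.strings = c("", "puppies"))
```

Assigning into `df[]` rather than `df` keeps the result a data frame instead of a plain list.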

A5C1D2H2I1M1N2O1R2T1
1

Try the following simple base R code:

test[test==''] = NA
test
  year value    type
1 1990    50 puppies
2 1991    25    <NA>
3 <NA>    20   hello
4 1993     5     die

EDIT: check the str:

test<-data.frame(year=c("1990","1991","","1993"),value=c(50,25,20,5), type=c('puppies', '', 'hello', 'die'))
> 
> test
  year value    type
1 1990    50 puppies
2 1991    25        
3         20   hello
4 1993     5     die
> 
> str(test)
'data.frame':   4 obs. of  3 variables:
 $ year : Factor w/ 4 levels "","1990","1991",..: 2 3 1 4
 $ value: num  50 25 20 5
 $ type : Factor w/ 4 levels "","die","hello",..: 4 1 3 2
> 
> test[test==''] = NA
> 
> test
  year value    type
1 1990    50 puppies
2 1991    25    <NA>
3 <NA>    20   hello
4 1993     5     die
> 
> str(test)
'data.frame':   4 obs. of  3 variables:
 $ year : Factor w/ 4 levels "","1990","1991",..: 2 3 NA 4
 $ value: num  50 25 20 5
 $ type : Factor w/ 4 levels "","die","hello",..: 4 NA 3 2
> 
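Note that the `str` output above still lists `""` among the factor levels after the replacement. A minimal sketch of cleaning that up with base R's `droplevels`, assuming the columns are factors as in the transcript (forced here with `stringsAsFactors = TRUE`):

```r
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = TRUE)  # assumption: factor columns, as above

test[test == ""] <- NA    # blanks become NA, but "" is still a level
test <- droplevels(test)  # drop levels that no longer occur

str(test)
```

This leaves `year` as a factor; turning it into a number would still need a separate conversion step.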
rnso
  • This is the typical answer, but view the `str` on your result to see the difference between your answer and mine. – A5C1D2H2I1M1N2O1R2T1 Aug 19 '14 at 06:06
  • Although I chose the other answer as 'more complete' since it does keep types in order, this answer answers my more general question of 'how do I apply a conditional test to a whole dataframe.' I would upvote it if I could figure out how to get enough rep to do so. – Brian Jackson Aug 19 '14 at 06:32
  • The str output seems to be the same; see the edit in my answer. – rnso Aug 19 '14 at 07:25
  • @rnso, I think Ananda Mahto was referring to factor levels after you changed `''` to `NA`. Check the difference here `test[test==''] <- NA;library(gdata); test <- drop.levels(test); str(test)` – akrun Aug 19 '14 at 07:54
  • @BrianJackson, Another difference between the two approaches is that using `type.convert` gives you access to `na.strings`. Note that this is *plural*. Thus, if you wanted `""`, `25`, and `"puppies"` to be recoded to `NA`, you can supply a vector like `c("", 25, "puppies")` as the relevant function argument. – A5C1D2H2I1M1N2O1R2T1 Aug 19 '14 at 08:39
  • @akrun, not just the factor levels, but also if you had columns that are otherwise coercible to a different format, `makemeNA` would do that. Example: `df <- data.frame(A = c(TRUE, FALSE, "cat"), B = c(1, 2, "cat"), C = c("A", "B", "cat")); str(makemeNA(df, "cat"))`. – A5C1D2H2I1M1N2O1R2T1 Aug 19 '14 at 08:42
0

One option might be to convert your data frame to a matrix:

 test_1 <- as.matrix(test)

Then you can run your ifelse statement on it the same way you do with a single column of the data frame.
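A minimal sketch of that idea, assuming the `test` data frame from the question (character columns are forced here so the result does not depend on the R version):

```r
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = FALSE)

# A mixed data frame becomes a character matrix
test_1 <- as.matrix(test)

# ifelse() now works cell by cell and keeps the matrix shape
test_1 <- ifelse(test_1 == "", NA, test_1)

print(test_1)
```

The catch is that every column, including `value`, is now character, so numeric columns would need to be converted back afterwards.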

Gvaihir
0

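ifelse(test == '', NA, test) doesn't work because test == '' is a logical matrix with one entry per cell, while test itself is a list of columns, so ifelse() recycles whole columns instead of individual values.

One way to apply the test one column at a time is sapply (note that the result is a character matrix, not a data frame):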

> sapply(test, function(x) { ifelse(x=='', NA, x) })
     year   value type     
[1,] "1990" "50"  "puppies"
[2,] "1991" "25"  NA       
[3,] NA     "20"  "hello"  
[4,] "1993" "5"   "die" 
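If the result should stay a data frame with its column types intact, a variation on the same per-column idea is to use `lapply` and assign back into `test[]`. This is only a sketch, assuming character (not factor) columns:

```r
test <- data.frame(year = c("1990", "1991", "", "1993"),
                   value = c(50, 25, 20, 5),
                   type = c("puppies", "", "hello", "die"),
                   stringsAsFactors = FALSE)  # assumption: character columns

# Replace blanks column by column; test[] keeps the data frame structure
test[] <- lapply(test, function(x) ifelse(x == "", NA, x))

str(test)
```

Here `value` stays numeric, whereas `sapply` collapses everything into a single character matrix.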
smci