
I have a data frame where the missing values are denoted with an asterisk, "*".

I replaced them with `mydata[mydata == "*"] <- NA`, but when I run `str(mydata)` it still shows "*" among the values, like this:

'data.frame':   117 obs. of  8 variables:
 $ PRICE: Factor w/ 82 levels "*","1000","1020",..: 36 37 39 39 35 34 32 29 27 26 ...

It is as if I had not applied `mydata[mydata == "*"] <- NA` at all.

ilhan

2 Answers


I should have used na.strings = "*" while reading the data file.
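
A minimal sketch of that approach, assuming a CSV source. The inline `csv_text` and the `QTY` column are hypothetical stand-ins for the real file, which is not shown in the question:

```r
# Hypothetical stand-in for the real data file
csv_text <- "PRICE,QTY
1000,3
*,5
1020,*"

# Treat both the default "NA" and "*" as missing while reading
mydata <- read.csv(text = csv_text, na.strings = c("NA", "*"))

str(mydata)             # columns are read as numbers, not factors
colSums(is.na(mydata))  # count of missing values per column
```

Because "*" never reaches the column as a value, the columns stay numeric and no leftover factor level appears.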

ilhan
  • 4
    FYI: Your code `mydata[mydata == "*"] <- NA` _did_ successfully replace all `*`'s with NAs. It's just that the `*`'s caused those columns to be read in as factors, and replacing the values doesn't alter the factor levels. – joran May 19 '13 at 15:19
  • 3
    I think you should not have used `na.strings="NA"` since that is the default setting for `na.strings` and it didn't get you success. Instead you should have used `na.strings = "*"` or perhaps `na.strings = c("NA", "*")` – IRTFM May 19 '13 at 16:40
  • @DWin, oh yes. I did but put it wrongly here. I'm going to edit my answer now. Thanks. – ilhan May 19 '13 at 18:34
  • following up on @joran's comment; the result of `sum(is.na(mydata$PRICE))`, or `table(is.na(mydata$PRICE))`, or `table(mydata$PRICE,useNA="always")` might be enlightening – Ben Bolker May 19 '13 at 18:44
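
The point made in these comments can be checked on a small example. This is only a sketch: the vector `price` is a made-up stand-in for `mydata$PRICE`, and `droplevels()` is one possible way to discard the unused level, not something the comments themselves propose:

```r
# Made-up factor standing in for mydata$PRICE
price <- factor(c("*", "1000", "1020", "*"))

price[price == "*"] <- NA   # the replacement itself works

sum(is.na(price))           # number of entries that are now NA
levels(price)               # "*" is still listed as a level

price <- droplevels(price)  # drop levels that no longer occur
levels(price)
```

`str()` prints the levels of a factor, which is why "*" still shows up there even though no observation holds that value any more.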

It's not mydata that would be equal to "*" but rather mydata$PRICE.

Try one of these. The first coerces the column to a numeric vector and, in the process, generates a warning about some values being set to NA. That warning can be ignored, since NA is what you wanted in the first place:

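 # Option 1: convert the factor to numeric; "*" becomes NA (with a warning)
 mydata$PRICE <- as.numeric(as.character(mydata$PRICE))

 # Option 2: assign NA to the entries equal to "*"
 mydata$PRICE[mydata$PRICE == "*"] <- NA

 # Option 3: the same assignment using the is.na<- replacement form
 is.na(mydata$PRICE) <- mydata$PRICE == "*"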
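
A short sketch of why the first option goes through `as.character()`, using a made-up factor `price` rather than the real column:

```r
# Made-up factor standing in for mydata$PRICE
price <- factor(c("*", "1000", "1020"))

as.numeric(price)  # integer codes of the levels, not the prices

# Converting to character first recovers the actual numbers;
# "*" cannot be parsed and becomes NA (the warning is suppressed here)
num <- suppressWarnings(as.numeric(as.character(price)))
num
```

Calling `as.numeric()` directly on a factor returns the internal level codes, so the intermediate character step is what makes the conversion give the prices.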
IRTFM