
I am using the xlsx library in R to read an Excel sheet with the command below. My data are numeric (floats), with NA for missing values, and the first column holds names (character type). However, every column is read as character, and I could not find a way to tell the reader to treat the NA strings as missing values. Any suggestions on how to deal with this?

df <- read.xlsx(file0, sheetName = 'sheet1', as.data.frame = TRUE,
                header = TRUE, use.value.labels = FALSE, stringsAsFactors = FALSE)
MichaelChirico
user1430763
    Would it be acceptable to remove NAs as a second step?: `df[df=='NA'] <- NA` followed by: `df <- sapply(df, as.numeric)` – Andrew Taylor Jan 29 '16 at 14:30
    you may consider using `read_excel` from `readxl` which has an option to specify `NA`. – MichaelChirico Jan 29 '16 at 14:36
  • My first thought was 'but doesn't `read.xlsx` have an NA option?', but didn't see it in the documentation. Of course it was `read_excel`. Not mad Hadley et al. came and fixed things, but it does make it hard to keep things straight. – Andrew Taylor Jan 29 '16 at 14:39

1 Answer


You can also try

df[] <- lapply(df, type.convert, as.is = TRUE)

`type.convert` attempts to determine the appropriate class for each column and converts it accordingly. Without `as.is=TRUE`, it converts character columns to factors. It also handles NA strings: the default, `na.strings="NA"`, should work for your data.
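To see what this does without needing the Excel file, here is a minimal sketch on a made-up data frame. The column names and values are hypothetical; they only mimic the situation described in the question, where every column arrives as character and missing values are the literal string "NA".

```r
# Hypothetical stand-in for the data frame returned by read.xlsx:
# all columns are character, and missing values are the string "NA".
df <- data.frame(
  name = c("a", "b", "c"),
  x    = c("1.5", "NA", "3"),
  y    = c("10", "20", "NA"),
  stringsAsFactors = FALSE
)

# Convert every column in place. Assigning into df[] keeps df a data frame
# instead of replacing it with the plain list that lapply returns.
df[] <- lapply(df, type.convert, as.is = TRUE)

# Inspect the resulting column classes and the converted values.
str(df)
```

With this toy input, `name` stays character, `x` becomes numeric, and `y` becomes integer because all of its non-missing values are whole numbers. If you need doubles throughout, you could follow up with `as.numeric` on the columns concerned.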

cryo111