3

I have an Excel file with all columns of type "text". When read_excel is called, however, some of the columns are guessed to be "dbl" instead. I know I can use col_types to specify the columns, but that requires me knowing how many columns there are in my file.

Is there a way I can detect the number of columns? Or, alternatively, specify that the columns are all "text"? Something like

read_excel("file.xlsx", col_types = "text")

which, quite reasonably, gives an error that I haven't specified the type for all the columns.

Currently, I can solve this by reading in the file twice:

read_excel_one_type <- function(filename, col_type = "text"){
  temp <- read_excel(path = filename)
  ncol.temp <- ncol(temp)
  read_excel(path = filename, col_types = rep(col_type, ncol.temp))
}

but a method that doesn't require reading the file twice would be better.

Hugh
  • 15,521
  • 12
  • 57
  • 100
  • One way would be to read only the first line with `skip` argument – akrun Dec 18 '15 at 02:31
  • I thought `skip` is how many rows to skip before reading data, rather than how many rows to read before skipping? – Hugh Dec 18 '15 at 02:39

2 Answers2

3

This answer seems to be helpful: https://stackoverflow.com/a/34015430/5220858. I have found that the excel file needs to be formatted correctly from the start in order for R to automatically detect the correct data type (i.e. numeric, date, text). I think the post though is more relevant to your question. The poster shows a bit of code similar to what you have provided, except only one line of data is read to determine the number of columns, then the rest is read into R based on the first line.

Community
  • 1
  • 1
-3
library("xlsx")
file<-"myfile.xlsx"
sheetIndex<-1
mydf<-read.xlsx(file, sheetIndex, sheetName=NULL, rowIndex=NULL,
      startRow=NULL, endRow=NULL, colIndex=NULL,
      as.data.frame=TRUE, header=TRUE, colClasses=NA,
      keepFormulas=FALSE, encoding="unknown")

works for me

Frank
  • 190
  • 1
  • 10