0

When I use read_excel to import data from Excel to R, some numeric columns are automatically converted to dates.

# e.g.
5600 to 1915-05-01

Is there a way to turn of this feature? Other than using "col_types" argument in read_excel.

Len Greski
  • 10,505
  • 2
  • 22
  • 33
LeGeniusII
  • 900
  • 1
  • 10
  • 28
  • 3
    col_types is it mate – Dom Apr 13 '18 at 07:23
  • 3
    what's your aversion to using an argument explicitly designed for the purpose you're after? I must be missing something – MichaelChirico Apr 13 '18 at 07:35
  • 3
    The question is, what is in your data that makes read_excel think it's a date? If it were only numbers, this probably wouldn't happen – talat Apr 13 '18 at 07:47
  • @MichaelChirico I'm uploading multiple files (30+ excels) using a for loop, and they can have different structures (e.g. column names, number of columns). Therefore, defining col_types one-by-one is possible but not quite efficient. – LeGeniusII Apr 15 '18 at 23:15
  • @docendodiscimus That's a really good point. I've checked the data on Excel before posting this question, and the column only contains 1 to 5 digits whole numbers. – LeGeniusII Apr 15 '18 at 23:26
  • understood, and it's a valid concern. I agree with @docendodiscimus in this case -- it's worth exploring what's led read_excel astray – MichaelChirico Apr 16 '18 at 01:33
  • @MichaelChirico - what's led `read_excel()` astray is the fact that date times are a special form of numeric, so it's not easy for the type guesser to distinguish between a number and a number that's really a date. – Len Greski Apr 16 '18 at 02:31
  • @LenGreski certainly the vast majority of data sets being used today are not using data from 1915... I know nothing will be perfect for this case, but improvements are always something to look out for. – MichaelChirico Apr 16 '18 at 04:10

1 Answers1

2

The readxl package, like readr for raw data files, has a type guesser to determine how to read columns in an Excel spreadsheet. As noted in the package vignette, the guessing process is not perfect, especially as it relates to date formats because they are stored as a special type of number.

enter image description here

As stated in the package documentation (as well as the comments on the OP), the way to avoid inaccurate guesses from the column type guesser is to explicitly specify the column types with the col_types argument on read_excel().

Len Greski
  • 10,505
  • 2
  • 22
  • 33
  • 1
    if the type guesser is failing, probably the algorithm can be improved. I encourage OP to file an issue on GitHub and share their data if possible – MichaelChirico Apr 16 '18 at 01:35