0

I have some numbers in an excel file that I want to read into R as characters. When I import the file either using readxl or openxlsx, the imported data have two extra characters, which are not in the excel file. The excel sheet looks like this:

Excel file

The example file is here

I have tried changing the format within the Excel file but this messes up the numbers. My current work-around is to concatenate the number with ' in a separate column in excel and then read that column into R. This works for some reason.

library(readxl)
boo <- read_excel("./boo.xlsx", 
                    col_types = c("text"))
boo

Reading the excel file gives the following (note the last two characters in the Example numbers column. The concatNum column shows the concatenated version.

# A tibble: 6 x 2
  `Example numbers`  concatNum        
  <chr>              <chr>            
1 985.12002779568002 '985.12002779568 
2 985.12002826159505 '985.120028261595
3 985.12002780627301 '985.120027806273
4 985.12002780627301 '985.120027806273
5 985.12002780724401 '985.120027807244
6 985.12002780291402 '985.120027802914

Any reasons why this would be happening? Does anyone have a better way of fixing it than my current work-around?

Ricecakes
  • 681
  • 1
  • 5
  • 13
  • 2
    I suspect it's a floating point math thing. See `print(.1+.2, digits=18)` – cory Jun 20 '19 at 19:38
  • 1
    Your character-based solution is probably the best way to preserve arbitrary precision past the 15th significant digit. If your data is distributed like your example, another approach could be to normalize your data by subtracting a constant like `985.12002` and freeing up 8 more digits of precision. – Jon Spring Jun 20 '19 at 19:49
  • Thanks both! Good to know. I think I will stick with the character solution because these numbers are some kind of ID numbers and actually meant to be strings rather actual numbers. – Ricecakes Jun 20 '19 at 20:07
  • @JonSpring, I understand the implications of IEEE-754 (and how it relates to [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f)), but I don't see how/where R and Excel implement 754 differently. Can you expand a little on that? Just wondering. – r2evans Jun 20 '19 at 20:59
  • 1
    @r2evans, I've deleted my earlier comment that was based on my own misreading of descriptions of how the two programs interpret floating point numbers. It sounds like in theory they should be the same, but something is different. It looks like it might be something like "Excel truncates past the 15th significant digit, whereas R rounds to the 17th significant digit". If you enter `123.4567890123456789` into each, Excel stores / displays it as `123.45678901234500000`, whereas R converts it to `123.45678901234568`. – Jon Spring Jun 20 '19 at 21:24
  • It wasn't a critique, I've come to question my own understanding of IEEE-754 and how to work around problems that are due to misunderstanding its premise (such as: a test of *equality* with floating point numbers is fundamentally flawed, perhaps *inequality* is better suited). I don't know enough about each's specific implementation (flawless or otherwise) to know if they actually differ in practice. Thanks. – r2evans Jun 20 '19 at 21:29

0 Answers0