
I have an integer, 18495608239531729, that is causing me some trouble with the data.table package: when I read data from a CSV file that stores numbers this big, it stores them as integer64.

Now I would like to filter my data.table like dt[big_integers == 18495608239531729], which gives me a data type mismatch (comparing integer64 and double).
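
Something like the following (a made-up one-row CSV, relying on fread's default integer64 handling) reproduces the column type I end up with:

library(data.table)
# hypothetical one-row CSV; fread reads values outside the 32-bit integer range
# as integer64 by default
tmp <- tempfile(fileext = ".csv")
writeLines(c("big_integers", "18495608239531729"), tmp)
dt <- fread(tmp)
class(dt$big_integers)
#[1] "integer64"
dt[big_integers == 18495608239531729]  # in my setup this reports the integer64 vs double mismatch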

I figured that since 18495608239531729 is a really big number, I should perhaps use the bit64 package to handle the data types.

So I did:

library(bit64)
as.integer64(18495608239531729)

> integer64
> [1] 18495608239531728

I thought integer64 should be able to work with much larger values without any issues?

So I did:

as.integer64(18495608239531729) == 18495608239531729

> [1] TRUE

At which point I was happier, but then I figured, why not try:

as.integer64(18495608239531728)

> integer64
> [1] 18495608239531728

Which led me to also trying:

as.integer64(18495608239531728) == as.integer64(18495608239531729)

> [1] TRUE

What is the right way to handle big numbers in R without loss of precision? Technically, in my case, I do not do any mathematical operations on the column in question, so I could treat it as a character vector (although I was worried that this would take up more memory, and that joins in data.table would be slower?)
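
For the memory part of that question, a rough, hypothetical comparison would look like the following (exact sizes depend on the platform and on how many distinct values the column holds):

library(bit64)
# one million distinct large ids, once as integer64 and once as character
x_i64 <- as.integer64("18495608239531729") + 0:999999
x_chr <- as.character(x_i64)
object.size(x_i64)  # 8 bytes per value
object.size(x_chr)  # typically several times larger: each distinct value stores its own string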


1 Answer


You are passing a floating-point number to as.integer64. The loss of precision already happens in your input to as.integer64: the literal 18495608239531729 is parsed as a double, and doubles can represent integers exactly only up to 2^53 (about 9.007e15):

is.double(18495608239531729)
#[1] TRUE

sprintf("%20.5f", 18495608239531729)
#[1] "18495608239531728.00000"

Pass a character string to avoid that:

library(bit64)
as.integer64("18495608239531729")
#integer64
#[1] 18495608239531729
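
Applied to the filtering problem from the question, a minimal sketch (with a made-up table; in practice the column would come from fread, which reads such values as integer64 by default) would be:

library(data.table)
library(bit64)

dt <- data.table(big_integers = as.integer64(c("18495608239531728",
                                               "18495608239531729",
                                               "18495608239531730")))

# compare integer64 against integer64, so no double literal is involved
dt[big_integers == as.integer64("18495608239531729")]
# -> returns only the matching row, with no precision loss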
  • Right! I should have tried `18495608239531729 == 18495608239531728` as well, which would tell me that the issue is already at the step before, because this already returns `TRUE`. Thank you for the clear answer! – ira Jul 30 '20 at 08:59