
I was reviewing some code and came across this odd result. If you have a data frame with a single value of type integer and you coerce it to integer, you get what I think you would expect:

library(dplyr)

tibble(x = as.integer(c(1))) %>% as.integer()

[1] 1

But if the column is of type integer64, you get something weird:

library(bit64)

tibble(x = as.integer64(c(1))) %>% as.integer()

[1] 0

What gives? I assume it has something to do with the int64 class. But why would I get zero? Is this just bad error handling?

Update

OK, there's a hint as to what's going on when you call `dput` on the int64 data frame:

structure(list(x = structure(4.94065645841247e-324, 
                             class = "integer64")), 
          row.names = c(NA, -1L), 
          class = c("tbl_df", "tbl", "data.frame"))

So as.integer() is rightly converting 4.94065645841247e-324 to zero. But why is that what's stored in the DF?

Also, to see that this is not a bit64 issue, I get a very similar structure on the actual df I get back from my database:

structure(list(max = structure(2.78554211125295e-320,
                               class = "integer64")),
          class = "data.frame", 
          row.names = c(NA, -1L))
Ben G
  • That `double` is how the integer is being represented in the R object storing the data, like I mentioned below. The class just helps inform R which method to use when you want to convert. In fact, the DBI documentation mentions `bit64` specifically. – SmokeyShakers Dec 15 '21 at 19:56
  • ok, so is `int64` an option in base r? – Ben G Dec 15 '21 at 21:20
  • No, they’re just using the 64 bits of a double to store the integer, which allows for bigger numbers than a base R integer. – SmokeyShakers Dec 15 '21 at 22:30

1 Answer


I think this is a limitation of bit64. bit64 provides the S3 method `as.integer.integer64` to convert from int64 to int, but that method is only dispatched on integer64 vectors. A data.frame or tibble goes to base `as.integer` instead (which, unlike the bit64 method, can be applied to other objects such as lists), and base `as.integer` doesn't know how to convert int64 to int.

So after loading bit64, `as.integer` will actually call `as.integer.integer64` on all int64 vectors, but not on a data.frame or tibble.
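
A minimal sketch of the difference, assuming `bit64` is installed. The data frame `df` and the result names are made up for illustration:

```r
library(bit64)

df <- data.frame(x = as.integer64(1))

# Converting the column dispatches to bit64's as.integer.integer64 method
col_result <- as.integer(df$x)

# Converting the whole data frame uses base as.integer on a list, which
# reads the column's underlying double rather than its integer64 value
df_result <- as.integer(df)

# Converting column by column keeps the S3 dispatch intact
df_fixed <- df
df_fixed$x <- as.integer(df_fixed$x)

print(col_result)
print(df_result)
str(df_fixed)
```

Converting the column itself (or calling `as.integer` on the column inside `mutate`) is the workaround I would reach for here.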

SmokeyShakers
  • There's quite a lot of strange behavior. Look at the output of `matrix(as.integer64(x))` for any value of `x`: it's always approximately 0. Considering the documentation mentions these should be usable within vectors, matrices, etc., this seems quite strange. I imagine this behavior must be related. – caldwellst Dec 15 '21 at 15:52
  • Agreed. I just tried: `as.vector(as.integer64(1), mode = 'integer')`, which I think is roughly how base as.integer would handle a data.frame. It returns 0 – SmokeyShakers Dec 15 '21 at 15:53
  • Looking at the docs, it seems that from base R's point of view, integer64s are just doubles. `as.integer` is just converting the integer64's representation as a double into an integer, but that representation is not the actual value. – SmokeyShakers Dec 15 '21 at 16:26
  • Interesting! Good find! – caldwellst Dec 15 '21 at 16:27
  • So I just used `bit64` to make things easier here, but the actual example came out of a database without the use of `bit64`. I'll update the post. – Ben G Dec 15 '21 at 16:56
  • I think you're up against the same issue, if you are using `DBI` for example. `as.integer` should work on a column inside `mutate`, since the `.integer64` method will be called, but it would have no way to work on a `data.frame`. – SmokeyShakers Dec 15 '21 at 17:40
  • Check out the update above--this is going on in the structure of the object. – Ben G Dec 15 '21 at 19:44
  • So, bottom line: should this be considered a "bug" in `bit64`? You definitely shouldn't get 0 on a dataframe that's int64 when you call `as.integer`. It should either give you the expected result or throw an error. – Ben G Dec 23 '21 at 03:18
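
The representation point from the comments above can be checked directly. This is a small sketch, assuming `bit64` is installed; the variable names are illustrative only:

```r
library(bit64)

x <- as.integer64(1)

# The object is a double vector with an S3 class attribute on top
storage_type <- typeof(x)
s3_class <- class(x)

# Removing the class exposes the raw double that holds the 64-bit pattern
raw_double <- unclass(x)

# Base conversion truncates that tiny double; the bit64 method decodes it
base_view <- as.integer(raw_double)
bit64_view <- as.integer(x)

print(storage_type)
print(s3_class)
print(raw_double)
print(base_view)
print(bit64_view)
```

The tiny number seen in the `dput` output is the bit pattern of the 64-bit integer read as a double, so any code path that ignores the class attribute sees that double instead of the integer.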