
I don't know whether this is an integer64 (from bit64) problem or a melt (from reshape2) problem, but if I try to reshape a data.frame containing integer64 data, the class information is destroyed in the process and the values revert to their double representation:

library(bit64)
library(reshape2)

DF = data.frame(I = letters, Num1 = as.integer64(1:26), Num2 = as.integer64(1:26))
DFM = melt(DF, id.vars = "I")

sapply(DF, class)
sapply(DFM, class)

gives:

> sapply(DF, class)
          I        Num1        Num2 
   "factor" "integer64" "integer64" 
> sapply(DFM, class)
        I  variable     value 
 "factor"  "factor" "numeric" 

And because an integer64 is stored as a double underneath, the data appears "corrupted": the 64-bit integer bit patterns are printed as if they were doubles.

> DF
   I Num1 Num2
1  a    1    1
2  b    2    2
3  c    3    3
4  d    4    4
5  e    5    5
...
> DFM
   I variable         value
1  a     Num1 4.940656e-324
2  b     Num1 9.881313e-324
3  c     Num1 1.482197e-323
4  d     Num1 1.976263e-323
5  e     Num1 2.470328e-323
6  f     Num1 2.964394e-323

What is causing this? Is this an integer64 problem or a melt problem? When creating classes, what can be done to avoid this sort of thing?

Corvus
  • I can't reproduce your problem: sapply(DFM, class) gives "factor", "factor" and "integer64" – Jan van der Laan Feb 15 '13 at 10:54
  • I am able to reproduce it. – Arun Feb 15 '13 at 10:59
  • Interesting, so what could be different between us? Is there some version info or something else I could usefully give? – Corvus Feb 15 '13 at 11:03
  • My fault, I accidentally loaded reshape and not reshape2. Interestingly, with reshape there is no problem. – Jan van der Laan Feb 15 '13 at 11:04
  • @Corone, look at [**page 9 here**](http://cran.r-project.org/web/packages/bit64/bit64.pdf). The documentation states the limitations and they clearly state some issues with base R functions. For example, `is.vector(x=as.integer64(1:5))` would return `FALSE`! – Arun Feb 15 '13 at 11:19
  • @Arun, well spotted, it even talks about `unlist`. – juba Feb 15 '13 at 11:21
  • @Arun - I'll accept that as an answer if you repost. – Corvus Feb 15 '13 at 11:24
  • @Arun `is.vector` is a red herring: it's false for any vector with attributes. `is.atomic` is the more important test. – hadley Feb 15 '13 at 13:08
  • @hadley, Thanks for correcting. You're right. What I meant to say, from the documentation, is that `is.vector` doesn't automatically dispatch to `is.vector.integer64`, which would return `TRUE`. – Arun Feb 15 '13 at 13:17
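
The point made in the comments about `is.vector` versus `is.atomic` can be checked in base R alone. This is only a small sketch: the attribute name `note` is made up for illustration, and no integer64 object is involved.

```r
# An ordinary integer vector carrying an arbitrary attribute
# ("note" is an illustrative name, not something bit64 uses).
x <- structure(1:3, note = "any attribute")

is.vector(x)    # FALSE: attributes other than names make is.vector() fail
is.atomic(x)    # TRUE: the underlying storage is still an atomic vector
is.vector(1:3)  # TRUE: same data without the attribute
```

So a `FALSE` from `is.vector` says only that attributes are present, which is also true of any classed object.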

4 Answers


It seems to be a limitation of the package, which is also mentioned on page 9 of its documentation (http://cran.r-project.org/web/packages/bit64/bit64.pdf). For example:

x <- data.frame(a=as.integer64(1:5), b=as.integer64(1:5))
> x
#   a b
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5

> unlist(x)

#            a1            a2            a3            a4            a5            b1 
# 4.940656e-324 9.881313e-324 1.482197e-323 1.976263e-323 2.470328e-323 4.940656e-324 
#            b2            b3            b4            b5 
# 9.881313e-324 1.482197e-323 1.976263e-323 2.470328e-323 

> as.matrix(x)
#                  a             b
# [1,] 4.940656e-324 4.940656e-324
# [2,] 9.881313e-324 9.881313e-324
# [3,] 1.482197e-323 1.482197e-323
# [4,] 1.976263e-323 1.976263e-323
# [5,] 2.470328e-323 2.470328e-323

x <- as.integer64(1:5)

> is.vector(x)
# [1] FALSE

> as.vector(x)
# [1] 4.940656e-324 9.881313e-324 1.482197e-323 1.976263e-323 2.470328e-323
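
The tiny values above are consistent with the bytes of a 64-bit integer being read back as an IEEE 754 double. The sketch below reproduces that for the value 1 using base R only; it assumes a little-endian byte layout and does not call bit64 at all.

```r
# Bytes of the 64-bit integer 1, least significant byte first.
int64_one <- as.raw(c(1, 0, 0, 0, 0, 0, 0, 0))

# Read the same 8 bytes as a double instead of an integer.
as_double <- readBin(int64_one, what = "double", size = 8, endian = "little")

print(as_double)      # 4.940656e-324, the smallest positive (subnormal) double
as_double == 2^-1074  # TRUE
```

The stored bits are untouched by `unlist`, `as.matrix` or `as.vector`; only the class attribute that tells R how to interpret them is lost.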
Arun

Resetting the class seems to 'correct' the results (see below). However, as mentioned in the comments, this will most likely not work if the measure columns also contain types other than integer64.

> class(DFM$value) <- "integer64"
> DFM
   I variable value
1  a     Num1     1
2  b     Num1     2
3  c     Num1     3
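
One way to make the class reset safer is to apply it only when all source columns share the same class. The helper below is a hypothetical sketch (`restore_class` is not part of reshape2 or bit64), and it uses base R's `Date` as a stand-in for integer64, since both are double vectors with a class attribute.

```r
# Hypothetical helper: reapply a class to unlist() output only when every
# source column has exactly the same class vector.
restore_class <- function(values, columns) {
  classes <- unique(lapply(columns, class))
  if (length(classes) == 1L) {
    class(values) <- classes[[1L]]
  }
  values
}

# Date stands in for integer64 here: a double vector plus a class attribute.
same <- list(as.Date("2013-02-15"), as.Date("2013-02-16"))
class(restore_class(unlist(same), same))    # "Date"

# Mixed columns: the class is left as unlist() produced it.
mixed <- list(as.Date("2013-02-15"), 1.5)
class(restore_class(unlist(mixed), mixed))  # "numeric"
```

This only works because `unlist` leaves the underlying doubles untouched; if a conversion had changed the values, resetting the class would not recover them.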
Jan van der Laan

I can reproduce it too.

Not a solution, but the problem seems to happen at the following line of the melt.data.frame function:

value <- unlist(unname(data[var$measure]))

In your example, this leads to:

unlist(unname(DF[c("Num1","Num2")]))

And the unlist call changes the class of the data, because it only considers the underlying type. As the help page says:

 The output type is determined from the highest type of the
 components in the hierarchy NULL < raw < logical < integer < real
 < complex < character < list < expression, after coercion of
 pairlists to lists.
juba
  • So are we saying bit64 should have implemented unlist.integer64? – Corvus Feb 15 '13 at 11:06
  • @Corone No, I don't think so. `unlist` converts its arguments because it has to deal with cases where they are not of the same class. For example `unlist(list(1,"a",TRUE))`. I don't know what the solution would be. Maybe, in melt, check if all the measure variables are of the same class, and in this case don't call `unlist`. – juba Feb 15 '13 at 11:09
  • reshape uses rbind to put the measure variables underneath each other. rbind.integer64 is implemented in bit64. But even there you might run into the case where there are more variable types. Eventually everything needs to be converted to one type, which probably will always be numeric which is most general. – Jan van der Laan Feb 15 '13 at 11:10
  • @juba one could just as easily request that unlist works on integer64 though - so "don't use unlist" isn't really a fix – Corvus Feb 15 '13 at 11:14
  • @JanvanderLaan but in this case the ONLY type in the unlist is integer64, so when unlist casts to the "highest" it should still be integer64? – Corvus Feb 15 '13 at 11:15
  • @Corone, yes of course it would be better if `unlist` worked on integer64. But as it is not a base R class, it's quite logical it doesn't, and as `unlist` doesn't seem to be implemented with S3 methods, I don't know how this could be done. – juba Feb 15 '13 at 11:19
  • @juba and Corone, check the documentation I've cited as comment to the question. – Arun Feb 15 '13 at 11:20
  • In the short run, might be simpler to do `foo<-as.numeric(my_integer64_data)` and convert back when you're done. – Carl Witthoft Feb 15 '13 at 12:22

My integer64 problem arose when fetching data with a query through the Postgres R library.

I worked around it by writing the data to a CSV file and reading it back in.

# integer64 workaround: round-trip through CSV
# row.names = FALSE keeps the row names from coming back as an extra column
write.csv(df, "df.csv", row.names = FALSE)
library(readr)
df <- read_csv("df.csv")

All numeric variables were converted from integer64 to double (numeric).
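
The conversion can also be done in memory, along the lines of the `as.numeric` suggestion in the comments above. The helper below is a hypothetical sketch (`int64_to_double` is an illustrative name): it assumes bit64 is installed whenever integer64 columns are present, so that `as.numeric` dispatches to bit64's conversion method, and values beyond 2^53 in magnitude would lose precision as doubles.

```r
# Hypothetical helper: convert every integer64 column of a data frame to double.
int64_to_double <- function(df) {
  is_int64 <- vapply(df, inherits, logical(1), what = "integer64")
  if (any(is_int64)) {
    # Assumption: loading bit64 registers the method that as.numeric() needs.
    if (!requireNamespace("bit64", quietly = TRUE)) {
      stop("bit64 is required to convert integer64 columns")
    }
    df[is_int64] <- lapply(df[is_int64], as.numeric)
  }
  df
}

# A data frame without integer64 columns passes through unchanged.
plain <- data.frame(a = 1:3, b = c("x", "y", "z"))
identical(int64_to_double(plain), plain)  # TRUE
```

This avoids the temporary file, at the cost of depending on bit64 being available at conversion time.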

jim andr