I have a large (23 million rows) ffdf table (tbl_ffdf) with 10 columns: 7 are factors and 3 are numeric. It looks something like this:
TABLE_bad
F1 F2 F3 F4 F5 F6 F7 N1 N2 N3
1111 01.15 05.14 busns AA 16 F 55.2 16165 0
1111 01.15 05.14 busns AA 16 F 12.5 0 4545
2222 12.14 11.14 privt KM 5 T 0.7 255 987777
2222 12.14 11.14 privt KM 5 T 111.6 7800 0
I'd like to aggregate the data with sum(Nx) over each group of identical factor values, collapsing these duplicates so that the table looks like this:
TABLE_ok
F1 F2 F3 F4 F5 F6 F7 N1 N2 N3
1111 01.15 05.14 busns AA 16 F 67.7 16165 4545
2222 12.14 11.14 privt KM 5 T 112.3 8055 987777
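To make the goal concrete, here is a minimal sketch of the same aggregation on a plain in-memory data.frame (not an ffdf) holding only the four sample rows, using base R's aggregate. The names toy and toy_ok are just for illustration, and this approach is not meant for the full 23 million rows; it only shows the result I'm after:

```r
# Toy in-memory copy of the four sample rows (plain data.frame, not ffdf).
toy <- data.frame(
  F1 = factor(c("1111", "1111", "2222", "2222")),
  F2 = factor(c("01.15", "01.15", "12.14", "12.14")),
  F3 = factor(c("05.14", "05.14", "11.14", "11.14")),
  F4 = factor(c("busns", "busns", "privt", "privt")),
  F5 = factor(c("AA", "AA", "KM", "KM")),
  F6 = factor(c("16", "16", "5", "5")),
  F7 = factor(c("F", "F", "T", "T")),
  N1 = c(55.2, 12.5, 0.7, 111.6),
  N2 = c(16165, 0, 255, 7800),
  N3 = c(0, 4545, 987777, 0)
)

# Sum the three numeric columns within each combination of the seven factors.
toy_ok <- aggregate(cbind(N1, N2, N3) ~ F1 + F2 + F3 + F4 + F5 + F6 + F7,
                    data = toy, FUN = sum)
print(toy_ok)
```

Each group of identical F1..F7 values ends up as a single row with the N columns summed.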
I'm using the ffbase2 package installed from GitHub (which is dplyr for ffdf tables). I'm doing the following:
# this step finishes OK in approximately 90 sec
TABLE_gr <- group_by(TABLE_bad, F1, F2, F3, F4, F5, F6, F7)
TABLE_ok <- summarise(TABLE_gr, sN1 = sum(N1), sN2 = sum(N2), sN3 = sum(N3))
The summarise call runs for about 10 seconds and then fails with:
Error in as.vmode.default(value, vmode) :
(list) object cannot be coerced to type 'double'
After that, RStudio drops into debug mode (as configured in my settings). It takes about 3-5 MINUTES, during which the computer hangs, before it gets deep enough to show the code of the function that raised the error:
function (x, ...)
UseMethod("as.vmode")
Here, under Data, I can see that x is a data.frame of F1 values, and the Traceback shows these calls:
eval(expr, envir, enclose)
`[<-`(`*tmp*`, ff::hi(N + 1, N + n), , value = -*etc*-
append_to(out, res, -*etc*-
summarise_.grouped_ffdf( -*etc*-
Looking into the ffbase2 source code didn't tell me much. As far as I can tell, the summarise_.grouped_ffdf method uses recursive slicing of the data and, probably, at the last step it receives a data.frame where it expected a matrix? That is a common cause of the "(list) object cannot be coerced to type 'double'" error.
I have no idea what the real cause of this error is or how to fix it. Please help! :-)