I have a large data.table object in R with 4,847,143 rows. Speed is key, so I have implemented most operations with the data.table package.
The data.table (dt) has the following structure:
library(data.table)
dt
   nr group count
1:  1     A     2
2:  1     B     2
3:  2     C     2
4:  2     D     2
5:  2     A     2
6:  3     B     2
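For reproducibility, here is a minimal sketch that rebuilds the six sample rows shown above and runs the same dcast call on them. The column types (integer nr and count, character group) are my assumption; the real table may differ.

```r
library(data.table)

# Toy version of the table printed above (column types are assumed)
dt <- data.table(
  nr    = c(1L, 1L, 2L, 2L, 2L, 3L),
  group = c("A", "B", "C", "D", "A", "B"),
  count = c(2L, 2L, 2L, 2L, 2L, 2L)
)

# Same long-to-wide call as on the full data: one row per nr,
# one column per group, counts summed within each cell
ndt <- dcast(dt, nr ~ group, fun.aggregate = sum, value.var = "count")
print(ndt)
```

On data this small the call works without problems; the error only appears at full size.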
When I try to convert this long dt to wide format using dcast, I get the following error:
ndt <- dcast(dt, nr ~ group, fun.aggregate = sum, value.var = 'count')
Error in dim.data.table(x) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:138
In addition: Warning message:
In setattr(l, "row.names", .set_row_names(length(l[[1L]]))) :
NAs introduced by coercion to integer range
When I apply the same call to only the first 2,000,000 rows, it works fine:
ndt <- dcast(dt[1:2000000], nr ~ group, fun.aggregate = sum, value.var = 'count')
dim(dt)
[1] 4847143 3
dim(ndt)
[1] 1166035 716
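My guess, based only on the "long vectors" message, is that the full result crosses R's limit of 2^31 - 1 elements for ordinary vectors somewhere inside dcast. The subset result above already has 1166035 * 716 = 834,881,060 cells. A rough way to estimate the size of the full result is sketched below; wide_cells is a hypothetical helper I wrote for this check, not part of data.table, and the toy table only stands in for the real dt.

```r
library(data.table)

# Hypothetical helper: number of value cells that dcast(dt, nr ~ group)
# would produce (distinct nr values times distinct group values).
# as.numeric() avoids integer overflow in the multiplication.
wide_cells <- function(dt) {
  as.numeric(uniqueN(dt$nr)) * as.numeric(uniqueN(dt$group))
}

# Toy data with the same structure as the sample above
toy <- data.table(
  nr    = c(1L, 1L, 2L, 2L, 2L, 3L),
  group = c("A", "B", "C", "D", "A", "B"),
  count = 2L
)

cells <- wide_cells(toy)                       # 3 ids x 4 groups
too_big <- cells > .Machine$integer.max        # compare against 2^31 - 1

# Share of the limit already used by the 2,000,000-row subset result
subset_share <- (1166035 * 716) / .Machine$integer.max
```

Running wide_cells(dt) on the full table would show whether the wide result is above .Machine$integer.max, which would be consistent with the error if my guess about the cause is right.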
Any help in resolving this, or a fast alternative approach, would be greatly appreciated.
My data.table version:
> packageVersion('data.table')
[1] ‘1.10.4.3’
Thanks