I have a data frame with 72 million observations and two columns, my_id and my_rank. It has about 6 million unique my_id values. I need to calculate the average my_rank value by my_id (group by my_id).
I tried the regular R command below, but it seems to freeze R (maybe the data is too big to fit in memory).
avg_rank_by_id<-aggregate(dataframe1["my_rank"],by=dataframe1["my_id"], mean, na.rm=TRUE)
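To make the goal concrete, here is a minimal sketch of the same call on a tiny made-up data frame. The values are hypothetical and only stand in for the real data; the per-id means it produces are the result I want at full scale.

```r
# Toy stand-in for dataframe1 (made-up values, for illustration only)
dataframe1 <- data.frame(
  my_id   = c(1, 1, 2, 2, 3),
  my_rank = c(10, 20, 5, NA, 7)
)

# Same aggregate() call as above: mean of my_rank within each my_id,
# with NA values dropped before averaging
avg_rank_by_id <- aggregate(dataframe1["my_rank"],
                            by = dataframe1["my_id"],
                            mean, na.rm = TRUE)

print(avg_rank_by_id)
```

The result has one row per unique my_id, so for the real data it should have about 6 million rows.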
I am new to R and running on Linux. Is there a way to use Revo Scale R functions such as rxCube to achieve this? Besides Revo Scale R, is there another high-performance open source R package that could do it? Thanks.
I tried the rxCube call below, but got an error:
acct_avg_rank <- rxCube( N(m13_rank)~acct_id, data=payee_merge, means=TRUE, returnDataFrame=TRUE)
All independent variables must be factors for rxCube and rxCrossTabs: "acct_id".
Use F(x) to declare that a continuous variable x is to be treated as a factor.
Error in rxCall("RxCrossTabs", params) :
Calls: rxCube -> rxCubeBase -> rxCall -> .Call