I have a 120 GB CSV file containing numerical values grouped by categorical variables.
e.g.
df <- data.frame(x = c(rep("BLO", 100), rep("LR", 100)), y = runif(200))
I would like to calculate some summary statistics using group_by(x), but the file doesn't fit into memory. What are my options? I've looked at tidyfst and {disk.frame}, but I'm not sure whether either is the right fit. Any help would be much appreciated.
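To make the goal concrete, this is roughly the result I'm after, shown on toy data that fits in memory. The dplyr pipeline and the choice of count, mean and sd are placeholders for illustration only; the actual statistics may differ.

```r
library(dplyr)

# Toy stand-in for the real 120 GB file: two groups of 100 values each
df <- data.frame(x = rep(c("BLO", "LR"), each = 100), y = runif(200))

# Per-group summary; n/mean/sd are example statistics (an assumption)
res <- df %>%
  group_by(x) %>%
  summarise(n = n(), mean_y = mean(y), sd_y = sd(y))

print(res)
```

The question is how to get the same kind of per-group table when df cannot be loaded in one piece.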
Thank you.