If speed and brevity are of interest, then for completeness (and using a chunk size of 4 rather than 8 to keep the example short):
library(data.table)
set.seed(0)
DT = data.table(a=rnorm(10))
DT
                 a
 [1,]  1.262954285
 [2,] -0.326233361
 [3,]  1.329799263
 [4,]  1.272429321
 [5,]  0.414641434
 [6,] -1.539950042
 [7,] -0.928567035
 [8,] -0.294720447
 [9,] -0.005767173
[10,]  2.404653389
DT[, list(sum = sum(a), groupsize = .N), by = list(chunk = (0:(nrow(DT) - 1)) %/% 4)]
     chunk       sum groupsize
[1,]     0  3.538950         4
[2,]     1 -2.348596         4
[3,]     2  2.398886         2
Admittedly, that's quite a long statement. It names the columns and also returns the group size, though, to show that the last chunk really does contain just 2 rows, as required.
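The chunk id in the by clause is just the zero-based row index integer-divided by the chunk size. The base-R sketch below isolates that arithmetic; n and chunk_size are illustrative names, not part of the original code.

```r
# Hypothetical illustration values: 10 rows split into chunks of 4
n <- 10
chunk_size <- 4

# Zero-based row index integer-divided by the chunk size gives the chunk id
chunk <- (0:(n - 1)) %/% chunk_size
print(chunk)
# [1] 0 0 0 0 1 1 1 1 2 2

# Rows per chunk: the last chunk only holds the 2 leftover rows
print(table(chunk))
```

Changing chunk_size (e.g. back to 8) is all that is needed to change the grouping.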
Once you're comfortable that it's doing the right thing, it can be shortened to this:
DT[, sum(a), by = list(chunk = (0:(nrow(DT) - 1)) %/% 4)]
     chunk        V1
[1,]     0  3.538950
[2,]     1 -2.348596
[3,]     2  2.398886
Notice that you can do on-the-fly aggregations like this: the chunk grouping is computed inside by, so it doesn't need to be added to the data as a column first. If you have many different aggregations in a production script, or just want to interact with the data from the command line, then small productivity gains like this can sometimes help, depending on your workflow.
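A minimal sketch of the "on the fly" point, assuming the same DT as above: the grouped query returns a new result, while DT itself keeps only its original column.

```r
library(data.table)

set.seed(0)
DT <- data.table(a = rnorm(10))

# The chunk id exists only for the duration of this query
res <- DT[, sum(a), by = list(chunk = (0:(nrow(DT) - 1)) %/% 4)]
print(res)

# DT has not gained a chunk column; it still has just "a"
print(names(DT))
```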
NB: I picked sum, but that could be replaced with somefunction(.SD) or (more likely) just list(exp1, exp2, ...), where each exp is any R expression that sees column names as variable names.
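As a sketch of the list(exp1, exp2, ...) form, here are several expressions evaluated per chunk in one pass. The output column names total, avg and n_pos are hypothetical, chosen only for illustration; each expression refers to column a directly as a variable.

```r
library(data.table)

set.seed(0)
DT <- data.table(a = rnorm(10))

# Each element of list() is an ordinary R expression evaluated per chunk
res <- DT[, list(total = sum(a),       # chunk sum, as before
                 avg   = mean(a),      # chunk mean
                 n_pos = sum(a > 0)),  # count of positive values in the chunk
          by = list(chunk = (0:(nrow(DT) - 1)) %/% 4)]
print(res)
```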