sum up numbers by blocks in R

Question

I want to sum up numbers by blocks (runs of consecutive rows with the same low and high values):

Here is some sample data:

 data=matrix(c(0,0,0,1,1,0,1,1,1,1,1,0,0,1,0,0,1.2,2.3,1.3,1.5,2.5,2.1,2.3,1.2),
             ncol=3,dimnames=list(c(),c("low","high","time")))

     low high time
 [1,]   0    1  1.2
 [2,]   0    1  2.3
 [3,]   0    1  1.3
 [4,]   1    0  1.5
 [5,]   1    0  2.5
 [6,]   0    1  2.1
 [7,]   1    0  2.3
 [8,]   1    0  1.2

I want to get

       n  sum
 [1,]  3  4.8
 [2,]  2  4
 [3,]  1  2.1
 [4,]  2  3.5

without using any package. How can I do that in R?

Or, if possible, I would like to get

      n/low n/high sum
 [1,]     0      3 4.8
 [2,]     2      0 4.0
 [3,]     0      1 2.1
 [4,]     2      0 3.5

I have tried by(), aggregate(), etc., but I didn't find a good way to apply these functions to this problem. So basically I have no idea. – Yukun Mar 08 '16 at 21:54
Do you currently have a matrix as in the example or a data frame? They look similar, but there's a difference. – Pierre L Mar 08 '16 at 21:58
The data I have is actually a data frame; I just generated a matrix for the example. – Yukun Mar 08 '16 at 22:02

Pierre L · Accepted Answer · score 9 · 2016-03-08T23:29:54.597

Not sure why the constraint on packages; they can make this much easier. In base R, we can build a block index from the runs of identical consecutive combinations of the first two columns (via rle()), then aggregate the time column using that index for grouping. A final line sets up the names and the data frame structure:

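df1 <- as.data.frame(data)  # e.g. the question's matrix as a data frame
# Block index: one number per run of identical (low, high) pairs
ind <- with(rle(do.call(paste, df1[1:2])), rep(seq_along(values), lengths))
a <- aggregate(df1$time, list(ind), function(x) c(length(x), sum(x)))[-1]
setNames(do.call(data.frame, a), c("n", "sum"))  # split the count/sum matrix column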

  n sum
1 3 4.8
2 2 4.0
3 1 2.1
4 2 3.5
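To make the grouping index concrete, here is a minimal sketch that rebuilds it step by step on the sample data. It assumes the data frame is named df1, as in the answer; the intermediate names key and r are illustrative only.

```r
# Sample data from the question as a data frame called df1
df1 <- data.frame(low  = c(0, 0, 0, 1, 1, 0, 1, 1),
                  high = c(1, 1, 1, 0, 0, 1, 0, 0),
                  time = c(1.2, 2.3, 1.3, 1.5, 2.5, 2.1, 2.3, 1.2))

# One string per row ("0 1" or "1 0"); a block is a run of equal strings
key <- do.call(paste, df1[1:2])

# rle() collapses consecutive duplicates into values and run lengths
r <- rle(key)
r$values   # "0 1" "1 0" "0 1" "1 0"
r$lengths  # 3 2 1 2

# Repeat each run number by its run length: one block id per row
ind <- rep(seq_along(r$lengths), r$lengths)
ind        # 1 1 1 2 2 3 4 4
```

Pasting the key columns together is what lets rle(), which works on a single vector, detect a change in either column.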

To illustrate how simple it is with help from data.table:

library(data.table)
setDT(df1)[, .(.N, sum(time)), by=rleid(low, high)]

Update

For the follow-up question, see @bgoldst's answer in the comments.

edited Mar 08 '16 at 23:29

answered Mar 08 '16 at 22:09

Pierre L

You can use `base R` or `data.table`. With `dplyr` you can either combine with `data.table` or use an indexing method from http://stackoverflow.com/questions/33507868/is-there-a-dplyr-equivalent-to-data-tablerleid – Pierre L Mar 08 '16 at 22:14
That's where you input the name of your data frame – Pierre L Mar 08 '16 at 22:17
Could also use `r = rle(data[, "low"]); rowsum(data, rep(seq_along(r$lengths), r$lengths))` – alexis_laz Mar 09 '16 at 10:32
You probably want to indicate whether the rleid group is for low or high, maybe `as.data.table(data)[, .(c("high","low")[low[1]+1], sum(time)), by=rleid(low,high)]` – Frank Mar 09 '16 at 15:18
Thanks @Frank, an identifier column could help the OP – Pierre L Mar 09 '16 at 15:26
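alexis_laz's rowsum() one-liner also produces the second layout from the question, because summing the 0/1 low and high columns within a block counts the rows of each kind. A minimal sketch on the question's matrix, assuming (as holds in the sample data) that high is always 1 - low, so runs of low alone define the blocks; the column names are set here only to match the desired output.

```r
# The example matrix from the question
data <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1,
                 1, 1, 1, 0, 0, 1, 0, 0,
                 1.2, 2.3, 1.3, 1.5, 2.5, 2.1, 2.3, 1.2),
               ncol = 3, dimnames = list(c(), c("low", "high", "time")))

# Blocks are runs of equal values in 'low' (assumes high == 1 - low)
r <- rle(data[, "low"])
grp <- rep(seq_along(r$lengths), r$lengths)

# rowsum() totals every column per group: the 0/1 columns become counts
res <- rowsum(data, grp)
colnames(res) <- c("n/low", "n/high", "sum")
res
```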

score 3 · Answer 2 · answered Mar 08 '16 at 22:19

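A similar option, also using aggregate, where a new block starts wherever low changes (this works because high is always 1 - low in this data):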

aggregate(cbind(n=1,sum=df$time), 
          by=list(c(0, cumsum(abs(diff(df$low))))), 
          FUN=sum)[-1]
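The grouping vector works because diff() is non-zero exactly where low switches between consecutive rows, and cumsum() turns each switch into a new block number. A minimal sketch on the sample low column; the second part is a hypothetical generalisation (not from the answer) for data where several key columns can change independently, which compares whole rows with != instead of relying on abs(diff()).

```r
low <- c(0, 0, 0, 1, 1, 0, 1, 1)   # 'low' column of the sample data

# diff() is non-zero exactly where 'low' switches between consecutive rows
chg <- abs(diff(low))     # 0 0 1 0 1 1 0
# cumsum() turns each switch into a new block number; 0 for the first row
g <- c(0, cumsum(chg))    # 0 0 0 1 1 2 3 3

# Hypothetical generalisation: several key columns, any of which may change
key <- cbind(low = low, high = 1 - low)
n <- nrow(key)
changed <- rowSums(key[-1, , drop = FALSE] != key[-n, , drop = FALSE]) > 0
g2 <- cumsum(c(TRUE, changed))   # 1 1 1 2 2 3 4 4
```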

answered Mar 08 '16 at 22:19

Joachim Isaksson

Nice solution. Could also use the formula approach: `aggregate(.~g,cbind(df,g=c(0,cumsum(abs(diff(df$low))))),sum)[-1]`. – bgoldst Mar 08 '16 at 22:25
Thanks a lot! These solutions opened my eyes! – Yukun Mar 08 '16 at 22:34

score 0 · Answer 3 · answered Mar 09 '16 at 14:28

I have solved the problem. I think it is a little bit complicated, but it works!

Well, I generated every column using loops.

1) I counted every change (giving a block number for each row):

 data<-data.frame(data)
 ind1<-numeric(nrow(data))  # block number of each row
 ind1[1]<-1
 for(i in 2:nrow(data))
   # all() because the comparison returns one TRUE/FALSE per column
   ind1[i]<-ifelse(all(data[i,1:2]==data[i-1,1:2]),ind1[i-1],ind1[i-1]+1)

Then I generated the sums with a loop as well.

ind2<-data[1,3]  # running time total of each block, starting with row 1
k<-1

for(i in 2:nrow(data)){
  if(all(data[i,1:2]==data[i-1,1:2])){
     ind2[k]<-ind2[k]+data[i,3]
  }else{
      k<-k+1  # a new block starts at row i
      ind2[k]<-data[i,3]
}}


result<-cbind(n=data.frame(table(ind1))$Freq,sum=ind2)

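However, I got some warnings. They come from comparing two columns at once: data[i,1:2]==data[i-1,1:2] returns one TRUE/FALSE per column, so ifelse() returns two values where one is expected, and if() uses only the first (since R 4.2.0, if() stops with an error instead). Wrapping the comparison in all() avoids this.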

answered Mar 09 '16 at 14:28

sanmath

Just a suggestion, yourcodewouldbemorereadablewithsomespaces. – Gregor Thomas Mar 09 '16 at 22:54
what kind of spaces? – sanmath Mar 10 '16 at 15:15
White spaces. Common style guides for R (e.g., [Google's](https://google.github.io/styleguide/Rguide.xml) or [Hadley Wickham's](http://adv-r.had.co.nz/Style.html)) recommend using a space after every comma, and around all binary operators, especially the assignment operator `<-`. Instead of `if(data[i,1:2]==data[i-1,1:2]){`, you would have `if (data[i, 1:2] == data[i - 1, 1:2]) {`. It makes it easier to read and quicker to understand. – Gregor Thomas Mar 10 '16 at 18:00

score 0 · Answer 4 · answered Mar 09 '16 at 21:13

I also found a similar option:

 aggregate(df,list(c(0,cumsum(abs(diff(df$low))))),sum)[-1]

For me it is more straightforward to understand.
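Because low and high are 0/1 indicators, summing every column per block yields the follow-up layout (n/low, n/high, sum) directly, and adding the two count columns gives the plain n. A minimal sketch, assuming the data frame is called df as in this answer; the renaming and the extra n column are additions for illustration, not part of the one-liner above.

```r
# Sample data as a data frame called df, as in this answer
df <- data.frame(low  = c(0, 0, 0, 1, 1, 0, 1, 1),
                 high = c(1, 1, 1, 0, 0, 1, 0, 0),
                 time = c(1.2, 2.3, 1.3, 1.5, 2.5, 2.1, 2.3, 1.2))

# Block number per row: increases wherever 'low' changes
g <- c(0, cumsum(abs(diff(df$low))))

# Summing every column per block: the 0/1 columns become counts
res <- aggregate(df, list(g), sum)[-1]   # [-1] drops the Group.1 column
names(res) <- c("n/low", "n/high", "sum")

# Total rows per block, i.e. the 'n' column of the first desired output
res$n <- res[["n/low"]] + res[["n/high"]]
res
```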

answered Mar 09 '16 at 21:13

Yukun
