2

I want to sum up numbers by blocks:

Here is some sample data:

 data=matrix(c(0,0,0,1,1,0,1,1,1,1,1,0,0,1,0,0,1.2,2.3,1.3,1.5,2.5,2.1,2.3,1.2),
             ncol=3,dimnames=list(c(),c("low","high","time")))

     low high time
 [1,]   0    1  1.2
 [2,]   0    1  2.3
 [3,]   0    1  1.3
 [4,]   1    0  1.5
 [5,]   1    0  2.5
 [6,]   0    1  2.1
 [7,]   1    0  2.3
 [8,]   1    0  1.2

I want to get

       n  sum
 [1,]  3  4.8
 [2,]  2  4
 [3,]  1  2.1
 [4,]  2  3.5

without using any package. How can I do that in R?

Alternatively, it would also be fine if I could get

       n/low n/high sum
 [1,]  0       3    4.8
 [2,]  2       0    4
 [3,]  0       1    2.1
 [4,]  2       0    3.5
Yukun
  • Please state what you have tried so far. – Andru Mar 08 '16 at 21:47
  • I have tried by(), aggregate(), etc., but I didn't find a good way to apply these functions to this problem, so basically I have no idea. – Yukun Mar 08 '16 at 21:54
  • Do you currently have a matrix as in the example or a data frame? They look similar, but there's a difference. – Pierre L Mar 08 '16 at 21:58
  • The data I have is actually a data frame; I just generated a matrix as an example. – Yukun Mar 08 '16 at 22:02

4 Answers

9

I'm not sure why there is a constraint on packages; they can make this much easier. In base R, we can build a run index with `rle` from the consecutive combinations of the first two columns, then `aggregate` using that index for grouping. The last line sets up the names and the data frame structure (`df1` is the name of your data frame):

ind <- with(rle(do.call(paste, df1[1:2])), rep(1:length(values), lengths))
a <- aggregate(df1$time, list(ind), function(x) c(length(x), sum(x)))[-1]
setNames(do.call(data.frame, a), c("n", "sum"))

  n sum
1 3 4.8
2 2 4.0
3 1 2.1
4 2 3.5
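
To make the intermediate index visible, here is a minimal, self-contained sketch of the same idea. It assumes the question's sample matrix converted with `as.data.frame`, and it swaps `aggregate` for two `tapply` calls purely to keep each step separate; the names `m`, `key`, `runs` and `res` are illustrative, not part of the answer above.

```r
# Sample data from the question, converted to a data frame
# (the name df1 follows the answer above).
m <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1,
              1, 1, 1, 0, 0, 1, 0, 0,
              1.2, 2.3, 1.3, 1.5, 2.5, 2.1, 2.3, 1.2),
            ncol = 3, dimnames = list(NULL, c("low", "high", "time")))
df1 <- as.data.frame(m)

# Paste low/high into one key per row, then find runs of identical keys.
key  <- do.call(paste, df1[1:2])
runs <- rle(key)

# Give every row the number of the run it belongs to.
ind <- rep(seq_along(runs$values), runs$lengths)
print(ind)

# Count the rows and sum time within each run.
res <- data.frame(n   = as.vector(tapply(df1$time, ind, length)),
                  sum = as.vector(tapply(df1$time, ind, sum)))
print(res)
```

Here `ind` labels the rows as blocks 1, 1, 1, 2, 2, 3, 4, 4, which is what lets the sixth row form a block of its own even though its low/high combination already appeared in the first three rows.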

To illustrate how simple it is with help from data.table:

library(data.table)
setDT(df1)[, .(.N, sum(time)), by=rleid(low, high)]

Update

For the follow-up question, see @bgoldst's answer in the comments.

Pierre L
  • You can use `base R` or `data.table`. With `dplyr` you can either combine with `data.table` or use an indexing method from http://stackoverflow.com/questions/33507868/is-there-a-dplyr-equivalent-to-data-tablerleid – Pierre L Mar 08 '16 at 22:14
  • That's where you input the name of your data frame – Pierre L Mar 08 '16 at 22:17
  • Could also use `r = rle(data[, "low"]); rowsum(data, rep(seq_along(r$lengths), r$lengths))` – alexis_laz Mar 09 '16 at 10:32
  • You probably want to indicate whether the rleid group is for low or high, maybe `as.data.table(data)[, .(c("low","high")[low[1]+1], sum(time)), by=rleid(low,high)]` – Frank Mar 09 '16 at 15:18
  • Thanks @Frank, an identifier column could help the OP – Pierre L Mar 09 '16 at 15:26
3

A similar option, also using aggregate:

aggregate(cbind(n=1,sum=df$time), 
          by=list(c(0, cumsum(abs(diff(df$low))))), 
          FUN=sum)[-1]
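
As a rough illustration of how the grouping index in this call is built, the sketch below spells out the intermediate vectors. It assumes `df` holds the question's sample data; `change` and `block` are names introduced here only for the walkthrough.

```r
# Sample data from the question as a data frame.
df <- data.frame(low  = c(0, 0, 0, 1, 1, 0, 1, 1),
                 high = c(1, 1, 1, 0, 0, 1, 0, 0),
                 time = c(1.2, 2.3, 1.3, 1.5, 2.5, 2.1, 2.3, 1.2))

# diff() is non-zero exactly where low changes between neighbouring rows.
change <- abs(diff(df$low))
print(change)

# Running total of the changes, with a leading 0 for the first row,
# gives one id per block of consecutive equal values.
block <- c(0, cumsum(change))
print(block)

# Summing a column of ones counts the rows; summing time gives the total.
res <- aggregate(cbind(n = 1, sum = df$time), by = list(block), FUN = sum)[-1]
print(res)
```

Note that the index looks only at `low`. That is enough for this data because `high` is always `1 - low`; if the two columns could vary independently, the index would need to take `high` into account as well.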
Joachim Isaksson
0

I have solved the problem. I think it is a little bit complicated, but it works!

I generated every column using loops.

1) First I counted every change, giving each block an index:

data <- data.frame(data)
ind1 <- numeric(nrow(data))
ind1[1] <- 1
for (i in 2:nrow(data))
  ind1[i] <- if (all(data[i, 1:2] == data[i - 1, 1:2])) ind1[i - 1] else ind1[i - 1] + 1

2) Then I generated the sums, also with a loop:

# one running total per block, starting with the first row's time
ind2 <- c(data[1, 3], rep(0, max(ind1) - 1))
k <- 1

for (i in 2:nrow(data)) {
  # move on to the next block whenever low/high change
  if (!all(data[i, 1:2] == data[i - 1, 1:2])) k <- k + 1
  ind2[k] <- ind2[k] + data[i, 3]
}

result <- cbind(n = as.vector(table(ind1)), sum = ind2)

Comparing both columns at once gives a two-element result, which triggers warnings in the assignment and in `if()` (an error from R 4.2.0 on), so the comparison is wrapped in `all()`.

sanmath
  • Just a suggestion, yourcodewouldbemorereadablewithsomespaces. – Gregor Thomas Mar 09 '16 at 22:54
  • what kind of spaces? – sanmath Mar 10 '16 at 15:15
  • White spaces. Common style guides for R (e.g., [Google's](https://google.github.io/styleguide/Rguide.xml) or [Hadley Wickham's](http://adv-r.had.co.nz/Style.html)) recommend using a space after every comma, and around all binary operators, especially the assignment operator `<-`. Instead of `if(data[i,1:2]==data[i-1,1:2]){`, you would have `if (data[i, 1:2] == data[i - 1, 1:2]) {`. It makes it easier to read and quicker to understand. – Gregor Thomas Mar 10 '16 at 18:00
0

I also found a similar option:

 aggregate(df,list(c(0,cumsum(abs(diff(df$low))))),sum)[-1]

For me it is more straightforward to understand.
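
As far as I can tell, this variant also produces the second layout from the question: since `low` and `high` are 0/1 indicators, summing them within a block counts the rows of each kind. A minimal sketch, assuming `df` is the sample data as a data frame; the column names `n_low` and `n_high` are made up here for readability.

```r
# Sample data from the question as a data frame.
df <- data.frame(low  = c(0, 0, 0, 1, 1, 0, 1, 1),
                 high = c(1, 1, 1, 0, 0, 1, 0, 0),
                 time = c(1.2, 2.3, 1.3, 1.5, 2.5, 2.1, 2.3, 1.2))

# Block id: increases by one each time low changes.
block <- c(0, cumsum(abs(diff(df$low))))

# Sum every column within each block, then drop the grouping column.
res <- aggregate(df, list(block), sum)[-1]

# Illustrative names for the n/low, n/high, sum layout.
names(res) <- c("n_low", "n_high", "sum")
print(res)
```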

Yukun