
For example, using column 1 as the matching criterion, let's call replicate(length(v), sum(v)) on the column 2 vector, v, of every set of rows that consists of contiguous, matching rows of the data frame below (including sets of size 1).

A  v
a 12
a 43
b  8
a  4
b 12
c  5
c  9
d 21

-> 
55, 55, 8, 4, 12, 14, 14, 21

The operation can return a vector or a list of vectors that we can coerce to a vector with unlist().
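To make the intended result concrete, here is a minimal base R sketch that does literally what the question describes: it finds the runs of matching values in column A with rle(), splits v by run, applies replicate(length(x), sum(x)) to each piece, and flattens the resulting list with unlist(). The plain vectors A and v and the names runs, run_id and pieces are assumptions for illustration; they simply reproduce the example table.

```r
# Example data from the question (assumed to be plain character/numeric vectors)
A <- c("a", "a", "b", "a", "b", "c", "c", "d")
v <- c(12, 43, 8, 4, 12, 5, 9, 21)

# rle() collapses consecutive equal values into runs; expand to one run id per row
runs <- rle(A)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Apply the question's operation to each run, then flatten to a single vector
pieces <- lapply(split(v, run_id), function(x) replicate(length(x), sum(x)))
res <- unlist(pieces, use.names = FALSE)
res
# [1] 55 55  8  4 12 14 14 21
```

Because the run ids increase down the rows, split() returns the pieces in row order, so unlist() lines the sums up with the original rows.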

David Arenburg
Matt Munson

2 Answers


Here's a simple solution using data.table, simply because of its built-in rleid function and because it handles factors seamlessly:

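library(data.table)
# Example data from the question
df <- data.frame(A = c("a", "a", "b", "a", "b", "c", "c", "d"),
                 v = c(12, 43, 8, 4, 12, 5, 9, 21))
# rleid(A) gives each run of identical consecutive A values its own id;
# sum v within each run and add the total as a new column by reference
setDT(df)[, res := sum(v), by = rleid(A)]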
df
#    A  v res
# 1: a 12  55
# 2: a 43  55
# 3: b  8   8
# 4: a  4   4
# 5: b 12  12
# 6: c  5  14
# 7: c  9  14
# 8: d 21  21
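For reference, rleid() (exported by data.table) assigns a new integer id each time the value changes, which is why grouping by rleid(A) treats each contiguous run as its own group. A small illustration on the question's column A, assuming data.table is installed; the name ids is only illustrative:

```r
library(data.table)

A <- c("a", "a", "b", "a", "b", "c", "c", "d")
# One integer id per run of identical consecutive values
ids <- rleid(A)
ids
# [1] 1 1 2 3 4 5 5 6
```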

If we want base R, we could either recreate rleid or just combine cumsum with ave:

with(df, ave(v, cumsum(c(TRUE, head(A, -1) != tail(A, -1))), FUN = sum))
# [1] 55 55  8  4 12 14 14 21
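The grouping vector passed to ave() is where the run detection happens: comparing head(A, -1) with tail(A, -1) marks every row whose value differs from the previous one, the leading TRUE opens the first run, and cumsum() turns those run starts into run ids. A minimal sketch on the question's column, with starts and run_id as illustrative names:

```r
A <- c("a", "a", "b", "a", "b", "c", "c", "d")
# TRUE wherever an element differs from its predecessor; the leading TRUE starts run 1
starts <- c(TRUE, head(A, -1) != tail(A, -1))
# Running count of run starts = run id for every row
run_id <- cumsum(starts)
run_id
# [1] 1 1 2 3 4 5 5 6
```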
David Arenburg
  • I've avoided trying data.table so far. It seems to come up a fair bit in the places I look--maybe I should give it a chance. I will wait a bit to see if someone posts a non-data.table solution. – Matt Munson May 01 '16 at 10:24
  • So what are you using? `dplyr`? Base R? Also, is `A` a `factor` or a `character` class? – David Arenburg May 01 '16 at 10:25
  • I use `dplyr` and base R so far. My concern about data.table is that it changes the data structure so that it won't work using base R syntax. Or is that a misconception? `A` is character type data. I suppose it could easily be coerced to factors (don't normally use factors)? – Matt Munson May 01 '16 at 10:29
  • Ok, added base R. Will add `dplyr` in a moment – David Arenburg May 01 '16 at 10:29
  • TBH, I've tried this with `lead`/`lag` from dplyr but it keeps returning lots of weird errors. In your case you could just combine `rleid` from `data.table` with `group_by` and `mutate` from dplyr if you like. Or you could write your own `rleid` function using something like `rleid <- function(x){ r <- rle(x); rep(1:length(r$lengths), r$lengths)}` – David Arenburg May 01 '16 at 10:54
  • Thanks for the two solutions. That's a clever use of `head` and `tail` to get the offset. I hadn't thought of that before. I may go ahead and steal rleid from data.table. Even better, I can just use your function, thanks! – Matt Munson May 01 '16 at 11:00
  • Yeah, this was already [discussed here once](http://stackoverflow.com/questions/33507868/is-there-a-dplyr-equivalent-to-data-tablerleid) – David Arenburg May 01 '16 at 11:07
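The hand-rolled rleid suggested in the comments can be combined with ave() to stay entirely in base R. A sketch, assuming the question's data as a data frame; the function is named my_rleid here (a hypothetical name) so it does not mask data.table::rleid, and it uses seq_along() instead of 1:length() so an empty input returns an empty id vector instead of an error. Note that rle() only accepts atomic vectors, so a factor column would need as.character() first.

```r
# Hypothetical helper: base R replacement for data.table::rleid
my_rleid <- function(x) {
  r <- rle(x)  # x must be an atomic vector (convert factors with as.character)
  rep(seq_along(r$lengths), r$lengths)
}

df <- data.frame(A = c("a", "a", "b", "a", "b", "c", "c", "d"),
                 v = c(12, 43, 8, 4, 12, 5, 9, 21),
                 stringsAsFactors = FALSE)

# Sum v within each run and repeat the total on every row of that run
df$res <- ave(df$v, my_rleid(df$A), FUN = sum)
df$res
# [1] 55 55  8  4 12 14 14 21
```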

Here is an option using dplyr:

library(dplyr)
df %>%
  # start a new group id whenever A differs from the previous row
  group_by(A1 = cumsum(A != dplyr::lag(A, default = A[1]))) %>%
  mutate(res = sum(v)) %>%
  ungroup() %>%
  select(-A1)
#     A     v   res
#  (chr) (int) (int)
#1     a    12    55
#2     a    43    55
#3     b     8     8
#4     a     4     4
#5     b    12    12
#6     c     5    14
#7     c     9    14
#8     d    21    21
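The group_by() key relies on dplyr::lag(A, default = A[1]), which shifts A down by one row and fills the first slot with A[1], so the first comparison is always FALSE and the ids start at 0. The same key can be reproduced in base R to see what it looks like; prev and grp are illustrative names, and c(A[1], head(A, -1)) is a stand-in for the lag call, not dplyr itself:

```r
A <- c("a", "a", "b", "a", "b", "c", "c", "d")
# Stand-in for dplyr::lag(A, default = A[1]): shift right by one, fill with A[1]
prev <- c(A[1], head(A, -1))
# TRUE where a new run starts; cumsum turns run starts into run ids
grp <- cumsum(A != prev)
grp
# [1] 0 0 1 2 3 4 4 5
```

Starting at 0 instead of 1 makes no difference here, since the ids are only used as grouping keys.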
akrun
  • hmmm... So in the newer versions you now need to specify `dplyr::lag` explicitly it seems... I was wondering why I keep getting errors... – David Arenburg May 01 '16 at 13:00
  • @DavidArenburg It only has problems when you use `default`. I think otherwise it was calling the base R lag – akrun May 01 '16 at 13:01