cumulative sum by ID with lag

Question

I want to create a cumulative sum by id. But, it should not sum the value that belongs to the row where is being calculated.

I've already tried with cumsum. However, I do not know how to add a statement which specifies to do not add the amount of the row where the sum is made. The result column I am looking for is the third column called: "sum".

For example, for id 1, the first row is sum=0, because should not add this row. But, for id 1 and row 2 sum=100 because the amount of id 1 previous to the row 2 was 100 and so on.

   id amount sum
1:  1    100   0
2:  1     20 100
3:  1    150 120
4:  2     60   0
5:  2    100  60
6:  1     30 270
7:  2     40 160

This is what I've tried:

  df[,sum:=cumsum(amount),
      by ="id"] 

  data: df <- data.table(id = c(1, 1, 1, 2, 2,1,2), amount = c(100, 20, 
  150,60,100,30,40),sum=c(0,100,120,0,60,270,160) ,stringsAsFactors = 
  FALSE)

score 6 · Answer 1 · answered May 27 '19 at 00:52

You can do this without using lag:

> df %>%
      group_by(id) %>%
      mutate(sum = cumsum(amount) - amount)
# A tibble: 7 x 3
# Groups:   id [2]
     id amount   sum
  <dbl>  <dbl> <dbl>
#1    1    100     0
#2    1     20   100
#3    1    150   120
#4    2     60     0
#5    2    100    60
#6    1     30   270
#7    2     40   160

Shree · Accepted Answer · 2019-05-27T01:20:43.473

With dplyr -

df %>% 
  group_by(id) %>% 
  mutate(sum = lag(cumsum(amount), default = 0)) %>% 
  ungroup()

# A tibble: 7 x 3
     id amount   sum
  <dbl>  <dbl> <dbl>
1     1    100     0
2     1     20   100
3     1    150   120
4     2     60     0
5     2    100    60
6     1     30   270
7     2     40   160

Thanks to @thelatemail here's the data.table version -

df[, sum := cumsum(shift(amount, fill=0)), by=id]

akrun · Answer 3 · 2019-05-27T01:57:15.817

1

Here is an option in base R

df$Sum <- with(df, ave(amount, id, FUN = cumsum) - amount)
df$Sum
#[1]   0 100 120   0  60 270 160

Or by removing the last observation, take the cumsum

with(df, ave(amount, id, FUN  = function(x) c(0, cumsum(x[-length(x)]))))

edited May 27 '19 at 01:57

answered May 27 '19 at 01:46

akrun

874,273
37
540
662

score 0 · Answer 4 · answered May 27 '19 at 00:30

You can shift the values you're summing by using the lag function.

library(tidyverse)

df <- data.frame(id = c(1, 1, 1, 2, 2,1,2), amount = c(100, 20, 
150,60,100,30,40),sum=c(0,100,120,0,60,270,160) ,stringsAsFactors = 
FALSE)

df %>% 
    group_by(id) %>% 
    mutate(sum = cumsum(lag(amount, 1, default=0)))

# A tibble: 7 x 3
# Groups:   id [2]
     id amount   sum
  <dbl>  <dbl> <dbl>
1     1    100     0
2     1     20   100
3     1    150   120
4     2     60     0
5     2    100    60
6     1     30   270
7     2     40   160

cumulative sum by ID with lag

4 Answers4

Related