So I have two tables:
Table1:
ID Yr Qty Cum_Qty
A 2013 3 3
A 2015 3 6
A 2016 2 8
B 2006 1 1
Table2:
ID Yr
A 2013
A 2014
A 2015
A 2016
A 2017
B 2016
B 2017
C 2016
C 2017
This is what I want to achieve - I want to add Cum_Qty to table 2 and inherit most recent Cum_Qty if one exists:
ID Yr Cum_Qty
A 2013 3
A 2014 3
A 2015 6
A 2016 8
A 2017 8
B 2016 1
B 2017 1
C 2016 0
C 2017 0
If I currently do a left_join(table2, table1, by = c("ID", "Yr")
, I end up with NA values in Cum_Qty if specific year is missing in table 2.
I thought about using lag function but that would leave me with NAs for the first year record and wouldn't work for ID C, which is missing from table1.
I would like to use dplyr::mutate
so my code stays in the pipes.
I think I know how to do this with 3~4 mutates, but I want to find a way to simplify the code. Does anyone have any recommendations on what I can do?
To create tables in R:
table_1 <- data.frame(
ID = c("A", "A", "A", "B"),
Yr = c(2013, 2015, 2016, 2006),
Qty = c(3, 3, 2, 1)) %>%
arrange(ID, Yr) %>%
group_by(ID) %>%
mutate(Cum_Qty = cumsum(Qty))
table_2 <- data.frame(
ID = c("A", "A", "A", "A", "A", "B", "B", "C", "C"),
Yr = c(2013, 2014, 2015, 2016, 2017, 2016, 2017, 2016, 2017))