I have a time-series panel dataset that is structured in the following way: There are 2 funds that each own different stocks at each time period.
df <- data.frame(
fund_id = c(1,1,1,1,1,1,1,1, 1, 2,2,2,2),
time_Q = c(1,1,1,2,2,2,2,3, 3, 1,1,2,2),
stock_id = c("A", "B", "C", "A", "C", "D", "E", "D", "E", "A", "B", "B", "C")
)
> df
fund_id time_Q stock_id
1 1 1 A
2 1 1 B
3 1 1 C
4 1 2 A
5 1 2 C
6 1 2 D
7 1 2 E
8 1 3 D
9 1 3 E
10 2 1 A
11 2 1 B
12 2 2 B
13 2 2 C
For each fund, I would like to calculate the percentage of stocks held in that current time_Q that were also ever held in any of the previous one to 2 quarters. So basically for every fund and every time_Q, I would like to have 2 columns with past 1 time_Q, past 1-2 time_Q which show what percentage of stocks held in that time were also present in any of that past time_Qs. Here is what the result should look like:
result <- data.frame(
fund_id = c(1,1,1,2,2),
time_Q = c(1,2,3,1,2),
past_1Q = c("NA",0.5,1,"NA",0.5),
past_2Q = c("NA",0.5,1,"NA",0.5)
)
> result
fund_id time_Q past_1Q past_1_2Q
1 1 1 NA NA
2 1 2 0.5 0.5
3 1 3 1 1
4 2 1 NA NA
5 2 2 0.5 0.5
I already asked a similar question here, but now I'm looking for common elements across any of the past lagged periods. I'm looking for a dplyr or data.table scalable solution where I can have around 12 past quarters and deal with multiple funds and stocks and time periods.
Thanks in advance!