I have an issue regarding a certain kind of mean() calculation. I use a panel data set with two indentifiers "ID" and "year" (using the plm pkg)
I want to calculate the groupwise mean of a variable "y", but omit the first year's entry of the calculation and then only fill in the calculated mean only in the years that were used to calculate it. In other words, I want to have NA in every ID's first entry of this variable.
The panel data is unbalanced, so people come and go at different points in time. Some stay from beginning till end, for others I just have data for three 3 years.
library(tidyverse)
library(plm)
ID <- c("a","a","a","a","a","b","b","b","b","c","c","c")
y <- c(9,2,5,3,3,9,1,2,3,9,2,5)
year<- c(2001,2002,2003,2004,2005,2001,2002,2003,2004,2002,2003,2004)
dt <- data.frame(ID,y,year)
dt <- pdata.frame(dt, index = c("ID","year"))
I first tried a filter over periods like so:
dt <- dt %>% group_by(ID) %>%
filter(year %in% first(year)+1:last(year)) %>%
mutate(mean.y = mean(y))
But that doesn't work, and I am not surprised to be honest but I hope you know what I want to achieve. The final result should look like this:
See how the first entry of variable y = 9 for "a-2001" is left out so that it doesnt affect the mean of individual a's other y entries (2+5+3+3)/4
i hope you people could understand it. I would massively appreciate any help. Bye