In my dataset, I have observations on different people, for each month, in different places. I want to create a new column which is: The average of value
for last month for all places the person visited other than the place of the current row. So for example, row 6 would be the average of rows 2-5 (last month in places that are not A
).
In my actual dataset, the data is not balanced -- and users are observed for different months -- so any solution should not necessarily rely on that.
library(tidyr)
# Generate people, places, and dates
people <- letters[1:3]
places <- LETTERS[1:5]
dates <- seq(as.Date("2020/01/04"), by = "month", length.out = 3)
# Now cross data so there is observation for each person, for each place, for each month
crossed <- crossing(people, dates, places)
# Add random values
crossed$value <- rnorm(nrow(crossed), 2)
Here is what the data looks like
people dates places value
<chr> <date> <chr> <dbl>
1 a 2020-01-04 A 2.94
2 a 2020-01-04 B 1.74
3 a 2020-01-04 C 2.68
4 a 2020-01-04 D 3.96
5 a 2020-01-04 E 0.821
6 a 2020-02-04 A 1.86
7 a 2020-02-04 B 1.04
8 a 2020-02-04 C 1.51
9 a 2020-02-04 D 3.62
10 a 2020-02-04 E 1.81