My question is similar to a few that have been asked before, but I hope it is different enough to warrant a separate question.
See here, and here. I'll reuse some of the example data from those questions. For context: I am looking at how my observed catch rate (of sea creatures) changed over multiple days of sampling the same area.
I want to calculate the difference between the first sample day at a given site (the first row for each letter in the data below) and each subsequent sample day (the following rows with the same letter).
# Example data
df <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B"),
  num = c(1, 8, 6, 3, 7, 7, 9),
  What_I_Want = c(NA, 7, 5, 2, NA, 0, 2))
The first solution that I found calculates a lagged difference between consecutive rows. I also wanted this calculation, so it was helpful to find:
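# Calculate lagged differences
library(dplyr)  # provides %>%, group_by(), mutate(), and lag()

df_new <- df %>%
  # group by site
  group_by(id) %>%
  # difference between each row and the previous row within the group
  mutate(diff = num - lag(num))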
Here the difference is between A.1 and A.2, then A.2 and A.3, and so on.
What I would like to do now is calculate the difference with respect to the first value of each group. So for letter A, I would like to calculate 8 - 1, then 6 - 1, and finally 3 - 1 (giving 7, 5, and 2, as in the What_I_Want column). Any suggestions?
One clunky solution (linked above) is to create two (or more) columns, one for each lag distance, and somehow merge the results that I want:
df_clunky = df %>%
  group_by(id) %>%
  mutate(
    deltaLag1 = num - lag(num, 1),
    deltaLag2 = num - lag(num, 2))
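
To make the target concrete, here is a minimal sketch of the kind of result I am after. It is not necessarily the best approach. It assumes the rows are already ordered by sample day within each site, since the example data has no date column, and that dplyr's first() picks the first row of each group. The names df_first and diff_first are just placeholders for illustration.

```r
library(dplyr)

# Same example data as above, without the What_I_Want column
df <- data.frame(
  id  = c("A", "A", "A", "A", "B", "B", "B"),
  num = c(1, 8, 6, 3, 7, 7, 9)
)

df_first <- df %>%
  group_by(id) %>%
  mutate(
    # difference between each row and the first row of its group
    # (assumes rows are already sorted by sample day within id)
    diff_first = num - first(num),
    # the first row of each group would be 0; set it to NA to match What_I_Want
    diff_first = if_else(row_number() == 1, NA_real_, diff_first)
  ) %>%
  ungroup()
```

If a 0 on the first sample day is acceptable, the second assignment inside mutate() can be dropped.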