I want to know the best way to iterate over the rows of a data frame when the value of a variable at row n depends on the values of one or more variables at row n-1 and/or n-2. Ideally I would like to do this in a "tidyverse" way, perhaps with purrr::pmap().
For example, say I have this data frame:
library(dplyr)
x <- tibble(t = 1:10,
            a = c(seq(100, 140, 10), rep(NA_real_, 5)),
            b = c(runif(5), rep(NA_real_, 5)),
            c = c(runif(5), rep(NA_real_, 5)))
x
#> # A tibble: 10 x 4
#>        t     a      b        c
#>    <int> <dbl>  <dbl>    <dbl>
#>  1     1   100  0.750 0.900
#>  2     2   110  0.898 0.657
#>  3     3   120  0.731 0.000137
#>  4     4   130  0.208 0.696
#>  5     5   140  0.670 0.882
#>  6     6    NA NA     NA
#>  7     7    NA NA     NA
#>  8     8    NA NA     NA
#>  9     9    NA NA     NA
#> 10    10    NA NA     NA
I have known values up to time (t) = 5. Beyond that, I wish to project values, using the following formulae:
a = lag(a) * 1.1
b = a * lag(b)
c = b * lag(a, 2)
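To make the intended recurrence unambiguous, here is a minimal sketch of the same three rules written index by index in base R. It is only meant to pin down the semantics (each missing value is computed from rows that have already been filled), not to be the tidyverse solution I am after. The names x_demo, project_rows and x_filled are made up for this illustration, the b and c values are the rounded numbers from the printout above, and the helper assumes the first two rows are fully known.

```r
# Rounded values copied from the printed tibble above; rows 6-10 are unknown.
x_demo <- data.frame(
  t = 1:10,
  a = c(seq(100, 140, 10), rep(NA_real_, 5)),
  b = c(0.750, 0.898, 0.731, 0.208, 0.670, rep(NA_real_, 5)),
  c = c(0.900, 0.657, 0.000137, 0.696, 0.882, rep(NA_real_, 5))
)

# Hypothetical helper: fill each missing value from the rows above it.
# Assumes the first two rows are fully known, so i - 1 and i - 2 always exist.
project_rows <- function(df) {
  for (i in seq_len(nrow(df))) {
    if (is.na(df$a[i])) df$a[i] <- df$a[i - 1] * 1.1      # a = lag(a) * 1.1
    if (is.na(df$b[i])) df$b[i] <- df$a[i] * df$b[i - 1]  # b = a * lag(b)
    if (is.na(df$c[i])) df$c[i] <- df$b[i] * df$a[i - 2]  # c = b * lag(a, 2)
  }
  df
}

x_filled <- project_rows(x_demo)
```

Within a row the order matters: b uses the a just computed for that row, and c uses the b just computed for that row together with a from two rows back.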
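This code produces the desired output, but it is a clunky for loop that scales poorly to larger datasets: each pass recomputes every column with mutate() yet fills in only one more row, so the work grows roughly quadratically with the number of rows: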
for (i in seq_len(nrow(x))) {
  x <- x %>%
    mutate(a = if_else(!is.na(a), a, lag(a, 1) * 1.1),
           b = if_else(!is.na(b), b, a * lag(b, 1)),
           c = if_else(!is.na(c), c, b * lag(a, 2)))
}
x
#> # A tibble: 10 x 4
#>        t     a        b        c
#>    <int> <dbl>    <dbl>    <dbl>
#>  1     1  100  7.50e- 1 9.00e- 1
#>  2     2  110  8.98e- 1 6.57e- 1
#>  3     3  120  7.31e- 1 1.37e- 4
#>  4     4  130  2.08e- 1 6.96e- 1
#>  5     5  140  6.70e- 1 8.82e- 1
#>  6     6  154  1.03e+ 2 1.34e+ 4
#>  7     7  169. 1.75e+ 4 2.45e+ 6
#>  8     8  186. 3.26e+ 6 5.02e+ 8
#>  9     9  205. 6.68e+ 8 1.13e+11
#> 10    10  225. 1.51e+11 2.80e+13