0

I have a data table like this:

   a group
1: 1     a
2: 2     a
3: 3     a
4: 4     a
5: 5     a
6: 6     a

The sample can be created from the code below:

structure(list(a = 1:100, group = c("a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b")), .Names = c("a", "group"), row.names = c(NA, 
-100L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000004790788>)

For each row in each group I would like to:

  1. take value in column a
  2. divide it by value in column a lagged by 2 and subtract 1
  3. divide it by value in column a lagged by 4 and subtract 1
  4. divide it by value in column a lagged by 6 and subtract 1
  5. sum result of steps 2-4 and return it in a new column

So for rows 1-6, I would have NA, and then 7/5 + 7/3 + 7/1 - 3, 8/6 + 8/4 + 8/2 - 3, 9/7 + 9/5 + 9/3 - 3, 10/8 + 10/6 + 10/4 - 3

So based on the table reported in the first chunk, I would like to get a new column, say metric_1, which would, on the 10th row have the value 2.416667

Please note that the values in column a will not in practice correspond to row numbers, but would be some measurements.

The final output would then look like this:

     a group metric_1
 1:  1     a       NA
 2:  2     a       NA
 3:  3     a       NA
 4:  4     a       NA
 5:  5     a       NA
 6:  6     a       NA
 7:  7     a 7.733333
 8:  8     a 4.333333
 9:  9     a 3.085714
10: 10     a 2.416667

I already tried some versions with Reduce which works like a champ if I need to sum some values in a vector, but I haven't been able to tweak it into enabling me to do the division like this.

theforestecologist
  • 4,667
  • 5
  • 54
  • 91
ira
  • 2,542
  • 2
  • 22
  • 36

2 Answers2

0

I'm not sure if this is exactly what you're looking for but perhaps it will help:

library(dplyr)

the_data %>% group_by(group) %>% 
mutate(metric_1 = (a/lag(a, 2)-1)+( a/lag(a,4)-1) + (a/lag(a, 6) - 1 )) %>%
ungroup()
JeffR
  • 524
  • 3
  • 10
  • Thank you, but I need something more scalable as I have millions of observation and need to perform this operation over many more than 3 different lags. – ira Nov 10 '17 at 21:38
0

found one possible solution as:

dt[,
     list(a, Reduce(`+`, lapply(shift(a, seq(2, 6, by = 2)),
       function(x) a/x - 1))), 
     by = "group"]

But it is rather slow.

ira
  • 2,542
  • 2
  • 22
  • 36