I have a custom function that I would like to apply to a data.table such as the following:

library(data.table)

DT = data.table(x = rep(c("a","b","c"), each = 2), 
                x2 = rep(c("h","j"), each = 3), 
                y = c(1,3), 
                v = 1:6, 
                z = 7:12, 
                w = 13:18)


DT

   x x2 y v  z  w
1: a  h 1 1  7 13
2: a  h 3 2  8 14
3: b  h 1 3  9 15
4: b  j 3 4 10 16
5: c  j 1 5 11 17
6: c  j 3 6 12 18

I have a function with which I would like to score the numeric columns of DT, grouped by column x. The function takes two fixed columns (z and w) and computes a score from a third column, which varies over the remaining numeric columns (y and v). The function is as follows (the underscore is a placeholder for the column that is not fixed):

scoring <- function(_, z, w) {
  f <- abs(w - _) / abs(w - z)
  f[is.infinite(f)] <- 1
  f[is.nan(f)] <- 1
  return(median(f))
}

The result would (in this case) have 2 new columns, y and v, both of which would be aggregated with the scoring function by x (that is, for groups "a", "b" and "c"). E.g.:

y: a: 0.9166667
y: b: 1.25
y: c: 1.583333

v: a: 1
v: b: 1
v: c: 1

My question is: I know I can use the by argument in data.table, but I don't know how to tell it to keep two columns fixed for my custom function and to perform the calculation on each of the remaining columns.

alistaire
Aus_10
  • For clarity, I think you should explicitly say what the expected output is, not just give pseudocode with "score" – Frank Nov 19 '17 at 02:57
  • This seems to work if you make your function valid (e.g. by changing `_` to `a`), though it doesn't give the same values, so I'm not sure: `DT[, lapply(.SD, scoring, z = z, w = w), by=x, .SDcols=y:v]` – Frank Nov 19 '17 at 03:15
  • I did a quick hand calculation, so I might be off. I spot-checked your solution on the actual dataset and it checks out. Thanks! – Aus_10 Nov 19 '17 at 04:10
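
For reference, the approach suggested in the comments can be written out as a minimal, self-contained sketch. It assumes the placeholder `_` is renamed to `a` (an arbitrary name picked for illustration), spells out the recycling of y with rep(), and lists the scored columns by name in .SDcols instead of as a range. It is one way to read the question, not necessarily what the asker ran on the real dataset.

```r
library(data.table)

# Same example table as in the question; y is written out with rep()
# so that the recycling to six rows is explicit.
DT <- data.table(x  = rep(c("a", "b", "c"), each = 2),
                 x2 = rep(c("h", "j"), each = 3),
                 y  = rep(c(1, 3), times = 3),
                 v  = 1:6,
                 z  = 7:12,
                 w  = 13:18)

# The question's function with the placeholder `_` renamed to `a`
# (assumption: any valid argument name would do).
scoring <- function(a, z, w) {
  f <- abs(w - a) / abs(w - z)
  f[is.infinite(f)] <- 1   # non-zero numerator divided by zero
  f[is.nan(f)] <- 1        # zero divided by zero
  median(f)
}

# Within each group of x, lapply() passes every column named in .SDcols
# as `a`, while z and w refer to that group's z and w columns.
res <- DT[, lapply(.SD, scoring, z = z, w = w),
          by = x, .SDcols = c("y", "v")]
print(res)
```

On the example table this gives y scores of 23/12, 2.25 and 31/12 (about 1.92, 2.25 and 2.58) for groups a, b and c, and a v score of 2 for every group. These differ from the hand-calculated figures in the question, which are what the function returns if the numerator is abs(z - a) rather than abs(w - a).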

0 Answers