I have a custom function that I would like to apply to a data.table such as the following:
library(data.table)

DT = data.table(x = rep(c("a","b","c"), each = 2),
                x2 = rep(c("h","j"), each = 3),
                y = c(1,3),   # recycled to 1, 3, 1, 3, 1, 3
                v = 1:6,
                z = 7:12,
                w = 13:18)
DT
   x x2 y v  z  w
1: a  h 1 1  7 13
2: a  h 3 2  8 14
3: b  h 1 3  9 15
4: b  j 3 4 10 16
5: c  j 1 5 11 17
6: c  j 3 6 12 18
I would like to use a function to score the numeric columns of DT, grouped by column x. The function keeps two columns fixed (z and w) and performs its calculation on a third column, which should vary over the remaining numeric columns. The function is as follows (the first argument is the column that is not fixed):
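# col is the non-fixed column (the underscore placeholder); z and w stay fixed
scoring <- function(col, z, w) {
  f <- abs(z - col) / abs(w - z)
  f[is.infinite(f)] <- 1   # non-zero numerator divided by zero
  f[is.nan(f)] <- 1        # 0 / 0
  return(median(f))
}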
The result would (in this case) have two new columns, y and v, both aggregated with the scoring function by x (that is, for groups "a", "b" and "c"). For example:
y: a: 0.9166667
y: b: 1.25
y: c: 1.583333
v: a: 1
v: b: 1
v: c: 1
My question is: I know I can use the by argument in data.table, but I don't know how to tell it to keep two columns fixed for my custom function and perform the calculation on each of the remaining columns.
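To make the target concrete, here is a minimal sketch of one way this might be done, using .SD with .SDcols so that only the non-fixed columns are looped over while z and w are passed through unchanged for each group. Some assumptions: the names score_cols and res are only illustrative; the non-fixed argument is called col because `_` is not a valid R name; y is built with an explicit rep() rather than relying on recycling; and the numerator is taken as abs(z - col), since that is the form that reproduces the example values listed above.

```r
library(data.table)

DT <- data.table(
  x  = rep(c("a", "b", "c"), each = 2),
  x2 = rep(c("h", "j"), each = 3),
  y  = rep(c(1, 3), times = 3),  # explicit instead of relying on recycling
  v  = 1:6,
  z  = 7:12,
  w  = 13:18
)

# Assumption: numerator is abs(z - col), which matches the example values.
scoring <- function(col, z, w) {
  f <- abs(z - col) / abs(w - z)
  f[is.infinite(f)] <- 1
  f[is.nan(f)] <- 1
  median(f)
}

# Every numeric column except the two fixed ones (here: y and v).
is_num     <- vapply(DT, is.numeric, logical(1))
score_cols <- setdiff(names(DT)[is_num], c("z", "w"))

# .SD holds only score_cols; z and w are still visible in j for each group.
res <- DT[, lapply(.SD, scoring, z = z, w = w), by = x, .SDcols = score_cols]
print(res)
```

Because .SDcols only restricts what goes into .SD, z and w can still be referred to by name inside j, which is what keeps them fixed while lapply() walks over the other columns.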