I would like to ask whether this function can be rewritten with a data.table approach:
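myfunction <- function(i) {
  # Row i of the test data, feature columns only
  a <- test.dt[i, 1:21, with = FALSE]
  # Weight of each feature where the training value equals the test value
  final <- t((t(b) == a) * value)
  # Treat missing comparisons as no match
  final[is.na(final)] <- 0
  # Total matched weight per training row
  sum.value <- rowSums(final)
  # Attach the scores, sort descending, and keep rows with a positive score
  final1 <- cbind(train.dt, sum.value)
  final1 <- final1[order(-sum.value), ]
  final1 <- final1[final1$sum.value > 0, ]
  # Top 5 distinct labels (column 22)
  suggestion <- unique(final1[, 22, with = FALSE])
  suggestion <- suggestion[1:5, ]
  return(suggestion)
}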
This is a custom kNN function I wrote for character columns. It returns the top 5 suggestions/predictions. However, it is slow when run on large test data, and I have not been able to speed it up myself so far.
The variables used are as follows:
train.dt -- the training data, includes 22 columns (21 features, 1 label column)
test.dt -- the test data, same structure as training data
value -- a vector that contains the weights/importance value of 21 features
sum.value -- for each training row, the sum of the weights in value for the features that match the test row (rowSums(final))
b -- has the same data as the training data, but excluding the label column
a -- row i of the test data, excluding the label column
suggestion -- the output
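To make the definitions above concrete, here is a minimal sketch of the weighted match score on hypothetical toy data. It assumes 3 feature columns instead of 21, and the column names (f1, f2, f3, label), the weights, and the variable score are made up for illustration. It also converts the test row to a plain vector with unlist and uses colSums with na.rm = TRUE in place of the is.na step, so it is a variation on the function above rather than a copy of it.

```r
library(data.table)

# Hypothetical toy data: 3 features + 1 label (the real tables have 21 + 1)
train.dt <- data.table(
  f1    = c("a", "a", "b", "c"),
  f2    = c("x", "y", "x", NA),
  f3    = c("m", "m", "n", "m"),
  label = c("L1", "L2", "L3", "L4")
)
test.dt <- data.table(f1 = "a", f2 = "x", f3 = "n", label = NA_character_)

value <- c(3, 2, 1)  # hypothetical weights, one per feature

# b: training features as a character matrix (one row per training row)
b <- as.matrix(train.dt[, 1:3, with = FALSE])
# a: the first test row as a plain character vector
a <- unlist(test.dt[1, 1:3, with = FALSE], use.names = FALSE)

# t(b) has one column per training row, so comparing it with `a` and
# multiplying by `value` both recycle feature by feature down each column.
# colSums then adds up the matched weights; NA comparisons are dropped.
score <- colSums((t(b) == a) * value, na.rm = TRUE)
print(score)
```

With these toy values, the first training row matches on f1 and f2, so it gets the weights 3 + 2.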
Also, I want to use lapply (or another appropriate function from the apply family) with this function. The i argument is the row number in the test data, meaning I want to apply the function to each row of the test data. I have not managed to get this working yet.
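For reference, this is roughly the calling pattern I have in mind, as a minimal sketch on hypothetical toy data (3 features instead of 21; the names f1, f2, f3, label, n.feat, tb, suggest and k are made up for illustration). It assumes the training features can be transposed once outside the function, and it returns a character vector of labels rather than a data.table, so it is not identical to the function above.

```r
library(data.table)

# Hypothetical toy data: 3 features + 1 label (the real tables have 21 + 1)
train.dt <- data.table(
  f1    = c("a", "a", "b", "c"),
  f2    = c("x", "y", "x", "z"),
  f3    = c("m", "m", "n", "m"),
  label = c("L1", "L2", "L3", "L4")
)
test.dt <- data.table(
  f1    = c("a", "b"),
  f2    = c("x", "x"),
  f3    = c("n", "n"),
  label = NA_character_
)
value  <- c(3, 2, 1)       # hypothetical feature weights
n.feat <- length(value)

# Transpose the training features once, not once per test row
tb <- t(as.matrix(train.dt[, seq_len(n.feat), with = FALSE]))

suggest <- function(i, k = 5) {
  # Row i of the test data as a plain character vector
  a <- unlist(test.dt[i, seq_len(n.feat), with = FALSE], use.names = FALSE)
  # Sum of the weights of the matching features, per training row
  score <- colSums((tb == a) * value, na.rm = TRUE)
  # Training rows by descending score, positive scores only
  keep <- order(-score)
  keep <- keep[score[keep] > 0]
  # Up to k distinct labels, best first
  head(unique(train.dt$label[keep]), k)
}

# One call per row of the test data
suggestions <- lapply(seq_len(nrow(test.dt)), suggest)
print(suggestions)
```

Unlike suggestion[1:5, ], head does not pad the result with NA when fewer than 5 distinct labels have a positive score.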
I hope this is clear. Thank you in advance!