R: Apply SVM function for group by in data frame

Question

I have a data frame (df) that looks like this:

  Value Country          ID
1    21      RU AAAU9001025
2    24      NG AAAU9001848
3    17      EG ACLU2799370
4     2      EG ACLU2799370
5    56      RU ACLU2799370

I want to run a one-class SVM for outlier detection on Value, per country, based on a relatively small sample, and flag in each row whether it is an outlier. So my output should be a data frame with an additional logical column that indicates whether the row is an outlier:

  Value Country          ID   SVM
1    21      RU AAAU9001025 FALSE
2    24      NG AAAU9001848 FALSE
3    17      EG ACLU2799370 FALSE
4     2      EG ACLU2799370  TRUE
5    56      RU ACLU2799370  TRUE
6    25      EG AMFU3022141 FALSE

I am using the following code, but I can't manage to create the desired data frame:

lapply(split(df,df$Country), 
       function(x) {(e1071::svm(x$Value[1:(ifelse(nrow(x)<50000,nrow(x),50000))], 
                                nu=0.98, type="one-classification", kernel="polynomial"))
         })

Please help me figure this out, thanks!

score 1 · Accepted Answer · answered Feb 09 '20 at 10:11

Simulate something like your data:

NROWS = c(3000,6000,10000)
names(NROWS)=c("RU","EG","NG")

df = lapply(names(NROWS), function(i){
  data.frame(
    Value = c(rnorm(0.9*NROWS[i]), rpois(0.1*NROWS[i],5)),
    Country = i,
    ID = paste0(i,"_",1:NROWS[i])
  )
})

df = do.call(rbind,df)

Create a function to fit the SVM, because you train on a random subset (at most limit values) but return a prediction for every value:

library(e1071)

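SVM_f = function(x, limit=5000){
  # train on a random sample of at most `limit` values
  N = min(c(limit, length(x)))
  mdl = svm(x[sample(length(x), N)],
            nu=0.98, type="one-classification", kernel="polynomial")
  # predict for every value; TRUE means the model considers the value in-class
  predict(mdl, x)
}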
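
A note on how to read the result: for a one-class model, predict() in e1071 returns a logical vector in which TRUE means "belongs to the learned class", so an outlier flag is the negation of it. Also, nu is an upper bound on the fraction of training points treated as outliers, so with nu=0.98 most rows can be expected to come back FALSE. The sketch below only illustrates that convention; the data, the radial kernel and nu = 0.05 are arbitrary choices for the illustration, not values taken from the question.

```r
library(e1071)

set.seed(1)                      # sampling and simulation are random
x <- c(rnorm(200), 25)           # made-up data with one extreme value at the end

# one-class SVM; kernel and nu are illustrative assumptions
mdl <- svm(x, type = "one-classification", kernel = "radial", nu = 0.05)

in_class   <- predict(mdl, x)    # TRUE = model considers the value in-class
is_outlier <- !in_class          # negate to get an outlier flag

table(is_outlier)
```

If the column should read TRUE for outliers, as in the desired output of the question, the same negation could be applied to the value returned by SVM_f.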

res = by(df, df$Country, function(x){
  data.frame(x, SVM = SVM_f(x$Value))
})
res = do.call(rbind, res)
head(res)
          Value Country   ID   SVM
RU.1  1.2802954      RU RU_1 FALSE
RU.2 -2.7119588      RU RU_2 FALSE
RU.3 -0.4856534      RU RU_3 FALSE
RU.4 -0.5041824      RU RU_4 FALSE
RU.5 -0.7043723      RU RU_5 FALSE
RU.6  0.0472744      RU RU_6 FALSE
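
Note that by() followed by rbind() returns the rows grouped by country, with row names such as RU.1, rather than in their original order. If the original order matters, one base R option is split() plus unsplit(). The sketch below is only an illustration: flag_f is a hypothetical stand-in (a simple 1.5 * IQR rule) so that the example runs without e1071, and the toy values are made up; a function like SVM_f would be passed in the same way.

```r
# Hypothetical stand-in for SVM_f: flags values beyond 1.5 * IQR from the quartiles.
flag_f <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[2] - q[1])
  unname(x < q[1] - fence | x > q[2] + fence)
}

# Toy data with interleaved countries (values are made up for illustration).
toy <- data.frame(
  Value   = c(21, 24, 17, 2, 56, 25, 19, 23, 22, 20),
  Country = c("RU", "NG", "EG", "EG", "RU", "EG", "EG", "RU", "RU", "RU")
)

# split() the values by country, apply the function to each group,
# then unsplit() puts every result back at its original row position.
flags <- lapply(split(toy$Value, toy$Country), flag_f)
toy$Flag <- unsplit(flags, toy$Country)
toy
```

The rows of toy keep their original order and row names, and each Flag value is computed only from the values of the same country.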

You can also use dplyr, but it might run a bit slower:

library(dplyr)
df %>% group_by(Country) %>% mutate(SVM=SVM_f(Value))
