I have a data frame (df) that looks like this:
  Value Country          ID
1    21      RU AAAU9001025
2    24      NG AAAU9001848
3    17      EG ACLU2799370
4     2      EG ACLU2799370
5    56      RU ACLU2799370
I want to run a one-class SVM for outlier detection on Value, separately for each country, fitted on a relatively small sample, and then flag each row as an outlier or not. The output should be the same data frame with an additional logical column that indicates whether the row is an outlier:
  Value Country          ID   SVM
1    21      RU AAAU9001025 FALSE
2    24      NG AAAU9001848 FALSE
3    17      EG ACLU2799370 FALSE
4     2      EG ACLU2799370  TRUE
5    56      RU ACLU2799370  TRUE
6    25      EG AMFU3022141 FALSE
I am using the following code, but I can't get it to produce the desired data frame:
lapply(split(df, df$Country), function(x) {
  n <- min(nrow(x), 50000)  # train on at most the first 50,000 rows
  e1071::svm(x$Value[1:n], nu = 0.98, type = "one-classification",
             kernel = "polynomial")
})
Please help me figure this out, thanks!
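For reference, the lapply() call above returns a list of fitted models, one per country, not a per-row flag. A minimal sketch of one way to get the logical column follows, assuming e1071 is installed. The helper name flag_outliers, the min_n cutoff, the default nu = 0.1, and the sample data are all made up for illustration and are not part of the original code. In a one-class SVM, nu is an upper bound on the fraction of training points treated as outliers, so nu = 0.98 would likely flag almost every row; the smaller value here is only a placeholder to tune.

```r
library(e1071)  # assumed to be installed; provides svm()

# Made-up sample data for illustration only (not the data from the question).
set.seed(1)
df <- data.frame(
  Value   = c(rnorm(30, 20, 2), 200, rnorm(30, 50, 5)),
  Country = rep(c("RU", "EG"), times = c(31, 30))
)

# Hypothetical helper: fit a one-class SVM on one country's values and
# return TRUE for the rows the model treats as outliers.
flag_outliers <- function(values, nu = 0.1, max_train = 50000, min_n = 5) {
  # Too few rows to fit a meaningful model: leave the flag undefined (NA).
  if (length(values) < min_n) return(rep(NA, length(values)))
  x <- matrix(values, ncol = 1)
  # Train on at most the first max_train rows, as in the original attempt.
  train <- x[seq_len(min(nrow(x), max_train)), , drop = FALSE]
  model <- svm(train, type = "one-classification",
               kernel = "polynomial", nu = nu)
  # predict() returns TRUE for points inside the learned class, so negate it.
  !as.logical(predict(model, x))
}

# ave() applies the helper within each country and keeps the original row
# order; it coerces the result to the type of Value, hence as.logical().
df$SVM <- as.logical(ave(df$Value, df$Country, FUN = flag_outliers))
```

The key difference from the lapply() approach is that the predictions are written back into the original rows, so df keeps its shape and simply gains the SVM column. Countries with fewer than min_n rows get NA rather than a guess.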