I am trying to build a one-class SVM model for novelty detection using the Microsoft ML package, and I have managed to get some results. The prediction output on the test data contains a column named "Score", and I am not sure what it means. I searched online but could not find a good explanation.

As a beginner in machine learning, my guess is that the score represents something like the probability of the data point being a true anomaly: the higher the score, the more likely the data entry is an anomaly. Please correct me if I am wrong. I am also wondering which algorithms can be used to determine the threshold. I know of a few, such as GA, but I am confused about how to select an appropriate one.

Thanks!

Hong Ooi
ELI

1 Answer

You are right that the higher the score, the more likely the point is an anomaly. To find a threshold, I use rxLinePlot to plot the scores, like this:

[Figure: plotting scores in oneclasssvm]

From the diagram above, it is clear that the threshold for this data is any value greater than 0.1. Plots like this will help you figure out a threshold for your own use case. Here is the complete R code if you wish to generate this graph on your machine: https://gist.github.com/ramnov/b08224b06c75d613688f0c8d61511d9b
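If you would rather not pick the cut-off purely by eye, the same idea can be roughly automated by looking for the widest gap in the sorted scores. The following is only a minimal sketch in Python, not the code from the gist: it assumes the values of the "Score" column have been copied into a plain list, the `gap_threshold` helper is a hypothetical name, and the sample scores are made up for illustration.

```python
def gap_threshold(scores):
    """Return the midpoint of the widest gap between consecutive sorted scores."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        raise ValueError("need at least two scores")
    # Pair each score with its next-larger neighbour and measure the distance.
    gaps = [(b - a, a, b) for a, b in zip(ordered, ordered[1:])]
    # The widest gap is where the plot would show a visible jump.
    _, low, high = max(gaps)
    return (low + high) / 2


# Made-up scores (an assumption, not real model output):
# most points cluster low, a few sit clearly higher.
scores = [-0.42, -0.35, -0.31, -0.28, -0.05, 0.02, 0.31, 0.44, 0.52]
threshold = gap_threshold(scores)
flagged = [s for s in scores if s > threshold]
print(threshold, flagged)
```

This heuristic only works when the anomalies are clearly separated from the bulk of the data. If the scores form a smooth continuum, a fixed quantile of the scores, or a threshold tuned on labelled examples, is likely a better choice, so it is still worth checking the result against the plot.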