SMOTE oversampling for anomaly detection using a classifier

Question

I have sensor data and want to do live anomaly detection. My plan is to run LOF on the training set to label the anomalies, then train a classifier on the labeled data to classify new data points. I considered using SMOTE to add more anomaly points to the training data and overcome the class imbalance, but SMOTE created many points that lie inside the normal range. How can I oversample without creating samples in the normal data range?

[Figure: the data before applying SMOTE]

[Figure: the data after applying SMOTE]

It seems to me that the graph before applying SMOTE (the original data) should be good enough to make a good anomaly detection classifier, you can see clearly where the boundaries are. Why do you want to do SMOTE? — TYZ, Jul 16 '18 at 12:59

score 0 · Answer 1 · answered Jul 16 '18 at 19:16

SMOTE creates each synthetic point by linearly interpolating between a minority-class sample and one of its k nearest minority-class neighbors, so every new point lies on the line segment between two existing minority samples. When the samples are all over the place like this, many of those segments cross the middle of the data, so it makes sense that you end up with synthetic points there.
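To make the interpolation concrete, here is a minimal sketch of the step SMOTE applies to one sample and one neighbor. The 1-D readings, the normal band of [-3, 3], and the helper name `smote_point` are all assumptions made up for illustration; this is not the imbalanced-learn implementation, only the interpolation idea.

```python
import random

def smote_point(sample, neighbor, rng):
    """Return one synthetic point on the segment between sample and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(0)

# Hypothetical 1-D sensor readings: normal values sit near 0,
# while the two anomalies sit far out on opposite sides.
low_anomaly = [-10.0]
high_anomaly = [10.0]

# Interpolate repeatedly between the two anomalies, as SMOTE would
# if they were each other's nearest minority-class neighbors.
synthetic = [smote_point(low_anomaly, high_anomaly, rng) for _ in range(1000)]

# Count synthetic "anomalies" that fall inside the assumed normal band [-3, 3].
inside_normal = sum(1 for p in synthetic if -3.0 <= p[0] <= 3.0)
print(f"{inside_normal} of {len(synthetic)} synthetic points lie in the normal range")
```

Because the segment between the two anomalies passes straight through the normal band, a sizeable share of the synthetic points end up labeled as anomalies while sitting among normal readings.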

SMOTE is really meant for cases where the minority class occupies specific regions of the feature space, and the synthetic points help a classifier learn those regions as the decision region for the minority class. That doesn't seem to be your use case: you want to know which points "don't belong."

This seems like a fairly nice use case for DBSCAN, a density-based clustering algorithm. It marks as noise any point that has too few neighbors within some distance, eps, and is not within eps of a dense (core) point, which is effectively a "does not belong" label.
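As a rough sketch of the DBSCAN suggestion, the snippet below uses scikit-learn's `DBSCAN` on made-up 2-D data. The data, `eps=1.0`, and `min_samples=5` are assumptions chosen for illustration and would need tuning on real sensor readings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical data: a dense blob of normal readings plus a few far-away points.
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
X = np.vstack([normal, outliers])

# eps and min_samples are illustrative values, not tuned for real data.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# DBSCAN assigns the label -1 to noise points.
is_anomaly = labels == -1
print(f"flagged {is_anomaly.sum()} of {len(X)} points as anomalies")
```

Note that scikit-learn's `DBSCAN` has no `predict` method for unseen points, so for live detection you would either re-fit on a window of recent data or use the noise labels as training targets for a separate classifier, as the question already proposes.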
