Methods to do outlier detection in sound recognition?

Question

I have a model that classifies sounds into two classes, class A and class B.

How can I recognize any other sound (class C) as abnormal?

I tried applying a threshold to the frame-by-frame recognition results:

70% or more of frames agree -> class A or B
else                        -> abnormal

For example, if a sound has 10 frames and the per-frame results are:

frame 1 2 3 4 5 6 7 8 9 10
      A B A B A A A B A  A     A=7 B=3
-> class A

frame 1 2 3 4 5 6 7 8 9 10
      B B A B A A A B A  A     A=6 B=4
-> abnormal
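
The voting rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in the question, not a recommended detector. The function name `classify_by_frame_votes` and the parameter `min_share` are made up for this sketch, and it assumes that exactly 70% counts as reaching the threshold, since the first example (A=7 of 10) is labelled class A.

```python
from collections import Counter


def classify_by_frame_votes(frame_labels, min_share=0.7):
    """Return the majority label if its share of frames reaches min_share.

    Otherwise return "abnormal". Both names are hypothetical and only
    mirror the rule described in the question.
    """
    if not frame_labels:
        return "abnormal"
    # Most frequent per-frame label and how many frames voted for it.
    label, count = Counter(frame_labels).most_common(1)[0]
    share = count / len(frame_labels)
    return label if share >= min_share else "abnormal"


first = list("ABABAAABAA")   # A=7, B=3
second = list("BBABAAABAA")  # A=6, B=4
print(classify_by_frame_votes(first))
print(classify_by_frame_votes(second))
```

Note that a sound from neither class can still collect 70% of the votes for A or B, because the underlying classifier has only those two labels to choose from.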

The performance is very bad.

What should I do?

It's probably not a good idea to use the results of a binary classifier for outlier detection. What is your underlying model for binary classification? — David Maust, Jan 08 '16 at 08:18
Are the sounds A and B specific things you are trying to detect? Have you thought about introducing training examples for C with stray noises or generic audio samples? — David Maust, Jan 08 '16 at 08:35
Do you mean that the model should be trained to recognize A, B and C? If so, I have thought about this. However, if I train an N-class model, there will always be a sound that belongs to none of those N classes. That is why I tried outlier detection. — yutseho, Jan 08 '16 at 08:43
Does the classifier that gives you A or B also output a confidence? — Humam Helfawi, Jan 08 '16 at 08:56
@YuTse. I would highly recommend the classification approach. It has worked well for me in the past. I converted this discussion to an answer below. — David Maust, Jan 08 '16 at 09:00
@HumamHelfawi Confidence interval? No, it's quite a simple model. — yutseho, Jan 08 '16 at 09:06

score 3 · Accepted Answer · answered Jan 08 '16 at 08:58

There are two ways to look at this: as a classification problem, and as an outlier detection problem.

Classification

As a classification problem, you could bring in outside sounds that your system is likely to encounter in use, and use those to create a third class. It is important for this third class to contain a large variety of sounds, and potentially a large number of them.

With this, you may want to use cost-sensitive one-vs-all classification to adjust the precision / recall trade-off for picking out classes A and B.

The benefit of this method is that you do not have to set an arbitrary threshold, as you would for an outlier / anomaly model. Distance may be hard to measure in this context, so finding a proper threshold could be difficult.

Many people, including myself, used this technique in a Kaggle competition that is similar to your problem: https://www.kaggle.com/c/axa-driver-telematics-analysis
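
To make the cost-sensitive idea a little more concrete, here is a minimal sketch. It assumes you already have a three-class model (A, B, other) that outputs class probabilities. It shows only a cost-weighted decision rule applied to those probabilities, not a full one-vs-all training setup. In practice the costs are often applied as class weights during training instead. The cost values and the names `COST` and `min_cost_label` are hypothetical.

```python
# Hypothetical cost matrix: COST[true_class][predicted_class].
# The numbers are illustrative only. Calling an "other" sound A or B is made
# three times as expensive as any other mistake, which pushes the decision
# toward higher precision on A and B.
COST = {
    "A":     {"A": 0.0, "B": 1.0, "other": 1.0},
    "B":     {"A": 1.0, "B": 0.0, "other": 1.0},
    "other": {"A": 3.0, "B": 3.0, "other": 0.0},
}


def min_cost_label(probs, cost=COST):
    """Pick the label with the lowest expected cost given class probabilities."""
    def expected_cost(predicted):
        # Sum over every possible true class, weighted by its probability.
        return sum(probs[true] * cost[true][predicted] for true in probs)
    return min(cost, key=expected_cost)


# Plain argmax would say "A" here, but the expected cost favours "other".
print(min_cost_label({"A": 0.5, "B": 0.1, "other": 0.4}))
# A confident prediction for A is still returned as "A".
print(min_cost_label({"A": 0.85, "B": 0.05, "other": 0.10}))
```

Raising or lowering the costs in the "other" row is what moves the precision / recall balance for A and B.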

Outlier / Anomaly detection

Since you are using a neural network, it may be possible to build an autoencoder. It learns a manifold that represents the sounds you are trying to detect, and you can use the reconstruction loss as your distance measure for anomaly detection. This still requires you to determine a threshold, and it is good to use some existing anomaly / outlier data to do this.
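
As a rough sketch of the thresholding step only (the autoencoder itself is not shown), suppose you have reconstruction errors for held-out normal sounds and for a few known anomalies. One possible way to pick the threshold is to maximize balanced accuracy on that labelled data. The function names and the error values below are made up for illustration, and balanced accuracy is just one reasonable criterion.

```python
def pick_threshold(normal_errors, anomaly_errors):
    """Return the error threshold that maximizes balanced accuracy.

    Both arguments are lists of reconstruction errors from held-out,
    labelled data. A sample is flagged as an anomaly when error > threshold.
    """
    candidates = sorted(set(normal_errors) | set(anomaly_errors))
    best_threshold, best_score = None, -1.0
    for t in candidates:
        # Fraction of normal sounds kept as normal.
        true_negative_rate = sum(e <= t for e in normal_errors) / len(normal_errors)
        # Fraction of known anomalies that are flagged.
        true_positive_rate = sum(e > t for e in anomaly_errors) / len(anomaly_errors)
        score = (true_negative_rate + true_positive_rate) / 2
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold


def is_anomaly(error, threshold):
    """Flag a sound whose reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up reconstruction errors, for illustration only.
normal_errors = [0.02, 0.03, 0.05, 0.04]
anomaly_errors = [0.20, 0.15, 0.30]
threshold = pick_threshold(normal_errors, anomaly_errors)
print(threshold, is_anomaly(0.20, threshold), is_anomaly(0.03, threshold))
```

With real data the two error distributions usually overlap, so the chosen threshold is a trade-off between missed anomalies and false alarms.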
