I'm working on a project that needs to detect certain voice patterns, for example "someone is screaming". I do not know in advance who the person is (a child, a man, a woman, ...), and each has their own voice.

So I'm looking for a way to detect "screaming" by, for example, saving as many fingerprints of "screaming" as possible. Then, when I need to check whether a voice is a "screaming" voice, I can create a fingerprint for it and search for a similar one in the list of "screaming" fingerprints I already have.

My approach is to use something like the following projects:

Each will give me a unique fingerprint of a specific voice, right? My question is: how can I search for a similarity in the list of "screaming" fingerprints? Is there any way to generate a score, or return a percentage of similarity to each fingerprint, so I can decide whether the voice I'm testing contains screaming?

Thanks, J.B

Nikolay Shmyrev
Joseph

1 Answer

My approach is to use something like the following projects:

That is not a very good idea. Screaming is usually a pretty stable sound, while all those libraries search for irregularities in the sound instead, so they will not detect anything. It is better to use a simple DNN-LSTM classifier. You can train it with TensorFlow or any other DNN framework. You can find a description of the algorithm here:

Deep Recurrent Neural Network-based Autoencoders for Acoustic Novelty Detection

or here:

Deep Neural Networks for Automatic Detection of Screams and Shouted Speech In Subway Trains
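To make the data flow of such a classifier concrete, here is a minimal sketch of the forward pass only: a sequence of per-frame feature vectors goes through one LSTM layer, and the final hidden state feeds a logistic unit that outputs a scream probability. It is not the architecture from either paper. The class name, the layer sizes, and the 13-value placeholder frames are assumptions made for illustration, and the weights are random and untrained, so the number it prints means nothing until the model is trained (in practice with TensorFlow or a similar framework rather than hand-written loops).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Hypothetical illustration: one LSTM layer plus a logistic output unit."""

    GATES = ("input", "forget", "cell", "output")

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Each gate sees the current frame concatenated with the previous hidden state.
        width = n_features + n_hidden
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_hidden)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * n_hidden for gate in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def predict_proba(self, frames):
        """Return a value in (0, 1) for one clip given as a list of feature frames."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for frame in frames:
            z = list(frame) + h
            pre = {
                gate: [a + b for a, b in zip(matvec(self.w[gate], z), self.b[gate])]
                for gate in self.GATES
            }
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            g = [math.tanh(v) for v in pre["cell"]]
            o = [sigmoid(v) for v in pre["output"]]
            # Standard LSTM update: forget part of the old cell, add gated new content.
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        logit = sum(w * x for w, x in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


# Placeholder input: 50 frames of 13 random values standing in for real audio features.
rng = random.Random(1)
clip = [[rng.random() for _ in range(13)] for _ in range(50)]
model = TinyLSTMClassifier(n_features=13, n_hidden=8)
p_scream = model.predict_proba(clip)
print(p_scream)
```

The point of the sketch is that the classifier returns a single score per clip, so no search over stored fingerprints is needed at query time.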

How can I search for a similarity in the list of "screaming" fingerprints? Is there any way to generate a score, or return a percentage of similarity to each fingerprint, so I can decide whether the voice I'm testing contains screaming?

In your first library you can use queryResult.BestMatch.Confidence for example:

Confidence - returns a value between [0, 1]. A value below 0.15 is most probably a false positive. A value bigger than 0.15 is very likely to be an exact match. For good audio quality queries you can expect getting a confidence > 0.5.
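As a rough sketch of how those thresholds could drive a decision, the snippet below classifies a confidence value and picks the best match from a list. The 0.15 and 0.5 cut-offs come from the documentation quoted above; everything else (the Match type, the function names, and treating a value of exactly 0.15 as a likely match) is an assumption made for illustration and is not part of the library's API.

```python
from dataclasses import dataclass

# Thresholds taken from the documentation quoted above.
FALSE_POSITIVE_BELOW = 0.15
GOOD_QUALITY_ABOVE = 0.5


@dataclass
class Match:
    """Hypothetical stand-in for one query result; not the library's own type."""
    label: str
    confidence: float  # expected to lie in [0, 1]


def interpret(confidence):
    """Map a confidence value to a coarse verdict."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < FALSE_POSITIVE_BELOW:
        return "probable false positive"
    if confidence > GOOD_QUALITY_ABOVE:
        return "strong match"
    return "likely match"  # assumption: exactly 0.15 is treated as a match


def best_match(matches):
    """Return the highest-confidence match at or above the threshold, or None."""
    candidates = [m for m in matches if m.confidence >= FALSE_POSITIVE_BELOW]
    return max(candidates, key=lambda m: m.confidence, default=None)


results = [Match("scream-01", 0.08), Match("scream-02", 0.62), Match("scream-03", 0.31)]
top = best_match(results)
if top is not None:
    # Confidence scaled to a percentage, as asked in the question.
    print(f"{top.label}: {top.confidence * 100:.0f}% ({interpret(top.confidence)})")
```

Since the confidence is already normalized to [0, 1], multiplying by 100 gives the percentage-style score asked for in the question.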

Nikolay Shmyrev
Thanks. I'm working with TensorFlow for this; my feeling (and some preliminary calculations) is that it will cost us a lot at scale. – Joseph Jul 26 '17 at 06:12