My goal is to be able to detect a specific noise that comes through the speakers of a PC using Python. That means the following, in pseudo code:
Sound is being played through the speakers by applications (games, for example). At some point my "audio to detect" sound occurs, and I want to detect it and take an action.
The specific sound I want to detect can be found here.
If I break that down, I believe I need two things:
- A way to sample the audio that is being streamed to an audio device. I actually have this bit working; the code is at https://gist.github.com/renegadeandy/8424327f471f52a1b656bfb1c4ddf3e8 and is based on the sounddevice plot example, which I combine with an audio loopback device. This allows my code to receive a callback with the data that is played to the speakers.
- A way to compare each sample with my "audio to detect" sound file. The detection does not need to be exact; it just needs to be close. For example, there will be lots of other noises happening at the same time, so it is more about being able to detect the footprint of the "audio to detect" within an audio stream that contains a variety of sounds.
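To make the second requirement more concrete, this is roughly the kind of comparison I imagine: keep a rolling buffer of the most recent audio and slide the "audio to detect" clip over it using normalized cross-correlation. It is only a sketch under assumptions, not something I know to be the right approach: the audio is mono float data, the clip and the stream share the same sample rate, and the names `push_block` and `best_match` are made up for illustration (they are not part of sounddevice).

```python
import numpy as np


def push_block(buffer, block, max_len):
    """Append one callback block to a rolling buffer (hypothetical helper).

    Assumes mono audio: if the block is 2-D (frames, channels),
    only the first channel is used.
    """
    block = np.asarray(block, dtype=np.float64)
    if block.ndim == 2:
        block = block[:, 0]
    joined = np.concatenate((buffer, block))
    return joined[-max_len:]  # keep only the newest max_len samples


def best_match(window, template):
    """Normalized cross-correlation of template against every position in window.

    Returns (score, offset): score is in [-1, 1], offset is the sample index
    in window where the best match starts. Returns (0.0, -1) if the window
    is shorter than the template or the template is constant.
    """
    window = np.asarray(window, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    n = len(template)
    if len(window) < n:
        return 0.0, -1

    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t * t))
    if t_norm == 0:
        return 0.0, -1

    # Because t has zero mean, correlating with the raw window equals
    # correlating with each mean-removed slice of the window.
    num = np.correlate(window, t, mode="valid")

    # Energy of each mean-removed n-sample slice, via cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(window)))
    csum2 = np.concatenate(([0.0], np.cumsum(window * window)))
    s = csum[n:] - csum[:-n]
    s2 = csum2[n:] - csum2[:-n]
    energy = np.maximum(s2 - s * s / n, 0.0)

    denom = np.sqrt(energy) * t_norm
    scores = np.zeros_like(num)
    np.divide(num, denom, out=scores, where=denom > 1e-12)

    k = int(np.argmax(scores))
    return float(scores[k]), k
```

The idea would be to call `push_block` from the audio callback with a buffer a little longer than the clip, run `best_match` on the buffer periodically, and treat a score above some threshold as a detection. The threshold would have to be tuned by hand. I also suspect the direct `np.correlate` is too slow for a 1-2 second clip at full sample rate, so downsampling or an FFT-based correlation would probably be needed.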
Having investigated this, I found technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is only around 1-2 seconds long, fpcalc can't generate a fingerprint for it. I need something that works across shorter time spans.
Can somebody help me with problem #2 as detailed above?
How should I attempt this comparison (ideally with a little example), based upon my sampling using sounddevice in the audio_callback function?
Many thanks in advance.