Count the number of audio samples between two identical tones

Question

I would like to count the number of audio samples between the beginnings of two identical tones in the time domain. The tones are generated at random points in time. To do so, I think I need to do three main things for each buffer of audio samples:

1. Recognize the desired tone in the buffer.
2. Find the position of the first audio sample of that tone.
3. Count the audio samples between that sample and the first sample of the previously recognized tone.

If that design makes sense, are there any suggestions on how to implement it? I currently use the TarsosDSP Java library for sound processing.
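The bookkeeping part of the three steps above can be sketched independently of how the tone is recognized. This is a minimal sketch, not a TarsosDSP API: the class name `OnsetGapCounter` and the pluggable `onsetFinder` are hypothetical, and the code assumes buffers arrive in order, do not overlap, and that the finder returns the index of the tone's first sample within the buffer, or -1 if no tone starts there.

```java
import java.util.function.ToIntFunction;

/** Tracks absolute sample positions across buffers and reports the gap between tone onsets. */
final class OnsetGapCounter {
    // Hypothetical detector: index of the tone's first sample in the buffer, or -1 if none.
    private final ToIntFunction<float[]> onsetFinder;
    private long samplesSeen = 0; // samples contained in all previously processed buffers
    private long lastOnset = -1;  // absolute position of the previous onset, -1 if none yet

    OnsetGapCounter(ToIntFunction<float[]> onsetFinder) {
        this.onsetFinder = onsetFinder;
    }

    /**
     * Processes the next (non-overlapping) buffer.
     * Returns the number of samples between this onset and the previous one,
     * or -1 if this buffer has no onset or there is no previous onset.
     */
    long process(float[] buffer) {
        int local = onsetFinder.applyAsInt(buffer);
        long gap = -1;
        if (local >= 0) {
            long absolute = samplesSeen + local; // buffer-local index -> stream position
            if (lastOnset >= 0) {
                gap = absolute - lastOnset;
            }
            lastOnset = absolute;
        }
        samplesSeen += buffer.length;
        return gap;
    }
}
```

If the buffers overlap, `samplesSeen` would have to advance by the hop size rather than by `buffer.length`, and a tone that straddles a buffer boundary needs extra handling that this sketch leaves out.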

You do realize that a "tone" is not like a point in time, right? Or how do you define "tone"? — ppeterka, Aug 11 '16 at 19:35
To me, a tone is a set of sound samples over a time domain, each having a level of energy or amplitude. — MJZ, Aug 11 '16 at 19:38
That ain't gonna fly. Unless you can ensure that the samples to be detected are **exactly** the same, you won't be able to detect the mark. And it *never* happens in real life. Phase is going to ruin all this... Even an uncompressed all-digital chain can ruin your day with a 44.1->48->44.1 conversion somewhere, not to mention if any analog stuff or compression is used... This is why such operations are usually performed in the frequency domain... — ppeterka, Aug 11 '16 at 19:41
Thanks. Then how would I be able to measure the difference (in samples or time) between two identical tones? — MJZ, Aug 11 '16 at 19:57
The big issue is matching the marker... I'd go by trying to find a match using FFT. I found these that might be useful: [Windows Phone: Sound pattern matching using Fast Fourier Transform in Windows Phone](http://social.technet.microsoft.com/wiki/contents/articles/27421.windows-phone-sound-pattern-matching-using-fast-fourier-transform-in-windows-phone.aspx) This is Windows Phone, but has a good explanation. [Detecting whistles, pops and other sounds](http://dsp.stackexchange.com/questions/9358/how-do-i-go-about-detecting-whistles-pops-and-other-sounds-in-live-audio-input) This one is from the DSP site. — ppeterka, Aug 11 '16 at 21:12
Are you detecting these tones against silence, or are they being played while other sounds are also occurring? Are the tones a fixed frequency? Counting frames is straightforward. The "hard part" is identifying the starting points, but finding those points is much easier if the alternative to the tone is simply silence. How accurate do you hope to be? The "start" of a tone could be subject to some degree of interpretation. But if, say, a millisecond of accuracy is sufficient, then the count can be off by roughly +/- 44 frames at 44100 fps. — Phil Freihofner, Aug 12 '16 at 17:19
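For the simple case raised here, a tone against silence, a sketch could look like the following. The class and method names are hypothetical, the threshold is something the caller would have to tune, and the approach is only reasonable when the background really is close to silent. The first helper converts a time tolerance into a frame count for a given sample rate.

```java
/** Helpers for the tone-versus-silence case; all names are illustrative only. */
final class SilenceOnset {
    /** Number of frames spanned by the given duration at the given sample rate (Hz). */
    static long framesFor(double milliseconds, double sampleRate) {
        return Math.round(milliseconds * sampleRate / 1000.0);
    }

    /** Index of the first sample whose magnitude exceeds the threshold, or -1 if none does. */
    static int firstFrameAbove(float[] samples, float threshold) {
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs(samples[i]) > threshold) {
                return i;
            }
        }
        return -1;
    }
}
```

A plain amplitude threshold finds the first loud sample, not necessarily the first sample of the tone, so the reported start shifts with the tone's attack and with the chosen threshold.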
@ppeterka Thanks for the hints. The shared links (especially the first one) helped me to better understand both the problem and solution spaces. However, I think that an FFT-based solution would not give me the granularity that I'm shooting for. — MJZ, Aug 25 '16 at 19:59
@PhilFreihofner I would like to search for a match to a known waveform in a given audio stream. The source of the stream is the microphone. The majority of the frequency components of the known waveform are closer to the non-audible frequency spectrum, between 16 kHz and 18 kHz. The recorded stream is subject to room-level noise (e.g. voices of humans and pets, AC hissing, etc). I'm shooting for no more than 10 samples of error, which is less than 1 ms. I assume that you are using the terms "samples" and "frames" interchangeably. Thanks for the good questions. — MJZ, Aug 25 '16 at 20:11
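One common way to search a stream for a known waveform with sample-level granularity is normalized cross-correlation in the time domain. Nobody in this thread proposed it, so treat the following as a sketch under assumptions: the template and the recording share the same sample rate, the template has a sharp correlation peak (a short chirp works better than a pure sine), and the names `TemplateMatcher`, `bestOffset` and `minScore` are hypothetical.

```java
/** Brute-force normalized cross-correlation of a known template against a signal. */
final class TemplateMatcher {
    /**
     * Returns the offset in signal where the template correlates best,
     * or -1 if no offset scores above minScore (scores lie in [-1, 1]).
     */
    static int bestOffset(float[] signal, float[] template, double minScore) {
        double templateEnergy = 0;
        for (float t : template) {
            templateEnergy += (double) t * t;
        }
        if (templateEnergy == 0 || signal.length < template.length) {
            return -1;
        }
        int best = -1;
        double bestScore = minScore;
        for (int off = 0; off + template.length <= signal.length; off++) {
            double dot = 0;
            double windowEnergy = 0;
            for (int i = 0; i < template.length; i++) {
                double s = signal[off + i];
                dot += s * template[i];
                windowEnergy += s * s;
            }
            if (windowEnergy == 0) {
                continue; // an all-zero window cannot match
            }
            // Dividing by both energies makes the score independent of recording level.
            double score = dot / Math.sqrt(windowEnergy * templateEnergy);
            if (score > bestScore) {
                bestScore = score;
                best = off;
            }
        }
        return best;
    }
}
```

The cost is proportional to the signal length times the template length, which may be too slow for long buffers; computing the correlation through an FFT is the usual speed-up. Room noise and the microphone's frequency response lower the peak score, so `minScore` has to be chosen from real recordings.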
Good to hear I was of some help... hotpaw's and Phil Freihofner's suggestions combined, however, can be of use too: if the marker is, for example, a combination of fixed-amplitude sine waves with known frequencies (e.g. a beep of some kind), then it is possible to have a filter for those frequencies with a high Q. Then you can compare two signals: the dry one and the filtered one, and if the two have the same amplitude, you've got the marker... — ppeterka, Aug 25 '16 at 20:11
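The dry-versus-filtered comparison can be approximated without designing an explicit band-pass filter by using the Goertzel algorithm, which measures the energy at a single frequency. This swaps in Goertzel for the high-Q filter described above, so it is a related sketch rather than the same method. The names `ToneEnergy`, `power` and `fraction` are hypothetical, and the code assumes the block spans many cycles of a frequency that is well away from 0 Hz and from half the sample rate.

```java
/** Goertzel-based estimate of how much of a block's energy sits at one frequency. */
final class ToneEnergy {
    /** Squared magnitude of the block's spectrum at the given frequency (Hz). */
    static double power(float[] block, double frequency, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * frequency / sampleRate);
        double s1 = 0; // state one sample back
        double s2 = 0; // state two samples back
        for (float x : block) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /**
     * Share of the block's total energy found at the given frequency:
     * close to 1 for a pure tone at that frequency, close to 0 for unrelated content.
     */
    static double fraction(float[] block, double frequency, double sampleRate) {
        double energy = 0;
        for (float x : block) {
            energy += (double) x * x;
        }
        if (energy == 0) {
            return 0.0;
        }
        return 2.0 * power(block, frequency, sampleRate) / (block.length * energy);
    }
}
```

Because the result is a ratio, it does not depend on the absolute recording level. It locates the marker only to within one block, though, so on its own it is better suited to confirming that a marker is present than to finding its first sample.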
What sort of mike do you use that picks up that range over room noise? — ppeterka, Aug 25 '16 at 20:13
@ppeterka There are two issues with amplitude-based detection. First, almost all mics that are shipped with phones (and laptops maybe) attenuate sounds at frequencies that are below and above what humans can hear for the sake of noise cancellation, which makes the amplitude of the original tone much higher than the recorded one in my case. Second, because there are no restrictions on the distance of the sound source (the speaker) from the mic, it is challenging to predict the amplitude while distance is a variable. — MJZ, Aug 25 '16 at 22:31
@ppeterka My solution assumes that any mic that is used with a personal computer or a mobile phone should work. When I mentioned room-level noise, I meant any kind of sounds that you would hear in a typical room (house or office). — MJZ, Aug 25 '16 at 22:34
