
I am trying to recognise a sequence of audio frames on an embedded system, where an audio frame is either a single frequency or a linear interpolation between two frequencies, held for a variable amount of time. I know the sounds I am trying to recognise (i.e. the start and end frequencies that are being linearly interpolated and the duration of each audio frame), but they are produced by another embedded system, so the microphone and speaker are cheap and somewhat inaccurate. The output is a square wave. Any suggestions on how to go about doing this?
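
To be concrete about the signal: within one frame the frequency sweeps linearly from the start value to the end value over the frame's duration, so the expected instantaneous frequency is just a linear interpolation. A small helper to illustrate (the names are only for illustration, not from my actual code):

// Expected instantaneous frequency t_ms milliseconds into a frame that sweeps
// linearly from startHz to endHz over duration_ms milliseconds.
float expectedFrequencyHz(float startHz, float endHz, float t_ms, float duration_ms)
{
    if (t_ms <= 0)           return startHz;
    if (t_ms >= duration_ms) return endHz;
    return startHz + (endHz - startHz) * (t_ms / duration_ms);
}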

What I am trying to do now is to use an FFT to get the magnitude of all frequencies, detect the peaks, look at the detections from duration/2 ms ago and check whether they roughly match an audio frame, and finally check whether any of the sounds I am looking for matched the sequence. A rough sketch of the matching step I have in mind is below.
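
To make that matching step concrete, this is roughly the structure I have in mind: keep the recent peak detections with their timestamps and check whether they stay close to the frequency the frame predicts at those times. The struct names and the 100 Hz tolerance below are made up just to illustrate the idea:

#include <vector>
#include <cmath>
#include <cstdint>

// One expected audio frame: a linear sweep from startHz to endHz lasting durationMs.
struct AudioFrame {
    float startHz;
    float endHz;
    float durationMs;
};

// One detected peak: the strongest frequency found in one FFT window, and when it was seen.
struct Detection {
    uint32_t timeMs;
    float    frequencyHz;
};

// Check whether the detections falling inside [frameStartMs, frameStartMs + durationMs]
// stay within toleranceHz of the frequency the frame predicts at their timestamps.
bool matchesFrame(const std::vector<Detection>& detections, const AudioFrame& frame,
                  uint32_t frameStartMs, float toleranceHz = 100.0f)
{
    int checked = 0, hits = 0;

    for (const Detection& d : detections)
    {
        if (d.timeMs < frameStartMs)
            continue;
        float t = (float)(d.timeMs - frameStartMs);
        if (t > frame.durationMs)
            continue;

        // Expected frequency at this point of the linear sweep
        float expected = frame.startHz + (frame.endHz - frame.startHz) * (t / frame.durationMs);

        checked++;
        if (std::fabs(d.frequencyHz - expected) <= toleranceHz)
            hits++;
    }

    // "Somewhat matches": at least 75% of the detections in the frame's window are close enough.
    return checked > 0 && hits * 4 >= checked * 3;
}

Matching a whole sound would then just be checking its frames one after another against consecutive time windows.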

So far I have used the FFT to process the microphone input, after applying a Hann window, and then assigned each frequency bin a coefficient saying how likely it is to be a peak, based on how many standard deviations its magnitude is above the mean (sketched below). This hasn't worked great, since it reports peaks even when the room is silent. Any ideas on how to detect the peaks more accurately? Also, I think there are a lot of harmonics because of the square wave / interpolation. Can I still do a harmonic product spectrum if the peaks don't really line up at double the frequency? (A sketch of what I had in mind is after the code below.)
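
For reference, this is (simplified) how I compute that per-bin coefficient at the moment, using the `mag` buffer produced in the code below; the 2.5 threshold is just an example value:

#include "arm_math.h"

#define FFT_BINS (1024 / 2)   // AUDIO_SAMPLES_NUMBER / 2 from the code below

// Score every bin by how many standard deviations its magnitude sits above the mean
// magnitude, and flag it as a peak if the score crosses a fixed threshold.
void scorePeaks(float32_t *mag, float32_t *score, bool *isPeak, float32_t threshold)
{
    float32_t mean, stdDev;

    arm_mean_f32(mag, FFT_BINS, &mean);   // mean magnitude over all bins
    arm_std_f32(mag, FFT_BINS, &stdDev);  // standard deviation of the magnitudes

    for (uint32_t i = 0; i < FFT_BINS; i++)
    {
        score[i]  = (stdDev > 0) ? (mag[i] - mean) / stdDev : 0.0f;
        isPeak[i] = score[i] > threshold;   // e.g. threshold = 2.5
    }
}

I suspect the problem is that this is purely relative: in a silent room both the mean and the standard deviation are tiny, so ordinary noise bins still score highly, which would explain the false peaks.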

Here I graphed noise (an almost silent room) and a tone somewhere in the interpolation between 2226 Hz and 1624 Hz: https://i.stack.imgur.com/R5Gs2.png

I sample every 91 microseconds -> 10989 Hz. Should I sample more often?

I have also uploaded samples of how the interpolation sounds when recorded on my laptop and on the embedded system: https://easyupload.io/m/5l72b0


#define MIC_SAMPLE_RATE         10989 // Hz
#define AUDIO_SAMPLES_NUMBER    1024


MicroBitAudioProcessor::MicroBitAudioProcessor(DataSource& source) : audiostream(source)
{   
    arm_rfft_fast_init_f32(&fft_instance, AUDIO_SAMPLES_NUMBER);

    // Input samples (real), packed complex FFT output, and per-bin magnitudes
    buf = (float *)malloc(sizeof(float) * (AUDIO_SAMPLES_NUMBER * 2));
    output = (float *)malloc(sizeof(float) * AUDIO_SAMPLES_NUMBER);
    mag = (float *)malloc(sizeof(float) * AUDIO_SAMPLES_NUMBER / 2);
}

// Hann window coefficient for sample i (periodic form over AUDIO_SAMPLES_NUMBER samples)
float hann(int i){
    return 0.5f * (1.0f - arm_cos_f32(2.0f * 3.14159265f * i / AUDIO_SAMPLES_NUMBER));
}

int MicroBitAudioProcessor::pullRequest()
{

    auto mic_samples = audiostream.pull();

    if (!recording)
        return DEVICE_OK;

    int8_t *data = (int8_t *) &mic_samples[0];

    int samples = mic_samples.length() / 2;

    for (int i=0; i < samples; i++)
    {

        // Convert the 8-bit signed sample to float and append it to the FFT input buffer
        buf[position++] = (float) *data++;


        if (position % AUDIO_SAMPLES_NUMBER == 0)
        {
            position = 0;

            float maxValue = 0;
            uint32_t index = 0;

            // Apply a Hann window before the FFT to reduce spectral leakage
            for(int i=0; i< AUDIO_SAMPLES_NUMBER; i++)
                buf[i] *= hann(i);

            // Real FFT, then the magnitude of each of the AUDIO_SAMPLES_NUMBER / 2 bins
            arm_rfft_fast_f32(&fft_instance, buf, output, 0);
            arm_cmplx_mag_f32(output, mag, AUDIO_SAMPLES_NUMBER / 2);
        }
    }

    return DEVICE_OK;
}

// Map a frequency in Hz to the nearest FFT bin index (bin width = MIC_SAMPLE_RATE /
// AUDIO_SAMPLES_NUMBER ≈ 10.7 Hz, so keep the division in floating point).
uint32_t frequencyToIndex(int freq) {
    return (uint32_t)(freq * (float)AUDIO_SAMPLES_NUMBER / MIC_SAMPLE_RATE + 0.5f);
}

float MicroBitAudioProcessor::getFrequencyIntensity(int freq){
    uint32_t index = frequencyToIndex(freq);
    if (index <= 0 || index >= (AUDIO_SAMPLES_NUMBER / 2) - 1) return 0;
    return mag[index];
}
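
For reference, this is roughly the harmonic product spectrum I was asking about: compress the magnitude spectrum by factors 2 and 3 and multiply, so that energy at f, 2f and 3f reinforces the bin at f. Just a sketch of the standard technique over the `mag` buffer above, not something I have working:

#define HPS_HARMONICS 3

// Harmonic product spectrum over mag[]: hps[i] = mag[i] * mag[2*i] * mag[3*i].
// Returns the index of the strongest fundamental candidate.
uint32_t harmonicProductSpectrumPeak(const float *mag, float *hps)
{
    const uint32_t bins    = AUDIO_SAMPLES_NUMBER / 2;
    const uint32_t hpsBins = bins / HPS_HARMONICS;

    for (uint32_t i = 0; i < hpsBins; i++)
    {
        hps[i] = mag[i];
        for (uint32_t h = 2; h <= HPS_HARMONICS; h++)
            hps[i] *= mag[i * h];          // harmonic h of bin i
    }

    uint32_t best = 1;                     // skip the DC bin
    for (uint32_t i = 2; i < hpsBins; i++)
        if (hps[i] > hps[best])
            best = i;

    return best;   // frequency ≈ best * MIC_SAMPLE_RATE / (float)AUDIO_SAMPLES_NUMBER
}

Since the recorded peaks don't sit exactly at integer multiples of the fundamental, I guess I would have to take the maximum over a small neighbourhood of mag[i * h] instead of the single bin, but I'm not sure that's the right way to handle it.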


  • C++, `malloc`, ouch – Jarod42 Mar 31 '21 at 13:18
  • Let me see if I understand: you have a triangular wave with varying period? In that case FFT will not be the best approach; it is better to use peak detection on a high-pass filtered signal ;) – Bob Apr 02 '21 at 19:04
  • It's supposed to be a square wave - I'm not sure what exactly came out as there is a lot of noise because of the microphone. Do you know where I could find some - good - resources on peak detection and high pass filtering? – vlad turcuman Apr 03 '21 at 09:00
