Understanding onWaveFormDataCapture byte array format

Question

I'm analyzing audio signals on Android. I first tried with the MIC and succeeded. Now I'm trying to apply an FFT to MP3 data that comes from Visualizer.OnDataCaptureListener's onWaveFormDataCapture method, which is linked to a MediaPlayer. The callback passes a byte array, byte[] waveform, and I get spectral leakage or overlap when I apply the FFT to this data.

public void onWaveFormDataCapture(Visualizer visualizer, byte[] waveform, int samplingRate)

I tried to convert the data into the -1..1 range using the code below in a for loop:

        // waveform varies in range of -128..+127
        raw[i] = (double) waveform[i];
        // change it to range -1..1
        raw[i] /= 128.0;

Then I copy raw into the FFT buffer:

        fftre[i] = raw[i];
        fftim[i] = 0;

Then I call the fft function:

        fft.fft(fftre, fftim); // in: audio signal, out: fft data

As the final step, I convert the results to magnitudes in dB and draw the frequencies on screen:

        // Ignore the first fft data which is DC component
        for (i = 1, j = 0; i < waveform.length / 2; i++, j++)
        {
            magnitude = (fftre[i] * fftre[i] + fftim[i] * fftim[i]);
            magnitudes[j] = 20.0 * Math.log10(Math.sqrt(magnitude) + 1e-5); // [dB]
        }

When I play a sweep signal from 20Hz to 20kHz, I don't see what I see with the MIC. Instead of a single walking line, it draws several symmetric lines that move apart or come together. There is also a weaker symmetric signal at the other end of the visualizer. The same code, dividing by 32768 instead of 128, works very well on MIC input with AudioRecord.

What am I doing wrong? (And yes, I know there is a direct FFT output.)

It would be good if you could include a screenshot of the spectrum. However, it is indeed possible that your problem is caused by a misinterpretation of the audio format. It could be [ENCODING_PCM_16BIT](https://developer.android.com/reference/android/media/AudioFormat#ENCODING_PCM_16BIT), or perhaps even ENCODING_PCM_FLOAT. Depends how your .mp3 decoder is configured. Could even be stereo. — greeble31, Sep 02 '18 at 15:33
@greeble31 I'm using `MediaPlayer()` linked with `Visualizer`, there are no audio format settings in MediaPlayer. Also the Visualizer has no format options. Should I use `AudioTrack` instead? — Phillip, Sep 02 '18 at 18:49
Scratch that. [Docs](https://developer.android.com/reference/android/media/audiofx/Visualizer) say it's 8-bit unsigned mono. My bad. Could you add a dump of a few hundred consecutive samples of `waveform` to your question so we can have a look at the data? And, are you staying within `getCaptureSize()`? — greeble31, Sep 02 '18 at 19:05
Another possibility: Since it's "unsigned", the audio data probably has a DC level of 128. That means you're actually changing it to the range 0..2 — greeble31, Sep 02 '18 at 19:08
Data range is -128..+127, I just printed it. I use maximum capture size `Visualizer.getCaptureSizeRange()[1]` by `mVisualizer.setCaptureSize`, also `Visualizer.getMaxCaptureRate()` in the `mVisualizer.setDataCaptureListener`. Actually if I directly pass the data to fft, the result is the same. It does not care it is 0..1 or -128..127. I can capture the data but it would be somehow random because I have a sweep sound mp3 which is long. — Phillip, Sep 02 '18 at 19:24
Oops I missed something else: You're doing an implicit unsigned-to-signed conversion. Replace `(double) waveform[i]` with `(double) (waveform[i] & 0xFF)` and try again. If that doesn't work, then yes, capture some data. Preferably closer to the 20KHz end. — greeble31, Sep 02 '18 at 19:41
@greeble31 Wow! It worked! Thank you very much. Now I can see a walking line on the sweep signal. I have other simple questions, so the &0xff made it 0..255 range I guess, so I should do `raw[i] /= 255.0;`, right? Another question, I have a `short` buffer with range -32768..+32767 from MIC side, I do the following to have -1..+1 range `raw[i] = (double) short_buffer[i]; raw[i] /= 32768.0;` is it correct? Should I add &0xFFFF to short variable before converting to double? — Phillip, Sep 02 '18 at 20:02
Briefly: For a scale of -1..1, do `raw[i] = (raw[i] - 128) / 128`. It's your preference; you're using logs, so you have extremely high dynamic range, and the scaling really won't matter much. For the MIC, I think you're doing it right, b/c it appears to be a signed array, biased around 0. No further ANDing is necessary. Have fun. — greeble31, Sep 02 '18 at 20:14

greeble31 · Accepted Answer · 2018-09-02T20:17:29.507

The input audio is 8-bit unsigned mono. The line raw[i] = (double) waveform[i] causes an unintentional unsigned-to-signed conversion. Since the waveform data is biased around a DC level of approximately 128, a small sine wave ends up getting changed into a high-amplitude modified square wave as the signal crosses the 127/-128 boundary. That causes a bunch of funny harmonics (which caused the "symmetric lines coming and going" you were talking about).
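To make the wraparound concrete, here is a small sketch of how the same byte reads under the two interpretations. The class and method names are hypothetical and only for illustration; they are not part of the Visualizer API.

```java
// Hypothetical demo class; names and sample values are illustrative only.
final class UnsignedByteDemo {

    /** Reads the byte the way Java does by default: signed, -128..127. */
    static int asSigned(byte b) {
        return b; // implicit sign-extending conversion
    }

    /** Reads the byte as an unsigned 8-bit sample: 0..255. */
    static int asUnsigned(byte b) {
        return b & 0xFF; // mask keeps only the low 8 bits
    }
}
```

A quiet signal that moves from 127 to 128 around the midpoint changes by one step when read as unsigned, but jumps from 127 to -128 when read as signed, which is the square-wave-like distortion described above.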

Solution

Change to (double) (waveform[i] & 0xFF) so that the converted value lies in the range 0..255, instead of -128..127.
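If you also want the -1..1 scale from the question, a minimal sketch could look like the following. It combines the & 0xFF mask with the (raw[i] - 128) / 128 scaling suggested in the comments; the class and method names are assumptions made up for this example.

```java
// Hypothetical helper; the class and method names are not from the Android API.
final class WaveformScaler {

    /**
     * Converts 8-bit unsigned samples (as delivered in byte[] waveform)
     * to doubles in roughly the -1..1 range.
     */
    static double[] toNormalized(byte[] waveform) {
        double[] raw = new double[waveform.length];
        for (int i = 0; i < waveform.length; i++) {
            int unsigned = waveform[i] & 0xFF;   // 0..255, no sign extension
            raw[i] = (unsigned - 128) / 128.0;   // remove the 128 DC bias, then scale
        }
        return raw;
    }
}
```

Inside onWaveFormDataCapture, the result of WaveformScaler.toNormalized(waveform) could then be copied into the real part of the FFT buffer as before. Since the magnitudes are plotted in dB, the exact scale factor matters much less than getting the unsigned interpretation right.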
