
I'm currently trying to gather decoded audio data (from multiple formats) to perform certain audio manipulations (using a *.wav file for testing).

I have a class that handles all the decoding via FFmpeg's libav. If I extract the data as uint8_t into a vector and write it

for (size_t i = 0; i < bytevector.size(); i++) {
    fwrite(&bytevector[i], sizeof(uint8_t), 1, outfile2);
}

to a raw file, then play it via play -t raw -r 44100 -b16 -c 1 -e signed sound.raw, it sounds perfectly fine.

However, how can all the information be correct as doubles when the file is, for example, 2 bytes per sample and the frame->data information is given as uint8_t? The wav files I've tested are 44100 Hz/16 bits/1 channel. (I already have code that converts a uint8_t* into a double.)

Opening the same file in Scilab shows an array of doubles half the length of the byte vector.

wav file in Scilab as an array of doubles shows:
-0.1, -0.099, -0.098, ..., 0.099, +0.1

versus byte vector:
51, 243, 84, 243, 117, 243, ...

Can 51 and 243 really form a double? Any suggestions on how to get past this issue?

Code below for reference:

    while (av_read_frame(formatContext, &readingPacket) == 0) {
        if (readingPacket.stream_index == audioStreamIdx) {
            AVPacket decodingPacket = readingPacket;

            while (decodingPacket.size > 0) {
                int gotFrame = 0;
                int result = avcodec_decode_audio4(context, frame, &gotFrame, &decodingPacket);

                if (result < 0) {
                    break; // decoding error
                }

                decoded = FFMIN(result, decodingPacket.size);

                if (gotFrame) {
                    data_size = av_get_bytes_per_sample(context->sample_fmt);
                    if (data_size < 0) {
                        break; // unknown sample format
                    }

                    // Only for 1 channel temporarily
                    for (int i = 0; i < frame->nb_samples; i++) {
                        for (int ch = 0; ch < context->channels; ch++) {
                            for (int j = 0; j < data_size; j++) {
                                bytevector.push_back(*(frame->data[ch] + data_size * i + j));
                            }
                        }
                    }
                }
                // Advance past the bytes the decoder consumed.
                decodingPacket.size -= decoded;
                decodingPacket.data += decoded;
            }
        }
        av_free_packet(&readingPacket);
    }
gapc
  • `double` ? That's probably 52 bits of precision, 11 bits of dynamic range or 6000 dB. That is insane. And `-b16` in your command line means 16 bits, **not** 8 bits. – MSalters Jul 29 '15 at 13:32
  • Double is definitely overkill for what is being done to the audio. I was thrown off by the fact that Scilab displays the values as "doubles" when the array is opened in the viewer. But yeah, below is the answer on how to represent the data of two uint8_t (or 2 bytes) in the same manner as Scilab (range from -1.0 to +1.0). Thanks. – gapc Jul 29 '15 at 13:45
  • @MSalters - most decent DAW applications use 64bit internal processing, this way you lose less precision, even if you still output 24bit master. – dtech Jul 29 '15 at 13:52
  • @ddriver: That's probably 64 bits PCM. 190 dB dynamic range, not 6000. For a comparison, Hiroshima was about 250 dB. A star going supernova doesn't exceed 1000 dB. – MSalters Jul 29 '15 at 14:04

2 Answers


Quick way to transform two bytes into a float:

uint8_t bits[] = {195, 255}; // first sample in the test s16 wav file
int16_t sample;
memcpy(&sample, bits, sizeof(bits));
std::cout << sample * (1.0f / 32768.0f) << std::endl;

Printed with more precision (std::setprecision), this code yields -0.001861572265625, which is the first number given by Scilab for the same file.

I hope this helps anybody with similar issues.

gapc

Audio data is stored in many different formats. That you get a uint8_t[] array means rather little; it's not one sample per array element. Instead, you need to know the format. Here -b16 tells me that the uint8_t[] data is in fact 16-bit PCM-encoded data, i.e. on a scale from -32768 to +32767. Scilab appears to prefer a floating-point scale, and therefore divides by 32768.0. That's just a representation change; it merely shrinks the scale to [-1.0, +1.0].

Compare it to angles: a right angle is 90 degrees or pi/2 radians; the exact number doesn't matter, but both are a quarter of a full circle.

MSalters