Correct reading of samples from .wav file

Question

I am trying to correctly read a WAVE file (PCM, mono, 16 bits, i.e. 2 bytes per sample). I have managed to read the header. The problem is reading (writing) the data part.

As far as I understand, the 16-bit samples in the data chunk are little-endian and "split" into two halves of 8 bits each. So, as I see it, a way to read the data correctly would be:

Read the file and put the bytes into two different int8_t variables (or a std::vector<int8_t>).
In some way "join" these two variables to make an int16_t so that I can process it.

The problem is that I have no idea how to deal with the little-endianness and with the fact that these samples aren't unsigned, so I can't use the << operator.

This is one of the tests I've done, without success:

int8_t buffer[], firstbyte,secondbyte;
int16_t result;
std::vector<int16_t> data;
while(Read bytes and put them in buffer){
for (int j=0;j<bytesReadFromTheFile;j+=2){
                    firstbyte = buffer[j];
                    secondbyte = buffer[j+1];
                    result = (firstbyte);
                    result = (result << 8)+secondbyte; //shift first byte and add second
                    data.push_back(result);
                }
}

For more detail: I am using the code below, which I found online, and I built a class starting from it (the process is the same, but the class itself is very long and has many features that aren't relevant to this problem):

#include <iostream>
#include <string>
#include <fstream>
#include <cstdint>

using std::cin;
using std::cout;
using std::endl;
using std::fstream;
using std::string;

typedef struct  WAV_HEADER
{
    /* RIFF Chunk Descriptor */
    uint8_t         RIFF[4];        // RIFF Header Magic header
    uint32_t        ChunkSize;      // RIFF Chunk Size
    uint8_t         WAVE[4];        // WAVE Header
    /* "fmt" sub-chunk */
    uint8_t         fmt[4];         // FMT header
    uint32_t        Subchunk1Size;  // Size of the fmt chunk
    uint16_t        AudioFormat;    // Audio format 1=PCM, 6=mulaw, 7=alaw, 257=IBM Mu-Law, 258=IBM A-Law, 259=ADPCM
    uint16_t        NumOfChan;      // Number of channels 1=Mono 2=Stereo
    uint32_t        SamplesPerSec;  // Sampling Frequency in Hz
    uint32_t        bytesPerSec;    // bytes per second
    uint16_t        blockAlign;     // 2=16-bit mono, 4=16-bit stereo
    uint16_t        bitsPerSample;  // Number of bits per sample
    /* "data" sub-chunk */
    uint8_t         Subchunk2ID[4]; // "data"  string
    uint32_t        Subchunk2Size;  // Sampled data length
} wav_hdr;

// Function prototypes
int getFileSize(FILE* inFile);

int main(int argc, char* argv[])
{
    wav_hdr wavHeader;
    int headerSize = sizeof(wav_hdr), filelength = 0;

    const char* filePath;
    string input;
    if (argc <= 1)
    {
        cout << "Input wave file name: ";
        cin >> input;
        cin.get();
        filePath = input.c_str();
    }
    else
    {
        filePath = argv[1];
        cout << "Input wave file name: " << filePath << endl;
    }

    FILE* wavFile = fopen(filePath, "r");
    if (wavFile == nullptr)
    {
        fprintf(stderr, "Unable to open wave file: %s\n", filePath);
        return 1;
    }

    //Read the header
    size_t bytesRead = fread(&wavHeader, 1, headerSize, wavFile);
    cout << "Header Read " << bytesRead << " bytes." << endl;
    if (bytesRead > 0)
    {
        //Read the data
        uint16_t bytesPerSample = wavHeader.bitsPerSample / 8;          // Number of bytes per sample
        uint64_t numSamples = wavHeader.Subchunk2Size / bytesPerSample; // How many samples are in the data chunk?
        static const uint16_t BUFFER_SIZE = 4096;
        int8_t* buffer = new int8_t[BUFFER_SIZE];
        while ((bytesRead = fread(buffer, sizeof buffer[0], BUFFER_SIZE / (sizeof buffer[0]), wavFile)) > 0)
        {
            /** DO SOMETHING WITH THE WAVE DATA HERE **/
            cout << "Read " << bytesRead << " bytes." << endl;
        }
        delete [] buffer;
        buffer = nullptr;
        filelength = getFileSize(wavFile);

        cout << "File is                    :" << filelength << " bytes." << endl;
        cout << "RIFF header                :" << wavHeader.RIFF[0] << wavHeader.RIFF[1] << wavHeader.RIFF[2] << wavHeader.RIFF[3] << endl;
        cout << "WAVE header                :" << wavHeader.WAVE[0] << wavHeader.WAVE[1] << wavHeader.WAVE[2] << wavHeader.WAVE[3] << endl;
        cout << "FMT                        :" << wavHeader.fmt[0] << wavHeader.fmt[1] << wavHeader.fmt[2] << wavHeader.fmt[3] << endl;
        cout << "Data size                  :" << wavHeader.ChunkSize << endl;

        // Display the sampling Rate from the header
        cout << "Sampling Rate              :" << wavHeader.SamplesPerSec << endl;
        cout << "Number of bits used        :" << wavHeader.bitsPerSample << endl;
        cout << "Number of channels         :" << wavHeader.NumOfChan << endl;
        cout << "Number of bytes per second :" << wavHeader.bytesPerSec << endl;
        cout << "Data length                :" << wavHeader.Subchunk2Size << endl;
        cout << "Audio Format               :" << wavHeader.AudioFormat << endl;
        // Audio format 1=PCM,6=mulaw,7=alaw, 257=IBM Mu-Law, 258=IBM A-Law, 259=ADPCM

        cout << "Block align                :" << wavHeader.blockAlign << endl;
        cout << "Data string                :" << wavHeader.Subchunk2ID[0] << wavHeader.Subchunk2ID[1] << wavHeader.Subchunk2ID[2] << wavHeader.Subchunk2ID[3] << endl;
    }
    fclose(wavFile);
    return 0;
}

// find the file size
int getFileSize(FILE* inFile)
{
    int fileSize = 0;
    fseek(inFile, 0, SEEK_END);

    fileSize = ftell(inFile);

    fseek(inFile, 0, SEEK_SET);
    return fileSize;
}

The problem is in the /** DO SOMETHING WITH THE WAVE DATA HERE **/ part. I have no idea how to get the sample values.

Well, for starters, you need to open the WAV file in binary mode, not text mode. More importantly, you should not assume that the `wav_hdr` you have set up is how all WAV files are formatted. There can be other sub-chunks present. Even the contents of the `fmt` sub-chunk are dynamic. The only guarantee you have is that `fmt` appears before `data`. You need to read the WAV file one sub-chunk at a time, looking at each sub-chunk's type, parsing its data as needed, and ignoring any sub-chunks you don't care about. You need to do some more research on how the RIFF format actually works. — Remy Lebeau, May 15 '21 at 01:17
The wav_hdr struct works for the specific files I am working with. The problem is reading the data, in order to process it. After reading the header, the data subchunk should be "continuous", right? Or do I need to care about other things inside it? — Barsaas, May 15 '21 at 01:27
`result = (result << 8)+secondbyte;` is working big-endian, not little-endian. And it probably needs some casts to convert everything to unsigned. — Mark Ransom, May 15 '21 at 01:31
@MarkRansom something like this could work? `uint16_t result=(uint8_t)secondbyte;` `result = (result << 8)+(uint8_t)firstbyte;` `data.push_back(result);` — Barsaas, May 15 '21 at 02:05
It might, best way to know is to try it and see what happens. That's what I always end up doing. — Mark Ransom, May 15 '21 at 03:07
Usually 16-bit samples are not split into chunks. Typically one needs to deal with pairs of 16-bit samples, one per channel. Typically these samples are unsigned and there is no need to do anything about little-endianness because the machine the code is running on is little-endian. — user7860670, May 15 '21 at 07:00
@user7860670 But the header indicates "2 bytes per sample". Does this mean I can just take two bytes at a time, add them and put them into a 16-bit uint, or am I missing something? Thanks! — Barsaas, May 15 '21 at 15:57
Also, here it says samples greater than 8 bits are always signed: https://stackoverflow.com/questions/10731226/how-to-determine-if-8bit-wav-file-is-signed-or-unsigned-using-java-and-without#:~:text=In%20the%20wav%20File%2C%208,signed%20integers%20in%202's%2Dcomplement.&text=8%20bit%20(or%20lower)%20WAV%20files%20are%20always%20unsigned.&text=The%20size%20of%20i%20is,contain%20the%20specified%20sample%20size. — Barsaas, May 15 '21 at 16:12
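
The advice in the first comment (read one sub-chunk at a time and skip the ones you don't need) could be sketched roughly as below. This is only an illustration under stated assumptions: `readU32LE` and `seekToDataChunk` are hypothetical helper names, not part of the question's code, the file is assumed to have been opened in binary mode (`"rb"`), and the sketch skips every sub-chunk until it finds `data`, whereas a real reader would also parse `fmt` on the way.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Read a little-endian 32-bit value, independent of the host byte order.
static bool readU32LE(std::FILE* f, uint32_t& value)
{
    unsigned char b[4];
    if (std::fread(b, 1, 4, f) != 4) return false;
    value = static_cast<uint32_t>(b[0]) | (static_cast<uint32_t>(b[1]) << 8) |
            (static_cast<uint32_t>(b[2]) << 16) | (static_cast<uint32_t>(b[3]) << 24);
    return true;
}

// Hypothetical helper: leaves `f` positioned at the first byte of the "data"
// payload and stores the payload size in `dataSize`.
// Returns false if the file is not RIFF/WAVE or has no "data" sub-chunk.
bool seekToDataChunk(std::FILE* f, uint32_t& dataSize)
{
    char id[4];
    uint32_t size = 0;
    if (std::fseek(f, 0, SEEK_SET) != 0) return false;
    if (std::fread(id, 1, 4, f) != 4 || std::memcmp(id, "RIFF", 4) != 0) return false;
    if (!readU32LE(f, size)) return false; // overall RIFF size, not used in this sketch
    if (std::fread(id, 1, 4, f) != 4 || std::memcmp(id, "WAVE", 4) != 0) return false;

    // Each sub-chunk is a 4-byte ID, a 4-byte little-endian size, then the payload.
    while (std::fread(id, 1, 4, f) == 4 && readU32LE(f, size))
    {
        if (std::memcmp(id, "data", 4) == 0)
        {
            dataSize = size;
            return true;
        }
        // Skip this sub-chunk; odd-sized payloads are followed by one pad byte.
        if (std::fseek(f, static_cast<long>(size + (size & 1u)), SEEK_CUR) != 0) return false;
    }
    return false;
}
```

After a successful call, the next `dataSize` bytes read from the file are the sample data, so the fixed-layout `wav_hdr` is no longer needed to locate them.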

score 0 · Accepted Answer · edited May 15 '21 at 19:52

I'm a Java programmer, not C++, but I've dealt with this often.

The PCM data is organized by frame. If it's mono, little-endian, 16-bit the first byte will be the lower half of the value, and the second byte will be the upper and include the sign bit. Big-endian will reverse the bytes. If it's stereo, a full frame (I think it's left then right but I'm not sure) is presented intact before moving on to the next frame.
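
To make the frame layout concrete, here is a minimal C++ sketch that splits interleaved stereo frames into two channels. It assumes 16-bit little-endian signed PCM with exactly two channels, and it assumes the left sample comes first in each frame, which is the usual WAV layout. `StereoSamples`, `makeSample` and `splitStereoFrames` are names made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two channels of a stereo stream.
struct StereoSamples
{
    std::vector<int16_t> left;
    std::vector<int16_t> right;
};

// Combine a low byte and a high byte into one signed 16-bit sample.
inline int16_t makeSample(uint8_t low, uint8_t high)
{
    return static_cast<int16_t>(static_cast<uint16_t>(low | (high << 8)));
}

// Assumes 16-bit little-endian PCM with two channels, so each frame is 4 bytes:
// left low, left high, right low, right high.
StereoSamples splitStereoFrames(const uint8_t* bytes, std::size_t byteCount)
{
    StereoSamples out;
    const std::size_t frameSize = 4;
    // Only complete frames are decoded; a trailing partial frame is ignored.
    for (std::size_t i = 0; i + frameSize <= byteCount; i += frameSize)
    {
        out.left.push_back(makeSample(bytes[i], bytes[i + 1]));
        out.right.push_back(makeSample(bytes[i + 2], bytes[i + 3]));
    }
    return out;
}
```

For a mono file the same idea applies with a 2-byte frame and a single output vector.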

I'm kind of amazed at all the code being shown. In Java, the following suffices for PCM encoded as signed values:

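public short[] fromBufferToPCM(short[] audioPCM, byte[] buffer)
{
    // Each 16-bit sample occupies two consecutive bytes, low byte first.
    for (int i = 0, n = buffer.length; i + 1 < n; i += 2)
    {
        // Sample k lives at bytes 2k and 2k+1, so the output index is i / 2.
        audioPCM[i / 2] = (short) ((buffer[i] & 0xff) | (buffer[i + 1] << 8));
    }

    return audioPCM;
}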

IDK how to translate that directly to C++, but we are simply OR-ing together the two bytes with the second one first being shifted 8 places to the left. The pure shift picks up the sign bit. (I can't recall why the & 0xff was included--I wrote this a long while back and it works.)
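
A possible C++ translation of the same idea, offered as a sketch and not as the only way to do it. The helper name `decodePcm16LE` is made up for this example, and the input is assumed to be 16-bit little-endian signed PCM. Reading the bytes as `uint8_t` plays the role of the `& 0xff` mask in the Java version: it stops the low byte from being sign-extended when it is promoted to `int`, which would otherwise corrupt the upper half of the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode little-endian signed 16-bit PCM bytes into samples.
std::vector<int16_t> decodePcm16LE(const uint8_t* bytes, std::size_t byteCount)
{
    std::vector<int16_t> samples;
    samples.reserve(byteCount / 2);
    // A trailing odd byte, if any, is ignored.
    for (std::size_t i = 0; i + 1 < byteCount; i += 2)
    {
        // Low byte first, high byte second; unsigned bytes avoid sign extension.
        const uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        // Reinterpret the 16-bit pattern as a two's-complement value.
        samples.push_back(static_cast<int16_t>(raw));
    }
    return samples;
}
```

With the `int8_t* buffer` from the question, one way to call it would be `decodePcm16LE(reinterpret_cast<const uint8_t*>(buffer), bytesRead)` inside the read loop, appending the returned samples to the output vector.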

Curious why so many answers are in the comments and not posted as answers. I thought comments were for requests to clarify the OP's question.

zloesch · Answer 2 · 2022-12-11T22:56:21.673

something like this works:

int16_t * tempBuffer = new int16_t [numSamples]; // 16-bit samples need a 16-bit element type
int index_for_loop = 0; 
float INT16_FAC = pow(2,15) - 1;                 // 32767, the largest positive 16-bit sample
double * outbuffer = new double [numSamples];

inside while loop:

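for(size_t i = 0; i + 1 < bytesRead; i += 2) // only process the bytes actually read
            { 
                firstbyte = buffer[i];      // low byte (the data is little-endian)
                secondbyte = buffer[i + 1]; // high byte, carries the sign bit
                // Cast to unsigned before combining so the low byte is not sign-extended.
                result = (int16_t)((uint8_t)firstbyte | ((uint8_t)secondbyte << 8)); 
                tempBuffer[index_for_loop] = result; 
                index_for_loop += 1; 
            }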

then normalize between -1 and 1 by doing:

for(int i = 0; i <numSamples; i ++)
{ 
    outbuffer[i] = float(tempBuffer[i]) / INT16_FAC; 
}

The normalization comes from sms-tools.
Note: this works for mono files with a 44100 Hz sample rate and 16-bit resolution.
