Reading a PCM audio file sometimes gives wrong samples

Question

I have a 16-bit, 48 kHz, mono PCM audio file (with no header, though the result would be the same with a WAV header) that reads correctly in software such as Audacity. However, when I read it programmatically in C++, some samples are out of place, while most match the values shown in Audacity.

My process of reading the PCM file is the following:

Convert the PCM byte array to a short array by bit-shifting, to get readable values (the byte order is little-endian here).

for(int i = 0; i < bytesSize - 1; i += 2)
    shortValue[i / 2] = bytes[i] | bytes[i + 1] << 8;

Note: bytes is a char array holding the binary contents of the PCM file, and shortValue is a short array.

Convert the short values to amplitude levels in a float array by dividing by the maximum short value (32767).

for(int i = 0; i < shortsSize ; i++)
    amplitude[i] = static_cast<float>(shortValue[i]) / 32767;

This is obviously not optimal code, and I could do it in one loop, but I separated the two steps for the sake of explanation.
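The two steps above can be merged into a single pass. The following is only a minimal sketch, not the asker's actual code: the function name `pcm16leToFloat` is hypothetical, and it assumes the bytes are stored as unsigned 8-bit values and that the platform uses two's-complement 16-bit integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts raw 16-bit little-endian mono PCM bytes
// to floats in roughly [-1, 1], in a single pass.
std::vector<float> pcm16leToFloat(const std::vector<std::uint8_t>& bytes)
{
    std::vector<float> amplitude;
    amplitude.reserve(bytes.size() / 2);

    // Step two bytes at a time; a trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Assemble the sample from unsigned bytes (low byte first).
        const std::uint16_t raw =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        // Reinterpret the 16 bits as a signed sample (assumes two's complement).
        const std::int16_t sample = static_cast<std::int16_t>(raw);
        amplitude.push_back(static_cast<float>(sample) / 32767.0f);
    }
    return amplitude;
}
```

Dividing by 32767 follows the question; with that divisor the most negative sample (-32768) lands slightly below -1.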

The problem is that when I look for very large changes in amplitude in the final array, it shows me samples that are not correct. In Audacity, the wave is perfectly smooth: it drops only slightly from sample 276,467 (marked in green) to the next sample (marked in red), which should be around -0.17.

However, when reading from my code, I get a completely wrong value for the red sample (-0.002), while the green sample is correct (around -0.17). The sample after the red one is also correct (around -0.17).

I don't understand what is happening or how Audacity is able to read those bytes correctly. I tried multiple PCM/WAV files and got the same results. Any help would be appreciated!

It's going to be really hard to help you without a proper [mcve]. Your debugging shows several variables which are not explained. — Some programmer dude, Jul 22 '21 at 08:56
@MikeVine `bytes` is a `char` array of the binary contents of the PCM file. And `shortValue` is a short array. I added those details to my question! @Someprogrammerdude I'll try and provide that ASAP! :) — Saliom, Jul 22 '21 at 09:21
Note that it's implementation-defined if `char` is a signed or unsigned data-type. Doing bitwise operations on signed numbers could lead to problems. Make sure that the byte array is explicitly unsigned (I recommend using e.g. `uint8_t` or `std::byte` for raw byte data). — Some programmer dude, Jul 22 '21 at 09:24
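To illustrate the signedness problem described in the comment above, here is a small sketch with two hypothetical helpers (`combineSignedLow` and `combineUnsigned` are names invented for this example). It assumes a platform where plain `char` is signed, modeled here with `std::int8_t`, and two's-complement 16-bit integers.

```cpp
#include <cstdint>

// Buggy variant: the low byte is treated as signed, as a plain `char`
// is on many platforms. Promotion to int sign-extends it, and the
// resulting high 1-bits overwrite the high byte in the OR.
std::int16_t combineSignedLow(std::int8_t low, std::uint8_t high)
{
    return static_cast<std::int16_t>(low | (high << 8));
}

// Fixed variant: both bytes are unsigned, so promotion zero-extends
// and the high byte survives the OR.
std::int16_t combineUnsigned(std::uint8_t low, std::uint8_t high)
{
    return static_cast<std::int16_t>(
        static_cast<std::uint16_t>(low | (high << 8)));
}
```

With the illustrative byte pair 0xBF, 0xEA (chosen for this example, not taken from the asker's file), `combineSignedLow(-65, 0xEA)` returns -65, about -0.002 after dividing by 32767, while `combineUnsigned(0xBF, 0xEA)` returns -5441, about -0.166. Only samples whose low byte is 0x80 or above are affected, which would explain why most samples read correctly.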
Reading the file into an `unsigned char` array instead of a `char` array solved the issue! `uint8_t` and `std::byte` work too since they're `unsigned char` as well. If anyone wants to post it as an answer I'll be glad to accept it! — Saliom, Jul 22 '21 at 12:32
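One possible way to load the file into unsigned storage is sketched below. The helper name `readRawBytes` is hypothetical and the asker's actual loading code is not shown in the thread. Note also that `std::byte` is a distinct enumeration type rather than an alias of `unsigned char`, so it needs `std::to_integer` before being combined into an integer sample.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: loads a whole file as raw unsigned bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readRawBytes(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return {};

    // istreambuf_iterator reads unformatted chars, so no bytes are skipped.
    // Each char is converted to unsigned char, giving values in 0..255.
    return std::vector<unsigned char>(
        (std::istreambuf_iterator<char>(file)),
        std::istreambuf_iterator<char>());
}
```

The resulting vector can then be combined two bytes at a time without sign extension, since every element is already in the range 0 to 255.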
