
I have two audio files I read in using libsndfile.

SNDFILE* file1 = sf_open("D:\\audio1.wav", SFM_READ, &info);
SNDFILE* file2 = sf_open("D:\\audio2.wav", SFM_READ, &info2);

After opening the files I read a fixed number of frames from each:

//Buffers that will hold the samples
short* buffer1 = new short[2 * sizeof(short) * 800000];
short* buffer2 = new short[2 * sizeof(short) * 800000];

// Read the samples using libsndfile
sf_readf_short(file1, buffer1, 800000);
sf_readf_short(file2, buffer2, 800000);

Now, I want to mix those two. I read that you need to get the left and right channel separately and then sum them up. I tried doing it like this:

short* mixdown = new short[channels * sizeof(short) * 800000];
for (int t = 0; t < 800000; ++t)
{
    mixdown[t] = buffer1[t] + buffer2[t] - ((buffer1[t]*buffer2[t]) / 65535);
    t++;
    mixdown[t] = buffer1[t] + buffer2[t] - ((buffer1[t]*buffer2[t]) / 65535);
}

After that I'm encoding the new audio using ffmpeg:

FILE* process2 = _popen("ffmpeg -y -f s16le -acodec pcm_s16le -ar 44100 -ac 2 -i - -f vob -ac 2 D:\\audioMixdown.wav", "wb");
fwrite(mixdown, 2 * sizeof(short) * 800000, 1, process2);

Now, the problem is that the audio from buffer1 sounds fine in the mixdown, but the only thing "added" to the new audio is noise (as if it were an old audio recording) when I encode the mixdown to a file.

If I encode only one of the two to a file it works perfectly.

I have no idea why it's going wrong. I guess it has something to do with the mixing, obviously, but I don't know what I'm doing wrong. I got the mixing algorithm here but it doesn't give me the expected results.

I've also read other information on SO about people having similar questions but I couldn't figure it out with those.

Alexis Wilke
Dries
  • You're not being careful with stereo channels here - I hope that's because this is a simplified example. – MSalters Feb 24 '15 at 10:38
  • What do you mean? What should I change? (the for-loop in my question is now the one from the answer) – Dries Feb 24 '15 at 10:42
  • Well, you're not checking that audio1.wav and audio2.wav have the same number of channels. But it's reasonable to leave such checks out of a simplified example. – MSalters Feb 24 '15 at 11:05
  • Oh yes, I know that. I'm currently figuring out how I'm going to do that. I was thinking of converting the mono sounds to stereo so the mixing can stay the same. – Dries Feb 24 '15 at 11:09
  • 2
    @Dries The answer you accepted is wrong, and the article you linked even explains why. Your code looks fine, but you should be dividing `32768`, and not `65535`. That would explain some distortion. – ElderBug Feb 24 '15 at 11:21
  • 1
    @ElderBug Hmm... Let me compare the two ideas then. I think that "wrong" is a bit harsh. It does work for my example. – Dries Feb 24 '15 at 11:24
  • @Dries It indeed sounds correct, but, as the article explains, it halves the amplitude of one signal if the other is silent. – ElderBug Feb 24 '15 at 11:26
  • @ElderBug I changed the code and indeed it does sound good with my code as well. So, I guess I'll be keeping my code (with your change). It was interesting to see the easiest example though and I'm happy that I was actually almost right :) – Dries Feb 24 '15 at 11:28
  • @Dries: what is your use case? If it's for high quality audio (e.g. music) then you probably don't want to add the non-linear term. If you're just mixing something like video game sound effects then the non-linear term will help to maintain dynamic range at the cost of some distortion. – Paul R Feb 24 '15 at 11:35
  • By the way, you're allocating twice as much memory as you need. `new T[N]` will allocate `N` objects, not `N` bytes, so there's no need to multiply by `sizeof(short)`. – Mike Seymour Feb 24 '15 at 11:47
  • @PaulR I need to mix both of your examples: sound effects as well as background music. – Dries Feb 24 '15 at 12:04
  • 1
    @Dries: unfortunately there is no perfect solution for the mixing problem, assuming you need to do it "on the fly". If you can mix ahead of time then it's possible to normalise the input samples and maintain full dynamic range, but otherwise you have to compromise in one way or another. Much has been written on this subject over the years. – Paul R Feb 24 '15 at 12:43
  • 1
    @PaulR it sound okay for now. If there's a little bit of quality loss that's okay. Thank you for the insight on the subject I'm sure that I can continue developing with all the information I got – Dries Feb 24 '15 at 13:10

2 Answers


Your mixing code is very odd: you are adding a non-linear term, which will result in distortion. It looks like a hack intended for 8-bit PCM, where the dynamic range is very limited, but you probably don't need to worry about this for 16-bit PCM. For basic mixing you just want this:

for (int t = 0; t < 800000 * 2; ++t)
{
    mixdown[t] = (buffer1[t] + buffer2[t]) / 2;
}

Note that the divide by 2 is necessary to prevent clipping when both signals are at full scale. Note also that I've removed the 2x loop unrolling.

Paul R
  • That hack probably isn't for PCM. I think the hack was created by someone who didn't know he was working on uLaw data, and who consequently tried a few simple arithmetic operations on the integer interpretation of a uLaw value. – MSalters Feb 24 '15 at 10:42
  • @MSalters: aha - yes - µLaw would make sense. – Paul R Feb 24 '15 at 10:44
  • 1
    The algorithm is perfectly valid for PCM, and your answer is wrong, because it induces amplitude distortion. OP's linked article explains why. – ElderBug Feb 24 '15 at 11:24
  • @Elderbug: it depends whether you want to preserve quality - by introducing a non-linear term you are introducing distortion. The advantage is that you get the full dynamic range when one sample is quiet, but at the cost of significant distortion. For low quality audio (e.g. mixing arcade game sounds) then this might be OK, but for mixing music etc one probably cares more about minimising distortion. – Paul R Feb 24 '15 at 11:28
  • The algorithm can work for both, but is slightly different for unsigned PCM. OP version is indeed for signed PCM. The article also contains the unsigned PCM version if you want to take a look. And it doesn't really induce distortion. – ElderBug Feb 24 '15 at 11:32
  • Sorry - I misinterpreted your comment and have now revised mine. I don't think there is necessarily a "right" or "wrong" here - it depends on whether you want to prioritise dynamic range over accuracy or *vice versa*. – Paul R Feb 24 '15 at 11:33
  • I didn't listen to the result, but I don't think it induces distortion, and OP seems to think so too after he listened. – ElderBug Feb 24 '15 at 11:37
  • @Elderbug: any non-linearity introduces distortion - whether you can hear it or not depends on the nature of the sources and how good your ears are. ;-) – Paul R Feb 24 '15 at 11:39

Your algorithm is correct, but you missed an important point: signed 16-bit PCM ranges from -32768 to 32767. You must therefore divide by 32768, not 65535.

ElderBug
  • This should be the accepted answer. The other proposed answer (dividing by 2) is going to make the resulting audio half volume in many cases (especially if one of the inputs is silence!). – Alexis Wilke Dec 19 '21 at 19:39