How does Audacity mix audio samples?

Question

So let's say I want to mix these 2 audio tracks:

Unmixed

In Audacity, I can use the "Mix and Render" option to mix them together, and I'll get this:

Audacity Mix

However, when I try to write my own code to mix, I get this:

My Mix

This is essentially how I mix the samples:

private function mixSamples(sample1:UInt, sample2:UInt):UInt
{
    return (sample1 + sample2) & 0xFF;
}

(The syntax is Haxe but it should be easy to follow if you don't know it.)

These are 8-bit sample audio files, and I want the product to be 8-bit as well, hence the & 0xFF.

I do understand that by simply adding the samples, I should expect clipping. My issue is that mixing in Audacity doesn't cause clipping (at least not to the extent that my code does), and by looking at the "tail" of the second (longer) track, it doesn't seem to reduce the amplitude. It doesn't sound any softer either.

So basically, my question is this: what's Audacity doing that I'm not? I want to mix tracks to sound exactly as if they're being played on top of one another, but I (obviously) don't want this horrendous clipping.

EDIT:

Here is what I get if I sign the values before I add, then unsign the sum value, as suggested by Radiodef:

My Signed Mix

As you can see, it's much better than before, but it's still quite distorted and noisy compared to the result Audacity produces. So my problem still stands: Audacity must be doing something differently.

EDIT2:

I mixed the first track on itself, both with my code and Audacity, and compared the points where distortion occurs. This is Audacity's result:

Zoom Audacity

And this is my result:

enter image description here

Just based on the screenshots, it appears that they're multiplied together, not added. — ashes999, Nov 21 '13 at 01:08
This looks more freakish than clipping. Look at how everywhere the shorter clip is summed the audio is totally destroyed and then it's totally unaffected after. Are you sure your 8-bit samples are not scaled up when they are read in? Try taking out the & and see what happens. — Radiodef, Nov 21 '13 at 01:17
@ashes999: I'm not sure which you're talking about, but I can assure you that mine were added (the cause of the major distortion is that they were unsigned, as Radiodef pointed out). As for the Audacity mixing, the Audacity manual itself states "the act of mixing multiple tracks _adds_ the waveforms together": http://manual.audacityteam.org/man/Mixing — puggsoy, Nov 21 '13 at 06:22

Radiodef · Accepted Answer · 2013-11-22T03:27:00.597

I think what is happening is that you are summing the samples as unsigned values. A typical sound wave swings both positive and negative, which is why waves add together the way they do (some parts cancel). If you have an 8-bit sample that is -96 and another that is 96, summing them gives 0. With unsigned audio you instead have the samples 32 and 224, which sum to 256 (offset and overflow).

What you need to do is sign them before summing. To sign 8-bit samples, convert them to a signed int type and subtract 128 from each of them. I assume what you have are WAV files, so you will need to unsign them again (add 128) after the sum.
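A minimal Java sketch of that sign / sum / unsign sequence. The class and method names (`SignedMix`, `toSigned`, `toUnsigned`, `mixUnsigned8`) are made up for illustration, and it assumes the samples are unsigned 8-bit values in 0..255, as in an 8-bit WAV file. It deliberately does no range handling, so a loud sum can still land outside 0..255.

```java
final class SignedMix {
    // Unsigned 8-bit (0..255, silence at 128) -> signed (-128..127, silence at 0).
    static int toSigned(int unsigned8) {
        return unsigned8 - 128;
    }

    // Signed value back to the unsigned representation (no range check).
    static int toUnsigned(int signed) {
        return signed + 128;
    }

    // Sum two unsigned 8-bit samples in the signed domain.
    // The result may fall outside 0..255 when both inputs are loud.
    static int mixUnsigned8(int a, int b) {
        return toUnsigned(toSigned(a) + toSigned(b));
    }

    public static void main(String[] args) {
        // 32 and 224 are -96 and 96 once signed, so they cancel to silence.
        System.out.println(mixUnsigned8(32, 224));  // prints 128
        // 253 is 125 once signed; the sum 250 becomes 378 unsigned, out of range.
        System.out.println(mixUnsigned8(253, 253)); // prints 378
    }
}
```

The second call shows why signing alone is not enough: the result still has to be brought back into range somehow.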

Audacity probably does floating-point processing. I've heard some really dubious claims about floating point, like that it has "infinite dynamic range" and garbage like that, but it doesn't clip in the same determinate and obvious way as integers do. Floating point has a finite range of values, the same as integers, but the largest and smallest values are much farther apart. (That's about the simplest way to put it.) Floating point allows much greater amplitude changes in the audio, but the catch is that, for the same word size, the overall signal-to-noise ratio is lower than with integers.

With the weird distortion my best guess is it is from the mask you are doing with & 0xFF. If you want to actually clip instead of getting overflow you will need to do so yourself.

for (int i = 0; i < samplesLength; i++) {
    if (samples[i] > 127) {
        samples[i] = 127;
    } else if (samples[i] < -128) {
        samples[i] = -128;
    }
}

Otherwise, say you have two samples that are 125: summing gets you 250 (11111010). Then you unsign (add 128) and get 378 (101111010). The & 0xFF keeps only the low 8 bits, 01111010, which is 122. Other sums wrap around to results that are effectively negative or close to 0.
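To make the difference concrete, here is a small Java sketch comparing the mask with an explicit clamp on the unsigned sum. `WrapVsClamp`, `wrap` and `clamp` are illustrative names, not taken from the question's code.

```java
final class WrapVsClamp {
    // Masking keeps only the low 8 bits, so out-of-range values wrap around.
    static int wrap(int unsignedSum) {
        return unsignedSum & 0xFF;
    }

    // Clamping pins out-of-range values to the nearest end of 0..255.
    static int clamp(int unsignedSum) {
        return Math.max(0, Math.min(255, unsignedSum));
    }

    public static void main(String[] args) {
        int sum = (125 + 125) + 128;    // 378, the example from the text
        System.out.println(wrap(sum));  // prints 122
        System.out.println(clamp(sum)); // prints 255
    }
}
```

With the mask, a peak that should sit at the top of the range ends up just below the middle, which shows up in the waveform as a sudden jump rather than a flat clipped top.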

If you want to clip at something other than 8-bit, full scale for a signed bit depth n is (2 ^ (n - 1)) - 1 on the positive side and -(2 ^ (n - 1)) on the negative side, so for example 32767 and -32768 for 16-bit.
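The same formula as a Java sketch, with a clip helper built on it. The names (`FullScale`, `max`, `min`, `clip`) are hypothetical, and it assumes signed samples and a bit depth between 2 and 32.

```java
final class FullScale {
    // Largest positive signed value for the bit depth: 2^(bits - 1) - 1.
    static long max(int bits) {
        return (1L << (bits - 1)) - 1;
    }

    // Most negative signed value for the bit depth: -2^(bits - 1).
    static long min(int bits) {
        return -(1L << (bits - 1));
    }

    // Pin a signed sample to full scale for the given bit depth.
    static long clip(long sample, int bits) {
        return Math.max(min(bits), Math.min(max(bits), sample));
    }

    public static void main(String[] args) {
        System.out.println(max(16) + " " + min(16)); // prints 32767 -32768
        System.out.println(clip(250, 8));            // prints 127
    }
}
```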

Another thing you can do instead of clipping is to search for clipping and normalize. Something like:

double[] normalize(double[] samples, int length, int destBits) {

    // full scale for the destination bit depth, e.g. -128 and 127 for 8-bit
    double fsNeg = -Math.pow(2, destBits - 1);
    double fsPos = -fsNeg - 1;

    double peak = 0;

    for (int i = 0; i < length; i++) {
        // find highest clip if there is one

        if (samples[i] < fsNeg || samples[i] > fsPos) {
            double magnitude = Math.abs(samples[i]);

            if (magnitude > peak) {
                peak = magnitude;
            }
        }
    }

    if (peak != 0) {

        // ratio to reduce to where there is not a clip;
        // dividing by fsPos keeps both positive and negative peaks in range
        double norm = fsPos / peak;

        for (int i = 0; i < length; i++) {
            samples[i] *= norm;
        }
    }

    return samples;
}

Ah, that makes perfect sense, kinda stupid of me not to realise that :P Still, after using this method (sign them, add, unsign) I still get quite a noisy waveform, especially near the start. It's not horribly clipping, but it's still significantly scratchy and unpleasant. I'll edit the question with a screenshot. — puggsoy, Nov 21 '13 at 06:23
"Scratchy" sounds like a description of quantization noise. That is probably due to the 8-bit although in my experience the quantization error of 8-bit is usually not all that noticeable. It depends on how low RMS your original signal is. If 8-bit is necessary for some reason my advice is to use a higher bit depth and only quantize at the very end. Only quantizing at the end will minimize the error because it won't compound through intermediate operations. For the record that is probably what Audacity does. No serious audio application will do any DSP at the source bit depth. — Radiodef, Nov 21 '13 at 20:34
Unfortunately that doesn't seem to help, I get the same results. I even converted them to 32-bit integers (multiplying by 0xFFFFFF is how that's done right?) and added them together, and just wrote them to a 32-bit WAV, still the same thing. Converting them to floats between 1.0 and -1.0 before adding, then changing back, doesn't work either. — puggsoy, Nov 22 '13 at 01:25
Integer addition won't incur any error. It's hard to say why you are getting distortion unless you're doing something else like applying gain (multiplication). But what you would need to do is start with the higher bit depth, converting them won't do you any good for the sum. The problem would be that 8-bit is noisy to begin with and the noise between the two signals will sum. So you need to start with a higher bit depth so the noise is very small. — Radiodef, Nov 22 '13 at 01:56
So I added the first wave onto itself and noticed that I only get distortions near the start, where the values get quite loud. I took a better look, and compared with the same mixture done in Audacity. At points where it should clip, the values fly to the other end of the scale. For example where it would usually clip past 1.0, it'll instead give me something like 0.9. That's what it looks like anyway. I'll add screenshots to the question. — puggsoy, Nov 22 '13 at 02:48
It could be a clip then. Remember since you are doing an & the numbers won't clip off, instead you get overflow. — Radiodef, Nov 22 '13 at 02:52
Ah, that finally did it, thank you so much!!! Again this seems so trivial in hindsight, but I'm still quite inexperienced with manipulating audio. This is really interesting stuff though so I'm eager to discover more. Thanks again! — puggsoy, Nov 22 '13 at 03:32

score 1 · Answer 2 · answered Nov 22 '13 at 03:11

It's a lot simpler than you think; although your original files are 8-bit, Audacity handles them internally as 32-bit floating point. You can see this in the screenshot, in the information panel to the left of each track. This means that adding 2 tracks together means adding two floating point samples at each point, and will simply yield sample values from -2.0 to +2.0, which are then clamped to the -1 to +1 range. By comparison, adding two 8-bit integers together will yield another 8-bit number where the value overflows and wraps around. (This can apply whether you use signed or unsigned values.)
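As a rough Java sketch of that float-based approach (this is not Audacity's actual code): convert each unsigned 8-bit sample to a float, add, clamp, and convert back only at the end. The names (`FloatMix`, `toFloat`, `toUnsigned8`, `mix`) and the scaling by 128 are assumptions made for illustration; other scaling conventions exist.

```java
final class FloatMix {
    // Unsigned 8-bit sample (0..255) -> float in [-1.0, 1.0).
    static float toFloat(int unsigned8) {
        return (unsigned8 - 128) / 128.0f;
    }

    // Float -> unsigned 8-bit, clamping to [-1.0, 1.0] first.
    static int toUnsigned8(float x) {
        float clamped = Math.max(-1.0f, Math.min(1.0f, x));
        int signed = Math.round(clamped * 128.0f);
        // +1.0 scales to 128, one past the signed 8-bit maximum, so cap at 127.
        return Math.min(127, signed) + 128;
    }

    // Mix two unsigned 8-bit samples in the float domain.
    static int mix(int a, int b) {
        return toUnsigned8(toFloat(a) + toFloat(b));
    }

    public static void main(String[] args) {
        System.out.println(mix(32, 224));  // prints 128 (the samples cancel)
        System.out.println(mix(253, 253)); // prints 255 (clamped, no wrap-around)
    }
}
```

The intermediate sum can go anywhere between -2.0 and +2.0 without wrapping; range handling happens in one place, at the final conversion.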

I did notice that, yeah. Thing is, I'm using Haxe and all integers are 32-bit, so I figured that adding two 8-bit integers shouldn't be an issue. As Radiodef pointed out though, masking it with 0xFF causes it to wrap around, which I didn't realise. — puggsoy, Nov 22 '13 at 03:37
