
I want to know what kind of sample values an Audio Unit expects in the buffer. Right now, when I calculate the samples for a sine wave at one single frequency with amplitude 0.5, everything works fine.

But when I play several frequencies at the same time, for example 5 at once, and mix the samples together by summing them up, the summed values get higher and the sound isn't clean anymore.

So I want to know what my maximum sample value can be before I start getting a dirty sound.
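
To make it concrete, this is roughly what my mixing loop does (the frequencies and the plain `render` function here are just an illustration, not my exact code):

```c
#include <math.h>

// Roughly what I do: 5 sine oscillators, each with amplitude 0.5, summed into
// one buffer. Worst case the summed sample reaches 5 * 0.5 = 2.5, well outside
// [-1.0, 1.0). (Phase restarts every buffer here; fine for a sketch.)
void render(float *out, int frameCount, double sampleRate)
{
    const double freqs[5] = { 220.0, 277.18, 329.63, 440.0, 554.37 };
    const float amplitude = 0.5f;

    for (int i = 0; i < frameCount; i++) {
        float sample = 0.0f;
        for (int k = 0; k < 5; k++)
            sample += amplitude * (float)sin(2.0 * M_PI * freqs[k] * i / sampleRate);
        out[i] = sample;
    }
}
```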

  • You can normalize them, i.e. instead of plainly adding them you can perform a mean for each sample. – Matteo Italia Aug 24 '12 at 16:10
  • any examples on how this is done? –  Aug 24 '12 at 16:13
  • If in your code you are doing `out[i]=a[i]+b[i]+c[i]`, now you would do `out[i]=(a[i]+b[i]+c[i])/3`. – Matteo Italia Aug 24 '12 at 16:16
  • And this wouldn't cause any problems if for example I go from three to four notes and suddenly divide by 4? I mean, won't the sudden changes in amplitude be noticeable? –  Aug 24 '12 at 16:27
  • Sure, it would be noticeable, it all depends on what your objective is. If you want to add two sounds (as would happen by playing two keys of the piano together) then a plain addition is the correct way; to avoid distorting, make sure the original amplitudes are small enough not to exceed the dynamic range of the output. If instead you want the peaks not to exceed the original amplitude, take the mean. If you want to keep the perceived intensity of the output the same as that of the inputs, divide by the square root of the number of sounds (although actually it's more complicated). There is a sketch of these three options after the comments. – Matteo Italia Aug 24 '12 at 16:35
  • @Sled typically, you just provide osc/part and master output volume controls on a synth, and let the user figure this out. you're working in the floating point realm -- 'clipping' is not going to happen at that stage, unless something is *seriously* wrong. – justin Aug 24 '12 at 16:36
  • @MatteoItalia, what is the square root method called, so I can look up some more info about it? –  Aug 24 '12 at 17:15
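
A rough sketch of the three options discussed in the comments (plain sum, mean, and the square-root rule); the function and its `mode` switch are purely illustrative:

```c
#include <math.h>

// Mix N sources (sources[k][i] is sample i of source k) using one of the three
// strategies from the comments. Names and the mode switch are illustrative.
void mix(float *out, float *sources[], int numSources, int frameCount, int mode)
{
    for (int i = 0; i < frameCount; i++) {
        float sum = 0.0f;
        for (int k = 0; k < numSources; k++)
            sum += sources[k][i];

        if (mode == 0)
            out[i] = sum;                               // plain addition
        else if (mode == 1)
            out[i] = sum / (float)numSources;           // mean: peaks stay within the inputs' range
        else
            out[i] = sum / sqrtf((float)numSources);    // roughly constant perceived loudness
    }
}
```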

1 Answer


On OS X

Sample values are typically within [-1.0...1.0), where the maximum and minimum correspond to 0 dBFS. However, you should be prepared to handle larger sample values.

Many people who work with floating point rendering/mixing graphs are accustomed to working without consideration of exceeding 0 dBFS. They may verify the signal does not exceed 0 dBFS only when they output to hardware or an audio file.

If you just have a synth which sums 5 sines, each at -6 dBFS, there should be no clipping of the signal under normal situations, even if you exceed [-1...1) because you are using floating point numbers to represent your signal.

there are a few exceptions to this:

  • you are using an unusual AU host which does not use a floating point mixer (i can't think of one which is actively developed)
  • or you are not bringing the output level down before it hits the DAC
  • or you are not bringing the output level down before it is saved to a file (although audio files can be saved in floating point too)
  • a component somewhere in the signal path does not process in floating point or support floating point inputs. this is common for dedicated hardware processors, but many 3rd party plugins process in floating point these days.

I will typically prove/disprove this by sending the component in question a signal which would obviously clip. Of course, a signal processor/generator which opted to process using ints could (and should) leave a good amount of headroom to avoid clipping (because it's not likely that the processor processes audio using anything less than 32 bits).
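
For reference, a rough sketch of both points: what -6 dBFS per sine works out to, and the kind of deliberately hot test signal mentioned above (the names, frequency, and level are illustrative):

```c
#include <math.h>

// -6 dBFS as a linear amplitude: 10^(-6/20) ≈ 0.501, so five such sines can
// peak around 2.5 -- fine on a floating point bus, a clip anywhere that isn't.
static float amp_from_dbfs(float dbfs)
{
    return powf(10.0f, dbfs / 20.0f);
}

// A deliberately hot sine (~ +12 dBFS) for probing whether anything in the
// path clips or wraps.
static void fill_hot_sine(float *out, int frameCount, double sampleRate)
{
    const double freq = 440.0;
    const float peak = 4.0f;

    for (int i = 0; i < frameCount; i++)
        out[i] = peak * (float)sin(2.0 * M_PI * freq * i / sampleRate);
}
```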

On iOS

Because floating point processing is much slower on iOS devices, the canonical AudioUnitSampleType is specified as Q7.24 fixed point.

An explanation of this format can be found here. See also the posts surrounding this topic.

Because this is not floating point, you will have to be much more careful about your gain stages to avoid internal clipping.
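
A minimal sketch of what that layout implies -- a signed 32-bit integer with 24 fractional bits, so 1.0 corresponds to 1 << 24 -- with a saturating clamp; these helpers are illustrative, not part of the AudioUnit API:

```c
#include <stdint.h>

// Q7.24 / 8.24: nominal full scale (0 dBFS) is 1.0, i.e. 1 << 24; the extra
// integer bits give roughly 42 dB of headroom, but unlike a float bus the
// range is hard-limited, so saturate before converting.
static int32_t q824_from_float(float x)
{
    if (x >= 127.99999f) x = 127.99999f;   // largest representable value is just under 128.0
    if (x <= -128.0f)    x = -128.0f;
    return (int32_t)(x * (float)(1 << 24));
}

static float float_from_q824(int32_t x)
{
    return (float)x / (float)(1 << 24);
}
```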

Also note that it is possible to configure a 32-bit float graph on iOS. In that case, you should still avoid exceeding [-1.0...1.0) at the output of your processor, because your output will likely be converted to a non-floating-point representation sooner rather than later (compared to OS X), unless of course you have direct control of the gain staging at suitable points downstream in the processing chain and adjust the amplitude there appropriately.
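
For example, one simple way to bring the level down before the buffer leaves your unit is a fixed master gain sized for the worst-case number of voices, with a clamp as a safety net (the gain rule, ceiling value, and names are illustrative):

```c
// Pull the mix back into [-1.0, 1.0) before handing the buffer on. The exact
// safe ceiling depends on the destination sample width; this just stays under +1.0.
static void apply_master_gain(float *buf, int frameCount, int maxVoices)
{
    const float gain = 1.0f / (float)maxVoices;

    for (int i = 0; i < frameCount; i++) {
        float s = buf[i] * gain;
        if (s > 0.999969f)  s = 0.999969f;   // ~ 32767/32768
        if (s < -1.0f)      s = -1.0f;
        buf[i] = s;
    }
}
```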

justin
  • What exactly do you mean by bringing the output down before it hits the DAC? Does that have anything to do with the amplitude? I'm just sending the sum of my samples to the buffer. My code is based on this example: http://cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html –  Aug 24 '12 at 17:06
  • @Sled that article is about AudioUnits on iOS -- completely different world from the desktop, the original/common/open AudioUnit. will have to update the answer. my answer is about AUs on OS X. – justin Aug 24 '12 at 17:06
  • Sorry, I should have mentioned that this is an iOS application but didn't think it would matter. –  Aug 24 '12 at 17:07
  • yeah, it's totally different in this regard bc AUs and fp processing on iOS are much slower - so they went with Q7.24 on iOS. as well, there is no real means to develop AUs on iOS so... i assumed OS X. np – justin Aug 24 '12 at 17:09
  • So if I keep the values between -1.0 and 1 I should be safe? Or does this depend on how AU is set up? Thanks for all the info! –  Aug 24 '12 at 18:27
  • @Sled technically, 1.0 would clip. just imagine the highest value as one less than the destination sample format's integer maximum (whatever that width may be). for example: -1 maps to -128 if converting from float to 8 bit -- similarly, +1 maps to +128. +127 is the highest value a signed 8 bit integer represents, so +128 would be a clip. – justin Aug 24 '12 at 19:03
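
A minimal sketch of the conversion that last comment describes, using a symmetric scale of 128 (the helper name and rounding choice are illustrative):

```c
#include <math.h>
#include <stdint.h>

// Symmetric float -> signed 8-bit: -1.0 -> -128, +1.0 -> +128, but +127 is the
// largest value an int8_t can hold, so +1.0 ends up clipped.
static int8_t int8_from_float(float x)
{
    long v = lroundf(x * 128.0f);
    if (v > 127)  v = 127;
    if (v < -128) v = -128;
    return (int8_t)v;
}
```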