Working with 24-bit audio samples

Question

What is the "standard way" of working with 24-bit audio? There are no 24-bit data types available, really. Here are the methods that come to mind:

1. Represent 24-bit audio samples as 32-bit ints and ignore the upper eight bits.
2. Just like (1) but ignore the lower eight bits.
3. Represent 24-bit audio samples as 32-bit floats.
4. Represent the samples as structs of 3 bytes (acceptable for C/C++, but bad for Java).

How do you work this out?

marko · Accepted Answer · 2013-07-05T20:25:46.940

Store them as 32- or 64-bit signed ints, or as float or double, unless you are space conscious and care about packing them into the smallest space possible.

Audio samples often appear as 24 bits to and from audio hardware, since this is commonly the resolution of the DACs and ADCs. On most computer hardware, though, don't be surprised to find the bottom 3 or 4 bits banging away randomly with noise.

Digital signal processing operations, which are what usually happens downstream from the acquisition of samples, all involve weighted sums of samples. A sample stored in an integer type can be considered fixed-point binary with an implied binary point at some arbitrary position, which you can choose strategically to maintain as many bits of precision as possible.

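For instance, the sum of two 24-bit integers yields a result of up to 25 bits. Every further doubling of the running total costs one more bit, so after 8 such doublings (on the order of 256 full-scale samples accumulated) the 8 bits of headroom in a 32-bit type are used up. Beyond that it would overflow, and you would need to re-normalize by rounding and shifting right.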
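
A minimal Java sketch of that headroom arithmetic, assuming samples are already sign-extended 24-bit values held in int. The class and method names (HeadroomDemo, sum, shiftRightRounded) are hypothetical and only for illustration. It accumulates in a long so the running total cannot wrap for realistic block sizes, then shifts right with rounding to re-normalize.

```java
// Hypothetical helper, not from the original answer: illustrates headroom
// when summing 24-bit samples and re-normalizing the result.
final class HeadroomDemo {
    static final int MAX_24 = (1 << 23) - 1;  // largest signed 24-bit value
    static final int MIN_24 = -(1 << 23);     // smallest signed 24-bit value

    /** Adds all samples into a 64-bit accumulator. */
    static long sum(int[] samples) {
        long acc = 0;
        for (int s : samples) {
            acc += s;
        }
        return acc;
    }

    /**
     * Re-normalizes by an arithmetic right shift of `shift` bits (shift >= 1),
     * adding half of the divisor first so the result is rounded, not truncated.
     * The caller must ensure the shifted value fits in an int.
     */
    static int shiftRightRounded(long value, int shift) {
        long half = 1L << (shift - 1);
        return (int) ((value + half) >> shift);
    }
}
```

With 256 full-scale samples the total still fits in 32 bits, and shifting it right by 8 brings it back into the 24-bit range; a 257th full-scale sample would no longer fit in an int.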

Therefore, if you're using integer types to store your samples, use the largest you can and start with the samples in the least significant 24 bits.
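
As a sketch of what "samples in the least significant 24 bits" can look like in Java, the following unpacks a packed 3-byte sample into a sign-extended int and packs it back. It assumes little-endian byte order (as used in WAV files); the class name Pcm24 and its methods are hypothetical, not part of any library.

```java
// Hypothetical helper: converts between packed 3-byte samples and ints.
// Assumes little-endian byte order (least significant byte first).
final class Pcm24 {
    /** Reads one packed 24-bit sample at `offset` and sign-extends it to an int. */
    static int unpack(byte[] buf, int offset) {
        int b0 = buf[offset] & 0xFF;       // low byte, treated as unsigned
        int b1 = buf[offset + 1] & 0xFF;   // middle byte, treated as unsigned
        int b2 = buf[offset + 2];          // high byte, left signed so the sign carries over
        return (b2 << 16) | (b1 << 8) | b0;
    }

    /** Writes the low 24 bits of `sample` to `buf` at `offset`. */
    static void pack(int sample, byte[] buf, int offset) {
        buf[offset] = (byte) sample;
        buf[offset + 1] = (byte) (sample >> 8);
        buf[offset + 2] = (byte) (sample >> 16);
    }
}
```

Leaving the top byte signed is what makes negative samples come out negative; masking it with 0xFF like the other two bytes would instead give values in the range 0 to 16777215.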

Floating point types of course take care of this detail for you, although you get less choice about when renormalisation takes place. They are the usual choice for audio processing where hardware support is available. A single precision float has a 24-bit mantissa, so can hold a 24-bit sample without loss of precision.

Usually floating point samples are stored in the range -1.0f < x < 1.0f.
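
One possible way to move between 24-bit integer samples and floats in Java is sketched below. It assumes a scale factor of 2^23, which is one common convention and maps the most negative sample to exactly -1.0f; other scalings are also in use. The class name SampleConvert and its methods are hypothetical.

```java
// Hypothetical helper: maps signed 24-bit samples to floats and back.
// Assumes a scale of 2^23; this is a convention, not the only option.
final class SampleConvert {
    static final float SCALE = 8388608.0f;  // 2^23

    /** Converts a sign-extended 24-bit sample to a float in [-1.0, 1.0). */
    static float toFloat(int sample24) {
        // Exact: a 24-bit integer fits in a float, and dividing by a power of two loses nothing.
        return sample24 / SCALE;
    }

    /** Converts a float back to a 24-bit sample, rounding and clamping out-of-range input. */
    static int toInt24(float x) {
        int v = Math.round(x * SCALE);
        return Math.max(-8388608, Math.min(8388607, v));
    }
}
```

The clamp matters on the way back: processing such as equalization can push values to or past 1.0f, which would otherwise land outside the 24-bit range.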

edited Jul 05 '13 at 20:25

answered Jul 05 '13 at 13:59


Thanks for your very thoughtful answer regarding my issue! If you were me, which audio sample representation would you prefer to use for audio processing in Java including FFT and equalization? 32-bit signed integers or 32-bit floats? – ezpresso Jul 05 '13 at 14:38
I don't think you have any choice other than `float` in this case. FFTs in integer arithmetic are very, very hard work. – marko Jul 05 '13 at 14:43
Thank you so much! Gotta go coding! – ezpresso Jul 05 '13 at 14:47
Suggested edit: "re-normalize" instead of "de-normalize". denormal is something completely different. – Bjorn Roche Jul 05 '13 at 18:18
"Therefore, if you're using integer types to store your samples, use the largest you can and start with the samples in the least significant 24 bits." This is not necessarily the case. You may want to preserve precision rather than headroom. Depends on your operations. – Bjorn Roche Jul 05 '13 at 18:19
@BjornRoche By this I mean starting with the samples MSB aligned - the OP was asking precisely where to put 24-bit samples in a 32-bit word. Clearly for operations that result in wider results, it may be appropriate to move the binary point if word width permits. – marko Jul 05 '13 at 20:24
