
I have raw binary int16 data that I am converting to a NumPy array using:

audio = np.fromstring(raw_data, dtype=np.int16)
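
Side note: NumPy has deprecated `np.fromstring` for binary input (since 1.14) in favour of `np.frombuffer`, which interprets the bytes directly. A minimal sketch, assuming the raw data is little-endian signed 16-bit PCM; the byte string below is made up and stands in for `raw_data`:

```python
import struct

import numpy as np

# Hypothetical raw bytes: four little-endian int16 samples standing in for raw_data.
raw_data = struct.pack('<4h', 0, 1000, -32768, 32767)

# '<i2' pins the byte order to little-endian instead of relying on the host's order.
audio = np.frombuffer(raw_data, dtype='<i2')

# frombuffer returns a read-only view of the bytes; astype makes a writable
# native-order int16 copy.
audio = audio.astype(np.int16)
print(audio)
```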

The data is audio data. When I convert it to float32, the audio becomes distorted:

audio = audio.astype(np.float32, order='C')

I'm saving the audio to disk to listen to it using SoundFile:

soundfile.write('out.wav', audio, sample_rate)

If I write the audio directly to disk without the astype operation, there is no distortion, i.e.:

# no distortion
audio = np.fromstring(raw_data, dtype=np.int16)
soundfile.write('out.wav', audio, sample_rate)

# distortion
audio = np.fromstring(raw_data, dtype=np.int16)
audio = audio.astype(np.float32, order='C')
soundfile.write('out.wav', audio, sample_rate)
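
To see why only the second version distorts: `astype(np.float32)` converts the type of each sample but does not rescale it, so the samples keep their int16 magnitudes (up to 32767 in absolute value), while float audio is conventionally expected to lie in [-1.0, 1.0]. A small NumPy-only sketch with made-up sample values:

```python
import numpy as np

# Hypothetical int16 samples, e.g. what np.frombuffer(raw_data, dtype=np.int16) returns.
audio_int16 = np.array([0, 120, -4500, 16000, -32768, 32767], dtype=np.int16)

# astype changes the dtype only; the numeric values are unchanged.
audio_float = audio_int16.astype(np.float32)

# Fraction of samples that a consumer expecting [-1.0, 1.0] would see as out of range.
out_of_range = np.mean(np.abs(audio_float) > 1.0)
print(audio_float.max(), out_of_range)
```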

What is the proper way to convert the data type here?

deef

1 Answer


By convention, floating-point audio data is normalized to the range [-1.0, 1.0], which you can do by scaling:

audio = audio.astype(np.float32, order='C') / 32768.0
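
A minimal sketch of the scaling in both directions. The helper names `int16_to_float32` and `float32_to_int16` are hypothetical; the reverse direction multiplies by 32768 and clips, so a float sample of exactly 1.0 becomes 32767 instead of overflowing int16:

```python
import numpy as np


def int16_to_float32(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale int16 PCM to float32 in [-1.0, 1.0) by dividing by 32768."""
    return samples.astype(np.float32) / np.float32(32768.0)


def float32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: inverse scaling, clipped to the int16 range."""
    scaled = np.round(samples * 32768.0)
    return np.clip(scaled, -32768, 32767).astype(np.int16)


audio_int16 = np.array([0, 1, -1, 16384, -32768, 32767], dtype=np.int16)
audio_float = int16_to_float32(audio_int16)
restored = float32_to_int16(audio_float)
print(audio_float, restored)
```

Because k / 32768 is exactly representable in float32 for every int16 value k, the round trip reproduces the original samples exactly.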

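Scaling this way should fix the problem. Also be aware of what soundfile.write puts in the file: it does not choose the WAV subtype from the array's dtype. For WAV it defaults to 16-bit PCM (subtype='PCM_16'), so normalized float data is converted back to 16-bit integers on write, which is fine. If you want the file itself to store 32-bit floats, pass subtype='FLOAT' explicitly.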

jaket
  • The signed int range spans from -32768 to 32767. Don't we have to balance the range somehow during conversion? Or can we assume that 0 is true zero and the signed int representation is simply unbalanced (it can represent -32768 but not +32768)? – fodma1 Mar 10 '19 at 17:07
  • @fodma1 Yes, 0 is the center. I never write out a file with -32768, but when reading I don't like to assume that someone else hasn't, hence the /32768.0. It's only 0.00026 dB of difference, which is not worth the risk of overflowing. – jaket Mar 11 '19 at 04:27
  • There are two confirmations of this method: 1) FFmpeg [source](https://github.com/FFmpeg/FFmpeg/blob/4fda451c9f2dda4ced8cff92cd7c5387550dad83/libavcodec/pcm.c#L278): `s->scale = 1. / (1 << (avctx->bits_per_coded_sample - 1));` 2) soundfile test: `numpy.array_equal(soundfile.read('file.wav', dtype='int16')[0] / 32768, soundfile.read('file.wav', dtype='float32')[0])` – bartolo-otrit Aug 13 '21 at 14:42
  • @jaket What if I have a NumPy array of an audio file and I want to detect words in it? – Manvi Jan 18 '23 at 05:52
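
The numbers discussed in the comments can be checked directly. This sketch shows the overshoot below -1.0 when dividing by 32767, the 0.00026 dB figure, and the general scale factor 1 / 2**(bits - 1), which has the same form as the FFmpeg line quoted above; `pcm_scale` is a hypothetical helper:

```python
import math

import numpy as np

most_negative = np.float32(-32768)

# Dividing by 32767 maps the most negative int16 sample slightly below -1.0 ...
by_32767 = most_negative / np.float32(32767.0)
# ... while dividing by 32768 maps it to exactly -1.0.
by_32768 = most_negative / np.float32(32768.0)

# Level difference between the two scale factors, in dB.
diff_db = 20 * math.log10(32768 / 32767)


def pcm_scale(bits_per_sample: int) -> float:
    """Hypothetical helper: 1 / 2**(bits - 1), like FFmpeg's 1. / (1 << (bits - 1))."""
    return 1.0 / (1 << (bits_per_sample - 1))


print(by_32767, by_32768, round(diff_db, 5))
print(pcm_scale(16), pcm_scale(24))
```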