Soundfile imports audio in two different formats

Question

I am trying to preprocess audio files for a neural net with soundfile.read(), but the function returns differently shaped data for different FLAC files that have the same sample rate and length. For example, data, sr = soundfile.read(audiofile1) returned an array with data.shape = (48000, 2) (each element was either the amplitude, 0, or the negative amplitude, as NumPy float64), while data, sr = soundfile.read(audiofile2) returned an array with data.shape = (48000,) (the elements were varied NumPy float64 values).

Also, if it helps, audiofile1 was taken from a recording made with PyAudio, whereas audiofile2 is a sample from the LibriSpeech corpus.

So, my question is twofold:

Why is soundfile.read() producing two different data formats, and how do I ensure that the function returns the arrays in the same format in the future?

score 0 · Accepted Answer · answered Jul 15 '20 at 02:10

Your audiofile2 sample is mono, whereas your audiofile1 recording is stereo (i.e. you probably recorded it with a PyAudio stream configured with channels=2). So I suggest you first figure out whether you need mono or stereo for your application.

If all you really care about is a mono audio signal, you can convert stereo (or more generally N-channel) audio to mono by averaging the channels:

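import numpy as np
import soundfile

data, sr = soundfile.read(audiofile)
if np.ndim(data) > 1:  # 2-D means (frames, channels)
  data = np.mean(data, axis=1)  # average the channels into one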

If you need stereo audio, you can create a second channel by duplicating the one you have (although the copy will not carry the phase or amplitude differences that normally distinguish the channels of a real stereo recording):

if np.ndim(data) < 2:  # 1-D means mono
  data = np.tile(data, (2, 1)).transpose()  # shape (frames,) -> (frames, 2)
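
If you end up doing this in several places, the two conversions above can be wrapped in small helpers. This is only a sketch: the names to_mono and to_stereo are made up for illustration, and to_stereo assumes that anything already 2-D should be left untouched.

```python
import numpy as np

def to_mono(data):
    """Average the channels of a (frames, channels) array; pass 1-D input through."""
    data = np.asarray(data)
    if data.ndim > 1:
        return data.mean(axis=1)
    return data

def to_stereo(data):
    """Duplicate a 1-D mono signal into two identical channels.

    Assumption: input that is already 2-D is returned unchanged.
    """
    data = np.asarray(data)
    if data.ndim == 1:
        return np.column_stack((data, data))
    return data

mono = np.array([0.1, -0.2, 0.3])
stereo = to_stereo(mono)
print(stereo.shape)           # (3, 2)
print(to_mono(stereo).shape)  # (3,)
```

Both helpers work on plain NumPy arrays, so they can be applied to whatever soundfile.read() returns.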

score 0 · Answer 2 · answered Jul 15 '20 at 06:28

It's as simple as:

data, sr = soundfile.read(audiofile2, always_2d=True)

With this, data.shape will always have two elements; data.shape[0] will be the number of frames and data.shape[1] will be the number of channels.
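
Because the result is then always 2-D, a downmix to mono no longer needs a dimension check. A rough sketch, assuming you want one mono signal per file (load_mono and downmix are hypothetical names, not part of soundfile):

```python
import numpy as np

def downmix(frames_by_channels):
    """Average a (frames, channels) array down to a 1-D mono signal."""
    return frames_by_channels.mean(axis=1)

def load_mono(path):
    """Read an audio file and return (mono_signal, sample_rate)."""
    # Imported here so that downmix() can be used without soundfile installed.
    import soundfile as sf
    data, sr = sf.read(path, always_2d=True)  # shape is always (frames, channels)
    return downmix(data), sr
```

With this, a mono file (one column) and a stereo file (two columns) both come back with shape (frames,).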

Matthias