I'm training an audio source separation model with a Python package called DeWave (https://github.com/chaodengusc/DeWave). It's trained on single-channel .wav files. After training the model, I ran inference on a .wav sample to separate the two speakers in a single-channel test file. This works fine unless I cut the .wav file first, in which case I get an error from librosa stating that the audio buffer is not finite everywhere.

I've tried inference on different audio files, and the error only occurs if I use external software to cut the .wav file (I've tried cutting with sox and Zamzar). The audio files I've run inference on successfully all have different lengths and are not multiples of a given length, so I don't believe it's a length issue. I'm wondering if cutting the file erases a buffer, but I'm not familiar with buffers in general, so any insight on how to remedy this would be appreciated.

The main code that writes with librosa is these lines from https://github.com/chaodengusc/DeWave/blob/master/DeWave/infer.py:

## restore the original audio
len1 = len(out_audio1) // 3
len2 = len(out_audio2) // 3
source1 = out_audio1[len1:2*len1]
source2 = out_audio2[len2:2*len2]
librosa.output.write_wav(input_file[0:-4]+"_source1.wav", source1, SAMPLING_RATE)
librosa.output.write_wav(input_file[0:-4]+"_source2.wav", source2, SAMPLING_RATE)
return [(source1, SAMPLING_RATE), (source2, SAMPLING_RATE)]

The expected output would be two separate .wav files of the same length with one speaker in each file, and silence where the other speaker is speaking. However, I get this error:

Traceback (most recent call last):
  File "/home/<me>/anaconda3/bin/dewave-infer", line 11, in <module>
    sys.exit(infer())
  File "/home/<me>/anaconda3/lib/python3.6/site-packages/DeWave/cmdinfer.py", line 12, in infer
    blind_source_separation(args.input_file, args.model_dir)
  File "/home/<me>/anaconda3/lib/python3.6/site-packages/DeWave/infer.py", line 207, in blind_source_separation
    librosa.output.write_wav(input_file[0:-4]+"_source1.wav", source1, SAMPLING_RATE)
  File "<decorator-gen-6>", line 2, in write_wav
  File "/home/<me>/anaconda3/lib/python3.6/site-packages/librosa/util/decorators.py", line 58, in __wrapper
    return func(*args, **kwargs)
  File "/home/<me>/anaconda3/lib/python3.6/site-packages/librosa/output.py", line 239, in write_wav
    util.valid_audio(y, mono=False)
  File "/home/<me>/anaconda3/lib/python3.6/site-packages/librosa/util/utils.py", line 171, in valid_audio
    raise ParameterError('Audio buffer is not finite everywhere')
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
Anwarvic
Rachel
  • Perhaps the model requires files to be a certain minimum length? Or it needs to be a multiple of a certain length? This question might be more suitable to file as an issue in the DeWave repository, as solving it likely requires intimate understanding of that particular software. – Jon Nordby Jul 25 '19 at 13:21
  • Thanks for your comment - I've edited the question a bit. Unfortunately, I can't open an issue since the repo is a fork, and I don't believe the source of the issue is within the model itself. However, I have been in contact with the creator - I'll update if I hear anything. – Rachel Jul 25 '19 at 15:18
  • Note that there are many data formats possible with a WAV file. 16 bit integer, 32 bit float etc. Might want to check what kind of data is in broken VS working input files – Jon Nordby Jul 25 '19 at 16:50

1 Answer

I know I'm about three months late, but my answer could be helpful to other people. In my case, the cause was nan values in the audio data. That's why librosa throws "Audio buffer is not finite everywhere".

I have created this simple code to explain what I mean:

>>> import librosa
>>> import numpy as np

>>> f = 500  # frequency in Hz
>>> sr = 16000  # sample rate in samples per second
>>> t = 2  # duration in seconds

>>> samples = np.linspace(0, t, int(sr*t), endpoint=False)
>>> wav = np.sin(2 * np.pi * f * samples)
>>> librosa.output.write_wav('beeb.wav', wav, sr)
# works fine

The previous code snippet creates a two-second beep from a 500 Hz sine wave at a 16 kHz sample rate. It should run with no errors.

Now, I will append a nan value to wav to reproduce the same error:

>>> wav = np.append(wav, np.nan) 
>>> librosa.output.write_wav('beeb2.wav', wav, sr)
Traceback (most recent call last):
  File "/home/anwar/Desktop/mayhem.py", line 10, in <module>
    librosa.output.write_wav('beeb2.wav', wav, sr)
  File "</media/anwar/E/ASR/Deep-Speech/lib/python3.7/site-packages/decorator.py:decorator-gen-10>", line 2, in write_wav
  File "/media/anwar/E/ASR/Deep-Speech/lib/python3.7/site-packages/librosa/util/decorators.py", line 58, in __wrapper
    return func(*args, **kwargs)
  File "/media/anwar/E/ASR/Deep-Speech/lib/python3.7/site-packages/librosa/output.py", line 239, in write_wav
    util.valid_audio(y, mono=False)
  File "/media/anwar/E/ASR/Deep-Speech/lib/python3.7/site-packages/librosa/util/utils.py", line 275, in valid_audio
    raise ParameterError('Audio buffer is not finite everywhere')
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere

As we can see, librosa throws the same error as before. In my experience, nan values often appear after the audio data type is changed, for example between floating-point and integer samples, so that may be the cause here.
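To check whether this is what is happening, you could inspect the array just before it is passed to write_wav. This is only a minimal sketch, not part of DeWave or librosa: `report_nonfinite` and `sanitize` are hypothetical helper names, and I am assuming the samples are a NumPy float array like `source1` in the question.

```python
import numpy as np


def report_nonfinite(y):
    """Count nan/inf samples in y and report where the first one occurs."""
    y = np.asarray(y, dtype=float)
    bad_positions = np.flatnonzero(~np.isfinite(y))
    return {
        "nan": int(np.isnan(y).sum()),
        "inf": int(np.isinf(y).sum()),
        # Index into the flattened array, or None if every sample is finite.
        "first_bad_index": int(bad_positions[0]) if bad_positions.size else None,
    }


def sanitize(y):
    """Return a copy of y with nan and +/-inf samples replaced by 0.0."""
    y = np.asarray(y, dtype=float)
    return np.where(np.isfinite(y), y, 0.0)
```

Calling `report_nonfinite(source1)` before the write would show whether the model output already contains nan values. Zeroing them with `sanitize` only hides the symptom, so it is mainly useful for confirming the diagnosis while you look for where the nan values first appear.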

Anwarvic