I am processing an audio file with librosa as:

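import librosa
import soundfile as sf

# Load as floating-point samples at 22050 Hz
y, sr = librosa.core.load('test.wav', sr=22050)
y_processed = some_processing(y)  # placeholder for the actual processing step
sf.write('test_processed.wav', y_processed, sr)
# load() returns (samples, sample_rate), so unpack both
y_read, _ = librosa.core.load('test_processed.wav', sr=22050)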

Now the issue is that y_processed and y_read do not match. My understanding is that this comes from some encoding done by the soundfile library when it writes the file. Why is this happening, and how can I get from y_processed to y_read without saving to disk?

Kate
  • Why are you saving at all? – Jon Nordby Apr 06 '22 at 09:29
  • The processing/saving and the reading are in different parts of the project but use the same data. I now need to merge these parts into one tool, but the model is already trained on the saved-and-read data, so I am trying to figure out how to avoid saving and reading while still getting the same data – Kate Apr 06 '22 at 19:59
  • Which versions of librosa and soundfile are you using? Newer librosa versions use librosa.load rather than core.load, see: https://librosa.org/doc/0.9.1/generated/librosa.load.html – Kings85 Apr 10 '22 at 15:11
  • Also, can you post an example of the differing values (maybe the first 10 elements of each)? Are you sure the original sr is 22050? – Kings85 Apr 10 '22 at 15:29

1 Answer

According to this article, librosa.load(), among other things, normalizes the samples to floating-point values between -1 and 1, regardless of the bit depth of the file.
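To make that normalization concrete, here is a minimal NumPy sketch of how 16-bit integer samples map into the -1 to 1 range. The helper name pcm16_to_float is made up for illustration, and dividing by 32768 is my assumption about the scale factor, not something quoted from librosa.

```python
import numpy as np

def pcm16_to_float(samples):
    """Map signed 16-bit PCM samples to float32 values in [-1, 1).

    Hypothetical helper: the 1/32768 scale is an assumption about
    how 16-bit audio is usually normalized.
    """
    samples = np.asarray(samples, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0

# A few representative 16-bit values, from the most negative to the most positive
pcm = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
as_float = pcm16_to_float(pcm)
print(as_float)
```

Note that the largest positive sample ends up just below 1.0, because the 16-bit range is not symmetric.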

I experienced the same problem as you did, where the min and max values of the "loaded" signal were much closer to each other.

Since I don't know exactly how your two arrays differ, this may not help you, but it helped me.

y_processed_buf = librosa.util.buf_to_float(y_processed)

This seems to be the culprit: it normalizes your values (source code). It is also called during librosa.load(), which is how I stumbled over it.
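If the file was written as 16-bit PCM (which I believe is what soundfile does for WAV unless told otherwise), the save-and-load round trip could be approximated in memory roughly as below. This is only a sketch under that assumption: emulate_pcm16_round_trip is a made-up name, and the scale factor and rounding are my guesses, so the result may differ from the real file by one quantization step.

```python
import numpy as np

def emulate_pcm16_round_trip(y):
    """Approximate writing float audio as 16-bit PCM and reading it back.

    Hypothetical helper; the 32768 scale and round-to-nearest are
    assumptions, not the exact behaviour of soundfile/libsndfile.
    """
    y = np.asarray(y, dtype=np.float32)
    # Values outside [-1, 1] cannot be stored in integer PCM, so clip them
    clipped = np.clip(y, -1.0, 1.0)
    # Quantize to the signed 16-bit grid
    quantized = np.clip(np.round(clipped * 32768.0), -32768, 32767).astype(np.int16)
    # Convert back to float, as a loader would
    return quantized.astype(np.float32) / 32768.0

y_processed = np.array([0.0, 0.5, 1.5, -2.0, 0.1234567], dtype=np.float32)
y_emulated = emulate_pcm16_round_trip(y_processed)
print(y_emulated)
```

If your processed signal goes outside -1 to 1, the clipping step alone would explain why the min and max of the loaded signal are closer together. It is worth comparing the output against a real saved-and-loaded file before relying on it.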

Gnoosh