Isolating audio foreground and converting back to audio stream using librosa

Question

I'm trying to isolate the foreground of an audio stream and then save it as a standalone audio stream using librosa.

I'm starting from this seemingly relevant example.

I have the full, foreground, and background spectrograms isolated, as the example does, in S_full, S_foreground, and S_background, but I'm unsure how to turn them back into audio.

I tried converting them with librosa.istft(...) and saving the result as a .wav file with soundfile.write(...), but I end up with a file of roughly the right size that contains unusable(?) data.

Can anyone describe or point me at an example?

Thanks.

Please add relevant code that you have tried and ask specific question related to that code. Asking for opinion will likely get this question closed. — Anil_M, Nov 26 '19 at 23:30
`istft()` is the right way to go. How do you call it? Please post a [mcve]. — Hendrik, Nov 27 '19 at 07:27

score 2 · Answer 1 · answered Nov 27 '19 at 13:36

While putting together the minimal example, I found that istft() with the original sampling rate does in fact work.

So my bug is somewhere else in my code. FWIW, here's the working code:

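import numpy as np
import librosa
import librosa.display
import soundfile
import matplotlib.pyplot as plt

# Load the first five seconds; librosa.load resamples to its default rate.
y, sr = librosa.load('audio/rb-testspeech.mp3', duration=5)

# Split the complex STFT into magnitude and phase.
S_full, phase = librosa.magphase(librosa.stft(y))

# Estimate the repeating background by nearest-neighbour median filtering,
# comparing only frames that are at least two seconds apart.
S_filter = librosa.decompose.nn_filter(S_full,
                                       aggregate=np.median,
                                       metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
# The background estimate must not exceed the original magnitude.
S_filter = np.minimum(S_full, S_filter)

# margin_i is unused here; the librosa example uses it for the background mask.
margin_i, margin_v = 2, 10
power = 2

# Soft mask that keeps the energy standing out above the background.
mask_v = librosa.util.softmask(S_full - S_filter,
                               margin_v * S_filter,
                               power=power)

S_foreground = mask_v * S_full

# Plot the full magnitude spectrogram in dB.
full = librosa.amplitude_to_db(S_full, ref=np.max)
librosa.display.specshow(full, y_axis='log', sr=sr)

plt.title('Full spectrum')
plt.colorbar()

plt.tight_layout()
plt.show()

print("y({}): {}".format(len(y), y))
print("sr: {}".format(sr))

# Reapply the original phase before inverting; istft expects a complex STFT.
full_audio = librosa.istft(S_full * phase)
foreground_audio = librosa.istft(S_foreground * phase)
print("full({}): {}".format(len(full_audio), full_audio))

# Write everything at the sampling rate that librosa.load returned.
soundfile.write('orig.WAV', y, sr)
soundfile.write('full.WAV', full_audio, sr)
soundfile.write('foreground.WAV', foreground_audio, sr)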
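
One detail that may be worth checking if the output sounds wrong: librosa.magphase returns magnitude and phase separately, and a magnitude spectrogram on its own no longer describes the original waveform. Below is a minimal sketch of that effect, using plain NumPy and a single FFT frame instead of a full STFT. The test signal and all variable names are made up for illustration and are not taken from the question.

```python
import numpy as np

# Hypothetical test signal: one second of two sine tones at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t + 1.0)

# Split the spectrum into magnitude and unit-modulus phase,
# analogous to what librosa.magphase does for each STFT bin.
spectrum = np.fft.rfft(signal)
magnitude = np.abs(spectrum)
phase = np.exp(1j * np.angle(spectrum))

# Invert once with the phase multiplied back in and once without it.
with_phase = np.fft.irfft(magnitude * phase, n=len(signal))
without_phase = np.fft.irfft(magnitude, n=len(signal))

# Largest sample-wise deviation from the original signal in each case.
err_with = np.max(np.abs(with_phase - signal))
err_without = np.max(np.abs(without_phase - signal))

print("max error with phase:   ", err_with)
print("max error without phase:", err_without)
```

Under these assumptions, only the version with the phase multiplied back in matches the input. The same idea would carry over to the masked spectrograms, for example by passing S_foreground * phase to librosa.istft.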
