
I'm trying to isolate the foreground of an audio stream and then save it as a standalone audio stream using librosa.

I started from this seemingly relevant example.

I have isolated the full, foreground and background spectrograms as `S_full`, `S_foreground` and `S_background`, just as the example does, but I'm unsure how to turn them back into audio.

I tried converting them with `librosa.istft(...)` and saving the result as a .wav file with `soundfile.write(...)`, but the resulting file, although roughly the right size, seems to contain unusable data.

Can anyone describe or point me at an example?

Thanks.

  • Please add the relevant code you have tried and ask a specific question about that code. Asking for opinions will likely get this question closed. – Anil_M Nov 26 '19 at 23:30
  • `istft()` is the right way to go. How do you call it? Please post a minimal reproducible example. – Hendrik Nov 27 '19 at 07:27


In putting together the minimal example, I found that `istft()` with the original sampling rate does in fact work.

I'll find my bug somewhere. FWIW, here's the working code:

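```python
import numpy as np
import librosa
import librosa.display
import soundfile
import matplotlib.pyplot as plt

# Load the first 5 s; librosa resamples to 22050 Hz by default
y, sr = librosa.load('audio/rb-testspeech.mp3', duration=5)

# Split the complex STFT into magnitude (S_full) and phase
S_full, phase = librosa.magphase(librosa.stft(y))

# Estimate the repeating part with a nearest-neighbour median filter
S_filter = librosa.decompose.nn_filter(S_full,
                                       aggregate=np.median,
                                       metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
S_filter = np.minimum(S_full, S_filter)

margin_i, margin_v = 2, 10
power = 2

# Soft mask that keeps what the filter did NOT explain (the foreground)
mask_v = librosa.util.softmask(S_full - S_filter,
                               margin_v * S_filter,
                               power=power)

S_foreground = mask_v * S_full

# Plot the full magnitude spectrogram in dB
full = librosa.amplitude_to_db(S_full, ref=np.max)
librosa.display.specshow(full, y_axis='log', x_axis='time', sr=sr)

plt.title('Full spectrum')
plt.colorbar()

plt.tight_layout()
plt.show()

print("y({}): {}".format(len(y), y))
print("sr: {}".format(sr))

# The spectrograms are magnitude-only: multiply by the original phase
# before inverting, otherwise istft() treats every bin as zero-phase.
full_audio = librosa.istft(S_full * phase, length=len(y))
foreground_audio = librosa.istft(S_foreground * phase, length=len(y))
print("full({}): {}".format(len(full_audio), full_audio))

# Write at the same sampling rate that librosa.load returned
soundfile.write('orig.WAV', y, sr)
soundfile.write('full.WAV', full_audio, sr)
soundfile.write('foreground.WAV', foreground_audio, sr)
```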