I am extracting MFCC features from MP3 voice files, but I want to keep the source files unchanged and avoid creating any new files on disk. My processing consists of the following steps:
- Load the .mp3 file, eliminate silence, and generate .wav data using pydub
- Read the audio data and sample rate using scipy.io.wavfile.read()
- Extract MFCC features using python_speech_features
However, eliminate_silence() returns an AudioSegment object, whereas scipy.io.wavfile.read() expects a .wav filename, so I am forced to temporarily save/export the data as a wave file to bridge the two steps. This export is memory- and time-consuming, so my question is: how can I avoid the wave file export step, or is there a workaround for it?
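One partial workaround I can think of is exporting to an in-memory buffer instead of a file. As far as I understand, both AudioSegment.export() and scipy.io.wavfile.read() accept file-like objects, so this would at least avoid touching the disk, although I assume the wave encoding work still happens (a rough sketch, not benchmarked):

import io

# Sketch: export to an in-memory buffer instead of a temporary file.
# "segment" stands for the AudioSegment returned by eliminate_silence().
wav_buffer = io.BytesIO()
segment.export(wav_buffer, format="wav")
wav_buffer.seek(0)  # rewind so read() starts at the wave header
rate, audio = read(wav_buffer)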
For reference, here is my current code, with the temporary file:
import os

from pydub import AudioSegment
from pydub.silence import split_on_silence
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc


def eliminate_silence(input_path):
    """Eliminate silent chunks from the original call recording."""
    # Load the input mp3 file
    sound = AudioSegment.from_mp3(input_path)
    chunks = split_on_silence(sound,
                              # split on silences longer than 500 ms
                              min_silence_len=500,
                              # anything quieter than -30 dBFS is considered silence
                              silence_thresh=-30,
                              # keep 100 ms of leading/trailing silence around each chunk
                              keep_silence=100)
    # Concatenate the non-silent chunks back into one segment
    output_chunks = AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    return output_chunks


silence_clear_data = eliminate_silence("file.mp3")
silence_clear_data.export("temp.wav", format="wav")
rate, audio = read("temp.wav")
os.remove("temp.wav")

# Extract MFCCs
mfcc_feature = mfcc(audio, rate, winlen=0.025, winstep=0.01, numcep=15,
                    nfilt=35, nfft=512, appendEnergy=True)
mfcc_feature = preprocessing.scale(mfcc_feature)
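Ideally, though, I would skip the wave encoding entirely and build the numpy array straight from the AudioSegment. pydub exposes get_array_of_samples() and frame_rate, so something like the sketch below might work, but I am not sure the result matches what scipy.io.wavfile.read() returns:

import numpy as np

# Sketch: bypass the wave export and take the raw samples directly.
# Assumes mono audio; for stereo, get_array_of_samples() returns interleaved
# samples and would need a reshape to (n_frames, n_channels).
audio = np.array(silence_clear_data.get_array_of_samples())
rate = silence_clear_data.frame_rate

Would either of these be safe to use in place of the temporary file?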