I’m trying to read a collection of audio files (recordings of people talking) into MATLAB. The files are in WebM format. When I read them with the audioread() function, the resulting time series is missing the pauses in speech; that is, periods in which the speaker is silent are cut out. I have tried converting the WebM files to .mp3, .ogg, .opus, and .mp4 (using VLC media player and ffmpeg) before reading them with audioread(), but the pauses are still cut out. Is there a way to read these audio files into MATLAB that retains the pauses?

EDIT:

The pauses are present when I play the file in VLC, but they are not retained when I convert the file to any other format with VLC.

Code for converting the WebM files to WAV with ffmpeg:

ffmpegPath = 'C:\Users\rapiduser\Downloads\ffmpeg-6.0-essentials_build\ffmpeg-6.0-essentials_build\bin\ffmpeg.exe';  %ffmpeg exe path
listAudios = dir('Z:\data\files\*audio.webm');  %audio file paths array
nAudios = length(listAudios);
outputFolder = dir('Y:\IntermediateData\WAVFiles');  %wav file folder

for idxAud = 1:nAudios
    inputFile = strcat(listAudios(idxAud).folder, '\', listAudios(idxAud).name);  % input file path
    outputFile = strcat(outputFolder(1).folder, '\', listAudios(idxAud).name(1:end-5), '.wav');  % output file path (strips the 5-character '.webm' extension)
    command = strcat(ffmpegPath, " -i ", inputFile, " ", outputFile);  % command to run ffmpeg
    [~] = system(command);  % run ffmpeg and discard the exit status
end

Code to read the wav files:

listWav = dir('Y:\IntermediateData\WAVFiles\*.wav');  %wav files
nAudios = length(listWav);


for idxAud = 1:nAudios
    [y,Fs] = audioread(listWav(idxAud).folder + "\" + listWav(idxAud).name);  %reads single file
end

Jessica
  • Can you post the code you're using to load and play the files? Are you confident the pauses are actually in the recording? Are you able to play the audio with some other software to verify that there are pauses? – Brionius Jul 12 '23 at 22:10
  • @Brionius I made an edit to the original post answering your questions. Thanks! – Jessica Jul 13 '23 at 18:22
  • Hi Jessica - unfortunately I don't see anything that could cause this in the MATLAB code. I suspect there's something funky going on with how ffmpeg is converting these files, but I'm not sure why. If you're able to drop a link where I can download an example audio file, I'd be happy to take a look at it. Side note - there is a function called `fullfile` which is simpler and more reliable for generating paths than your `strcat` method, and a function called `fileparts` which you can use to reliably get filenames without extensions. – Brionius Jul 18 '23 at 16:32
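
The side note in the last comment could look roughly like the sketch below. It only builds the input and output paths with `fullfile` and `fileparts`; it does not address the missing pauses. The folders are the ones from the question, while the file name `session01_audio.webm` is a made-up placeholder.

```matlab
% Folders taken from the question; the file name is a hypothetical example.
inputFolder  = 'Z:\data\files';
outputFolder = 'Y:\IntermediateData\WAVFiles';
inputName    = 'session01_audio.webm';

% fullfile joins path parts with the platform's file separator,
% so no manual '\' concatenation is needed.
inputFile = fullfile(inputFolder, inputName);

% fileparts splits a name into folder, base name, and extension,
% so the extension length does not have to be hard-coded (no name(1:end-5)).
[~, baseName, ext] = fileparts(inputName);

% Build the output path by swapping the extension for .wav.
outputFile = fullfile(outputFolder, [baseName '.wav']);

fprintf('%s -> %s\n', inputFile, outputFile);
```

Inside the original loop, `inputName` would be `listAudios(idxAud).name` and `inputFolder` would be `listAudios(idxAud).folder`.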

0 Answers