I am trying to develop a method to classify audio using MFCCs in Weka. The MFCCs I have are generated with a buffer size of 1024, so there is a series of MFCC coefficients for each audio recording. I want to convert these coefficients into the ARFF data format for Weka, but I'm not sure how to approach this problem.
I also asked a question about merging the data, because I feel this may affect the conversion to ARFF format.
I know that in an ARFF file the data needs to be described through attributes. Should each MFCC coefficient be a separate attribute, or should the array of coefficients be a single attribute? Should each data instance represent a single MFCC frame, a window of time, or the entire file/sound? Below, I wrote out what I think it should look like if it only took one MFCC frame into account, which I don't think would be able to classify an entire sound.
@relation audio
@attribute mfcc1 real
@attribute mfcc2 real
@attribute mfcc3 real
@attribute mfcc4 real
@attribute mfcc5 real
@attribute mfcc6 real
@attribute mfcc7 real
@attribute mfcc8 real
@attribute mfcc9 real
@attribute mfcc10 real
@attribute mfcc11 real
@attribute mfcc12 real
@attribute mfcc13 real
@attribute class {bark, honk, talking, wind}
@data
126.347275, -9.709645, 4.2038302, -11.606304, -2.4174862, -3.703139, 12.748064, -5.297932, -1.3114156, 2.1852574, -2.1628475, -3.622149, 5.851326, bark
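For what it's worth, an ARFF file like the one above (one instance per MFCC frame) can be written with a short script. This is only a sketch: the `frames` list and the `"bark"` label are hypothetical placeholders standing in for however the MFCCs were actually extracted.

```python
# Sketch: write one ARFF instance per 13-coefficient MFCC frame.
# `frames` and `label` are hypothetical placeholders for real data.
frames = [
    [126.347275, -9.709645, 4.2038302, -11.606304, -2.4174862,
     -3.703139, 12.748064, -5.297932, -1.3114156, 2.1852574,
     -2.1628475, -3.622149, 5.851326],
]
label = "bark"

with open("audio.arff", "w") as f:
    f.write("@relation audio\n\n")
    # One numeric attribute per MFCC coefficient.
    for i in range(1, 14):
        f.write("@attribute mfcc%d real\n" % i)
    f.write("@attribute class {bark, honk, talking, wind}\n\n")
    f.write("@data\n")
    for frame in frames:
        f.write(",".join("%f" % c for c in frame) + "," + label + "\n")
```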
Any help will be greatly appreciated.
Edit: I have generated some ARFF files for Weka using openSMILE, following a method from this website, but I am not sure how this data would be used to classify the audio, because each row of data represents only 10 milliseconds of audio from the same file. The name attribute of each row is "unknown," which I assume is the attribute that the classifier would try to predict. How would I be able to classify an overall sound (rather than 10 milliseconds of it) and compare this to several other overall sounds?
Edit #2: Success!
After reading the website I found more thoroughly, I saw the Accumulate script and the Test and Train data files. The Accumulate script combines the MFCC data generated from separate audio files into one ARFF file, one instance per file. Their file was composed of about 200 attributes holding statistics for 12 MFCCs. Although I wasn't able to retrieve these statistics using openSMILE, I used Python libraries to compute them. The statistics were max, min, kurtosis, range, standard deviation, and so on. I accurately classified my audio files using BayesNet and Multilayer Perceptron in Weka, both of which yielded 100% accuracy for me.
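As a sketch of that aggregation step: assuming `mfcc_frames` is a NumPy array of shape (n_frames, n_coeffs) for one file, each file can be collapsed into a single fixed-length feature vector of per-coefficient statistics. The function name and the exact choice of statistics here are my own; the actual feature set used about 200 attributes.

```python
import numpy as np
from scipy.stats import kurtosis

def summarize_mfccs(mfcc_frames):
    """Collapse a (n_frames, n_coeffs) MFCC matrix into one
    fixed-length vector of per-coefficient statistics."""
    stats = [
        mfcc_frames.max(axis=0),
        mfcc_frames.min(axis=0),
        mfcc_frames.mean(axis=0),
        mfcc_frames.std(axis=0),
        np.ptp(mfcc_frames, axis=0),   # range (max - min)
        kurtosis(mfcc_frames, axis=0),
    ]
    return np.concatenate(stats)

# Example: 100 frames of 13 coefficients -> one 78-value instance
# (6 statistics x 13 coefficients), suitable as one ARFF row.
features = summarize_mfccs(np.random.randn(100, 13))
print(features.shape)  # (78,)
```

Each audio file then contributes exactly one row to the ARFF file, with its class label appended, so the classifier sees whole sounds rather than 10 ms slices.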