I am trying to develop a method to classify audio using MFCCs in Weka. The MFCCs I have are generated with a buffer size of 1024, so there is a series of MFCC coefficients for each audio recording. I want to convert these coefficients into the ARFF data format for Weka, but I'm not sure how to approach this problem.
I also asked a question about merging the data, because I feel this may affect the conversion to ARFF format.
I know that in an ARFF file the data needs to be described through attributes. Should each MFCC coefficient be a separate attribute, or should the array of coefficients be a single attribute? Should each data instance represent a single MFCC frame, a window of time, or the entire file/sound? Below, I wrote out what I think it should look like if it only took one MFCC frame into account, which I don't think would be able to classify an entire sound.
@relation audio
@attribute mfcc1 real
@attribute mfcc2 real
@attribute mfcc3 real
@attribute mfcc4 real
@attribute mfcc5 real
@attribute mfcc6 real
@attribute mfcc7 real
@attribute mfcc8 real
@attribute mfcc9 real
@attribute mfcc10 real
@attribute mfcc11 real
@attribute mfcc12 real
@attribute mfcc13 real
@attribute class {bark, honk, talking, wind}
@data
126.347275, -9.709645, 4.2038302, -11.606304, -2.4174862, -3.703139, 12.748064, -5.297932, -1.3114156, 2.1852574, -2.1628475, -3.622149, 5.851326, bark
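For what it's worth, an ARFF file like the one above (one instance per MFCC frame) can be written with a short script. This is only a sketch: the `frames` list and the `"bark"` label are hypothetical placeholders standing in for however the MFCCs were actually extracted.

```python
# Sketch: write one ARFF instance per 13-coefficient MFCC frame.
# `frames` and `label` are hypothetical placeholders for real data.
frames = [
    [126.347275, -9.709645, 4.2038302, -11.606304, -2.4174862,
     -3.703139, 12.748064, -5.297932, -1.3114156, 2.1852574,
     -2.1628475, -3.622149, 5.851326],
]
label = "bark"

with open("audio.arff", "w") as f:
    f.write("@relation audio\n\n")
    # One numeric attribute per MFCC coefficient.
    for i in range(1, 14):
        f.write("@attribute mfcc%d real\n" % i)
    f.write("@attribute class {bark, honk, talking, wind}\n\n")
    f.write("@data\n")
    for frame in frames:
        f.write(",".join("%f" % c for c in frame) + "," + label + "\n")
```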
Any help will be greatly appreciated.
Edit: I have generated some ARFF files for Weka using openSMILE, following a method from this website, but I am not sure how this data would be used to classify the audio, because each row of data represents only 10 milliseconds of audio from the same file. The name attribute of each row is "unknown," which I assume is the attribute that the classifier would try to predict. How would I be able to classify an overall sound (rather than 10 milliseconds of it) and compare this to several other overall sounds?
Edit #2: Success!
After reading the website I found more thoroughly, I saw the Accumulate script and the Test and Train data files. The Accumulate script combines the MFCC data generated from separate audio files into one ARFF file, one instance per file. Their file was composed of about 200 attributes holding statistics for 12 MFCCs. Although I wasn't able to retrieve these statistics using openSMILE, I used Python libraries to compute them. The statistics were max, min, kurtosis, range, standard deviation, and so on. I accurately classified my audio files using BayesNet and Multilayer Perceptron in Weka, both of which yielded 100% accuracy for me.
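As a sketch of that aggregation step: assuming `mfcc_frames` is a NumPy array of shape (n_frames, n_coeffs) for one file, each file can be collapsed into a single fixed-length feature vector of per-coefficient statistics. The function name and the exact choice of statistics here are my own; the actual feature set used about 200 attributes.

```python
import numpy as np
from scipy.stats import kurtosis

def summarize_mfccs(mfcc_frames):
    """Collapse a (n_frames, n_coeffs) MFCC matrix into one
    fixed-length vector of per-coefficient statistics."""
    stats = [
        mfcc_frames.max(axis=0),
        mfcc_frames.min(axis=0),
        mfcc_frames.mean(axis=0),
        mfcc_frames.std(axis=0),
        np.ptp(mfcc_frames, axis=0),   # range (max - min)
        kurtosis(mfcc_frames, axis=0),
    ]
    return np.concatenate(stats)

# Example: 100 frames of 13 coefficients -> one 78-value instance
# (6 statistics x 13 coefficients), suitable as one ARFF row.
features = summarize_mfccs(np.random.randn(100, 13))
print(features.shape)  # (78,)
```

Each audio file then contributes exactly one row to the ARFF file, with its class label appended, so the classifier sees whole sounds rather than 10 ms slices.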