I'm working on a project to classify 30-second audio samples from 5 genres (rock, electronic, rap, country, jazz). My dataset consists of 600 songs, exactly 120 per genre. The features are a 1D array of 13 MFCCs per song and the labels are the genres. Essentially, I take the mean of each of the 13 MFCCs across all frames of the 30-second sample, which gives 13 MFCC values per song. I then take the entire dataset and scale it with sklearn's scaling function.
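To make the feature pipeline concrete, here is a minimal sketch. Note that librosa for the MFCC extraction, StandardScaler for the scaling, and the names song_paths / genre_labels are just stand-ins for whatever I'd actually plug in:

import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler

def song_features(path, sr=22050, duration=30, n_mfcc=13):
    """Load a 30-second clip and return the mean of its 13 MFCCs over all frames."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (13, n_frames)
    return mfcc.mean(axis=1)                                # shape: (13,)

# song_paths and genre_labels are placeholders for the real dataset listing
X = np.vstack([song_features(p) for p in song_paths])   # (600, 13) feature matrix
y = np.array(genre_labels)                               # genre label per song

# Scale the whole dataset (strictly, the scaler should be fit on the training split only)
X = StandardScaler().fit_transform(X)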
My goal is to compare SVM, k-nearest neighbors, and naive Bayes classifiers (using the sklearn toolset). I have done some testing already, but I've noticed that the results vary depending on whether I do random sampling or stratified sampling.
I use the following sklearn function to get the training and testing sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
It has the parameters random_state and stratify. When random_state is omitted, the split is randomized differently on every run; when it is set to 0 (or any fixed seed), the training and test sets are guaranteed to be the same each time.
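A quick way to convince myself of that reproducibility (just a sketch using the X and y above):

from sklearn.model_selection import train_test_split
import numpy as np

# Two calls with the same random_state produce identical splits
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
assert np.array_equal(Xa_tr, Xb_tr) and np.array_equal(ya_tr, yb_tr)
# Omit random_state and the two splits will generally differ from run to run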
My question is: how do I appropriately compare the different classifiers? I assume I should make the same identical call to this function before training and testing each classifier. My suspicion is that I should be handing the exact same split to each classifier, so the split should not be freshly randomized each time, and it should be stratified as well.
Or should I be stratifying (and random sampling)?
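For context, this is roughly how I'm setting up the comparison, with the exact same stratified split handed to every classifier. It's only a sketch: the classifiers use sklearn defaults, and GaussianNB is my assumption for the naive Bayes variant:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# One fixed, stratified split shared by all three classifiers
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

classifiers = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))  # accuracy on the shared test set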