I'm working on a project to classify 30-second audio samples from 5 genres (rock, electronic, rap, country, jazz). My dataset consists of 600 songs, exactly 120 per genre. The features are a 1D array of 13 MFCCs per song and the labels are the genres. Essentially, I take the mean of each of the 13 MFCCs across all frames of the 30-second sample, which gives 13 MFCC values per song. I then take the entire dataset and scale it with sklearn's scaling function.
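To make the feature pipeline concrete, here is a minimal sketch. Note that librosa for the MFCC extraction, StandardScaler for the scaling, and the names song_paths / genre_labels are just stand-ins for whatever I'd actually plug in:

import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler

def song_features(path, sr=22050, duration=30, n_mfcc=13):
    """Load a 30-second clip and return the mean of its 13 MFCCs over all frames."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (13, n_frames)
    return mfcc.mean(axis=1)                                # shape: (13,)

# song_paths and genre_labels are placeholders for the real dataset listing
X = np.vstack([song_features(p) for p in song_paths])   # (600, 13) feature matrix
y = np.array(genre_labels)                               # genre label per song

# Scale the whole dataset (strictly, the scaler should be fit on the training split only)
X = StandardScaler().fit_transform(X)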
My goal is to compare SVM, k-nearest neighbors, and naive Bayes classifiers (using the sklearn toolset). I have done some testing already, but I've noticed that the results vary depending on whether I do random sampling or stratified sampling.
I use the following sklearn function to get the training and testing sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
It has the parameters random_state and stratify. When random_state is omitted, the split is randomized differently on every run; when it is set to 0 (or any fixed seed), the training and test sets are guaranteed to be the same each time.
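A quick way to convince myself of that reproducibility (just a sketch using the X and y above):

from sklearn.model_selection import train_test_split
import numpy as np

# Two calls with the same random_state produce identical splits
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
assert np.array_equal(Xa_tr, Xb_tr) and np.array_equal(ya_tr, yb_tr)
# Omit random_state and the two splits will generally differ from run to run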
My question is: how do I appropriately compare the different classifiers? I assume I should make the same identical call to this function before training and testing each classifier. My suspicion is that I should be handing the exact same split to each classifier, so the split should not be freshly randomized each time, and it should be stratified as well.
Or should I be stratifying (and random sampling)?
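For context, this is roughly how I'm setting up the comparison, with the exact same stratified split handed to every classifier. It's only a sketch: the classifiers use sklearn defaults, and GaussianNB is my assumption for the naive Bayes variant:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# One fixed, stratified split shared by all three classifiers
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

classifiers = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))  # accuracy on the shared test set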