Reduce MFCC output

Question

I am trying to analyze song audio using a Python library. The output is a NumPy array, and it is very large because the MFCCs are calculated for every frame of the audio. When I write this output to a file, each song takes about 3-4 MB. Is there a way to reduce the N frames of information into a single row of features?

[Image: MFCC output]

score 0 · Answer 1 · answered Dec 02 '18 at 02:55

A common practice is to group consecutive frames into texture windows, compute summary statistics within each texture window, and then summarize those per-window values again with aggregate statistics over the whole song.
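A minimal NumPy sketch of this two-stage summary follows. It assumes the MFCCs are an `(n_mfcc, n_frames)` array (the layout `librosa.feature.mfcc` returns, if that is the library you use), non-overlapping windows, and mean/standard deviation at both stages. The function name `texture_window_summary` and the `frames_per_window` parameter are illustrative, not part of any library.

```python
import numpy as np

def texture_window_summary(mfcc: np.ndarray, frames_per_window: int) -> np.ndarray:
    """Summarize an (n_mfcc, n_frames) MFCC matrix into one feature row.

    Stage 1: split the frames into consecutive, non-overlapping texture
    windows and take the mean and std of each MFCC band inside each window.
    Stage 2: summarize those per-window statistics across the whole song
    with another mean and std.
    """
    n_mfcc, n_frames = mfcc.shape
    n_windows = n_frames // frames_per_window
    if n_windows == 0:
        raise ValueError("clip is shorter than one texture window")
    # Drop trailing frames that do not fill a complete window.
    trimmed = mfcc[:, : n_windows * frames_per_window]
    # Shape (n_mfcc, n_windows, frames_per_window): consecutive frames share a window.
    windows = trimmed.reshape(n_mfcc, n_windows, frames_per_window)
    # Stage 1: per-window statistics, each of shape (n_mfcc, n_windows).
    win_mean = windows.mean(axis=2)
    win_std = windows.std(axis=2)
    # Stage 2: aggregate the window statistics over all windows -> 4 * n_mfcc values.
    return np.concatenate([
        win_mean.mean(axis=1),
        win_mean.std(axis=1),
        win_std.mean(axis=1),
        win_std.std(axis=1),
    ])

# Usage with random data standing in for a real MFCC matrix (20 bands, 5000 frames).
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(20, 5000))
row = texture_window_summary(fake_mfcc, frames_per_window=129)
print(row.shape)  # (80,)
```

The resulting row has 4 × n_mfcc values regardless of song length, so each song can be stored as a single line in a feature file.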

The statistics are computed separately for each input feature (each MFCC band, in your case). Typical statistics functions are the mean, standard deviation, minimum, and maximum. Texture window sizes can range from 1 to 60 seconds.
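To turn a texture size in seconds into a frame count, divide by the hop duration. The sketch below assumes a sample rate of 22050 Hz and a hop length of 512 samples (librosa's defaults); adjust both to whatever your MFCC extraction actually used. It also shows the four per-band statistics (mean, standard deviation, min, max) computed over the time axis. Both function names are hypothetical helpers, not library calls.

```python
import numpy as np

def seconds_to_frames(seconds: float, sr: int = 22050, hop_length: int = 512) -> int:
    """Number of MFCC frames that cover `seconds` of audio (assumed sr and hop)."""
    return max(1, int(round(seconds * sr / hop_length)))

def band_statistics(mfcc: np.ndarray) -> np.ndarray:
    """Per-band mean, std, min and max over time for an (n_mfcc, n_frames) array.

    Returns 4 * n_mfcc values: all means, then all stds, mins and maxes.
    """
    return np.concatenate([
        mfcc.mean(axis=1),
        mfcc.std(axis=1),
        mfcc.min(axis=1),
        mfcc.max(axis=1),
    ])

print(seconds_to_frames(3))   # 129 frames for a 3-second texture window
print(seconds_to_frames(60))  # 2584 frames for a 60-second texture window
```

`band_statistics` can be applied to a whole song directly, or to each texture window before a second aggregation step.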

See "Low-level features and timbre", Juan Pablo Bello, MPATE-GE 2623 Music Information Retrieval.
