I would like to prepare an audio dataset for a machine learning model.
Each .wav file should be represented as an MFCC image.
While all of the images will have the same number of MFCCs (20), the lengths of the .wav files vary between 3 and 5 seconds.
Should I manipulate all the .wav files to have the same length (e.g. by padding or truncating them)? Should I normalize the MFCC values (e.g. to the range [0, 1]) prior to plotting?
Are there any important steps to do with such data before passing it to a machine learning model?
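For context, here is a minimal sketch of the pipeline I have in mind. The helper names, the sample rate, and the per-file min-max scaling are my own assumptions, and the MFCC matrix below is a random stand-in (with real audio I would load each file and compute the MFCCs with a library such as librosa):

```python
import numpy as np

SR = 22050          # assumed sample rate
TARGET_SECONDS = 5  # pad/truncate every clip to the longest expected length

def fix_length(wav: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad (or truncate) a 1-D waveform to exactly target_len samples."""
    if len(wav) >= target_len:
        return wav[:target_len]
    return np.pad(wav, (0, target_len - len(wav)))

def minmax_normalize(mfcc: np.ndarray) -> np.ndarray:
    """Scale an MFCC matrix to the [0, 1] range (per file)."""
    lo, hi = mfcc.min(), mfcc.max()
    return (mfcc - lo) / (hi - lo + 1e-8)

# With real audio one would load the .wav and compute the MFCCs, e.g.
#   wav, sr = librosa.load(path)
#   mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=20)
# (librosa usage is my assumption, shown for illustration only).
clip = np.random.randn(3 * SR)                # a 3-second clip
clip = fix_length(clip, TARGET_SECONDS * SR)  # now exactly 5 seconds long
fake_mfcc = np.random.randn(20, 216)          # stand-in for a real MFCC matrix
scaled = minmax_normalize(fake_mfcc)
print(clip.shape, float(scaled.min()), float(scaled.max()))
```

This way every clip yields a 20-row MFCC matrix of identical width, and all values fall in [0, 1], but I am unsure whether this is the right approach.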
Further reading links would also be appreciated.