My task is to build a model that predicts whether a given song was performed by a previously specified singer (let's say Elvis Presley) or not. After reading a file in FLAC format, I applied MFCC, which returned a 2-dimensional ndarray. My idea was to use conv layers to scale the data, then an LSTM to make predictions based on the order of sounds in the melody.
The problem is that the LSTM input is built from sequences of the model's outputs (one per song, not one per sound), so as far as I can tell it works on the order of songs rather than the order of sounds within a song (correct me if I'm wrong). Do I have to reshape the data set, or should I try something else?
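To make the shapes concrete, here is a minimal sketch of the reshape I have in mind. It is only a guess at what is needed, under a few assumptions: the MFCC matrix has shape (n_mfcc, n_frames), which is what librosa returns but may differ in other libraries; the recurrent layer expects input shaped (samples, timesteps, features), as Keras LSTM layers do; and the helper name `mfcc_to_lstm_windows`, the window length, and the hop are made up for illustration. The random array stands in for a real song.

```python
import numpy as np

def mfcc_to_lstm_windows(mfcc, window_len, hop):
    """Split one song's MFCC matrix into fixed-length sequences of frames.

    mfcc: 2-D array, ASSUMED shape (n_mfcc, n_frames).
    Returns an array of shape (n_windows, window_len, n_mfcc), so the
    time axis of each sample runs over frames within the song.
    """
    frames = mfcc.T  # (n_frames, n_mfcc): time axis first
    n_frames = frames.shape[0]
    if n_frames < window_len:
        raise ValueError("song is shorter than one window")
    # Start index of every (possibly overlapping) window.
    starts = range(0, n_frames - window_len + 1, hop)
    return np.stack([frames[s:s + window_len] for s in starts])

# Stand-in for a real MFCC matrix: 13 coefficients x 1000 frames.
rng = np.random.default_rng(0)
fake_mfcc = rng.standard_normal((13, 1000))

# Hypothetical window settings: 200 frames per sample, 50% overlap.
windows = mfcc_to_lstm_windows(fake_mfcc, window_len=200, hop=100)

# Every window inherits the label of the song it came from (1 = target singer).
labels = np.ones(len(windows), dtype=np.int64)

print(windows.shape)
```

With this layout each training sample is a stretch of one song, and the timesteps are consecutive MFCC frames, so the recurrent layer would see the order of sounds rather than the order of songs.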
I know that this ConvLSTM approach might not work at all, but I really want to see the results.