I am using the Python audio library librosa to detect note onset events in musical audio tracks. With this information I am slicing those tracks into several very short pieces / slices - all based on the note onset events.
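Roughly, the slicing step looks like this (a minimal sketch; "track.wav" is a placeholder file name):

```python
import librosa

# "track.wav" is a placeholder; any mono audio file works here.
y, sr = librosa.load("track.wav")

# Detect note onsets (as frame indices) and convert them to sample positions.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_samples = librosa.frames_to_samples(onset_frames)

# Cut the signal between consecutive onsets into short slices.
slices = [y[start:end]
          for start, end in zip(onset_samples[:-1], onset_samples[1:])]
```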
Having those slices, I am analyzing them using librosa's built-in tools for feature extraction, such as the chromagram or MFCCs. The output looks like this:
librosa.feature.chroma_stft(y=y, sr=sr)
array([[ 0.974, 0.881, ..., 0.925, 1. ],
[ 1. , 0.841, ..., 0.882, 0.878],
...,
[ 0.658, 0.985, ..., 0.878, 0.764],
[ 0.969, 0.92 , ..., 0.974, 0.915]])
librosa.feature.mfcc(y=y, sr=sr)
array([[ -5.229e+02, -4.944e+02, ..., -5.229e+02, -5.229e+02],
[ 7.105e-15, 3.787e+01, ..., -7.105e-15, -7.105e-15],
...,
[ 1.066e-14, -7.500e+00, ..., 1.421e-14, 1.421e-14],
[ 3.109e-14, -5.058e+00, ..., 2.931e-14, 2.931e-14]])
As we can see, these functions return a matrix holding the extracted feature values. All of this information (features, slice start and end, filename) will be stored in an (SQLite) database; the sliced audio data itself will be discarded.
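The storage step is roughly like the following sketch (the schema is simplified, and I serialize the feature matrices as raw float32 bytes, so the shapes have to be restored on load):

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("slices.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS slices (
           filename TEXT,
           start_sample INTEGER,
           end_sample INTEGER,
           chroma BLOB,
           mfcc BLOB
       )"""
)

def store_slice(filename, start, end, chroma, mfcc):
    # float32 keeps the blobs small; the matrix shapes are not stored here.
    conn.execute(
        "INSERT INTO slices VALUES (?, ?, ?, ?, ?)",
        (filename, start, end,
         chroma.astype(np.float32).tobytes(),
         mfcc.astype(np.float32).tobytes()),
    )
    conn.commit()
```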
The features describe the "type" / sound of the analyzed audio numerically and should be a good basis for similarity calculations.
Having all this information (and a large database with hundreds of analyzed tracks), I want to be able to pick a random slice and compare it against all the other slices in the database to find the one most similar to the picked one - based on the extracted feature information.
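To make the question concrete: if each slice were reduced to a fixed-length vector (for example the per-coefficient mean of the MFCC matrix over time), the comparison could look like the sketch below - but I do not know whether this reduction and this metric are the right choice, which is exactly what I am asking.

```python
import numpy as np

def feature_vector(mfcc):
    # Collapse the (n_mfcc, n_frames) matrix to a fixed-length vector by
    # averaging each coefficient over time. This discards temporal
    # structure and is only one possible summarization.
    return mfcc.mean(axis=1)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# query_vec and candidate_vecs would come from the database:
# best = max(candidate_vecs, key=lambda v: cosine_similarity(query_vec, v))
```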
What do I need to do to compare the results of the functions listed above for similarity?