I am using the Python audio library librosa to detect note onset events in musical audio tracks. With this information I am slicing those tracks into several very short pieces / slices - all based on the note onset events.
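Roughly, the slicing step looks like this (a minimal sketch; "track.wav" is a placeholder file name):

```python
import librosa

# "track.wav" is a placeholder; any mono audio file works here.
y, sr = librosa.load("track.wav")

# Detect note onsets (as frame indices) and convert them to sample positions.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_samples = librosa.frames_to_samples(onset_frames)

# Cut the signal between consecutive onsets into short slices.
slices = [y[start:end]
          for start, end in zip(onset_samples[:-1], onset_samples[1:])]
```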
Having those slices, I am analyzing them using librosa's built-in tools for feature extraction, such as the chromagram or MFCCs. The output looks like this:
librosa.feature.chroma_stft(y=y, sr=sr)
array([[ 0.974, 0.881, ..., 0.925, 1. ],
[ 1. , 0.841, ..., 0.882, 0.878],
...,
[ 0.658, 0.985, ..., 0.878, 0.764],
[ 0.969, 0.92 , ..., 0.974, 0.915]])
librosa.feature.mfcc(y=y, sr=sr)
array([[ -5.229e+02, -4.944e+02, ..., -5.229e+02, -5.229e+02],
[ 7.105e-15, 3.787e+01, ..., -7.105e-15, -7.105e-15],
...,
[ 1.066e-14, -7.500e+00, ..., 1.421e-14, 1.421e-14],
[ 3.109e-14, -5.058e+00, ..., 2.931e-14, 2.931e-14]])
As we can see, these functions return a matrix holding the extracted feature values. All of this information (features, slice start and end, filename) will be stored in an (SQLite) database; the sliced audio data itself will be discarded.
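The storage step is roughly like the following sketch (the schema is simplified, and I serialize the feature matrices as raw float32 bytes, so the shapes have to be restored on load):

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("slices.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS slices (
           filename TEXT,
           start_sample INTEGER,
           end_sample INTEGER,
           chroma BLOB,
           mfcc BLOB
       )"""
)

def store_slice(filename, start, end, chroma, mfcc):
    # float32 keeps the blobs small; the matrix shapes are not stored here.
    conn.execute(
        "INSERT INTO slices VALUES (?, ?, ?, ?, ?)",
        (filename, start, end,
         chroma.astype(np.float32).tobytes(),
         mfcc.astype(np.float32).tobytes()),
    )
    conn.commit()
```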
The features describe the "type" / sound of the analyzed audio numerically and should be a good basis for similarity calculations.
Having all this information (and a large database with hundreds of analyzed tracks), I want to be able to pick a random slice and compare it against all the other slices in the database to find the one most similar to the picked one - based on the extracted feature information.
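To make the question concrete: if each slice were reduced to a fixed-length vector (for example the per-coefficient mean of the MFCC matrix over time), the comparison could look like the sketch below - but I do not know whether this reduction and this metric are the right choice, which is exactly what I am asking.

```python
import numpy as np

def feature_vector(mfcc):
    # Collapse the (n_mfcc, n_frames) matrix to a fixed-length vector by
    # averaging each coefficient over time. This discards temporal
    # structure and is only one possible summarization.
    return mfcc.mean(axis=1)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# query_vec and candidate_vecs would come from the database:
# best = max(candidate_vecs, key=lambda v: cosine_similarity(query_vec, v))
```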
What do I need to do to compare the results of the functions listed above for similarity?