Preprocessing audio for a Locality-Sensitive Hashing (LSH) algorithm

Question

I am designing an LSH algorithm for similarity detection in audio. I use the librosa module to extract the MFCCs of each audio file, which returns a two-dimensional array (20 rows x n columns). Currently, I normalize each MFCC row by subtracting its mean and dividing by its standard deviation (z-score). I then generate 20 random vectors and take the dot product of each MFCC row with the corresponding random vector: the 1st MFCC row with the 1st random vector, the 2nd row with the 2nd random vector, and so on. This leaves me with a single list of 20 values. As the last step, I assign 1 if a value is non-negative and 0 otherwise.

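import librosa
import numpy as np

N_FRAMES = 1293  # fixed frame count every MFCC row is padded to

for f in range(20):  # currently working on 20 audio files
    data, sr = librosa.load(file_name[f])
    mfcc = librosa.feature.mfcc(y=data, sr=sr)  # shape: (20, n_frames)
    dot = []
    for i in range(20):  # one random vector per MFCC row
        row = mfcc[i][:N_FRAMES]  # truncate so the pad width is never negative
        new = np.pad(row, (0, N_FRAMES - len(row)), 'constant', constant_values=(0, 0))
        n = (new - new.mean()) / new.std()  # z-score the padded row
        product = np.dot(n, rvec[i])  # rvec holds the randomly generated vectors
        if product >= 0:
            dot.append(1)
        else:
            dot.append(0)
    print(dot)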
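
To make the steps above easier to reproduce without audio files, here is a minimal self-contained sketch of the same idea. It assumes synthetic arrays in place of real MFCCs, Gaussian random vectors for rvec, and the same fixed length of 1293 frames. The helper names make_random_vectors, signature and hamming are hypothetical and only used for illustration.

```python
import numpy as np

N_MFCC = 20      # number of MFCC rows (librosa's default n_mfcc)
N_FRAMES = 1293  # fixed frame count, as in the code above

def make_random_vectors(seed=0):
    # Hypothetical helper: one Gaussian random vector per MFCC row (assumption).
    rng = np.random.default_rng(seed)
    return rng.standard_normal((N_MFCC, N_FRAMES))

def signature(mfcc, rvec):
    # Truncate, then zero-pad every row to N_FRAMES columns.
    mfcc = mfcc[:, :N_FRAMES]
    padded = np.pad(mfcc, ((0, 0), (0, N_FRAMES - mfcc.shape[1])))
    # Z-score each row; guard against a zero standard deviation.
    mean = padded.mean(axis=1, keepdims=True)
    std = padded.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    z = (padded - mean) / std
    # Row-wise dot product: row i of z with row i of rvec.
    products = np.einsum("ij,ij->i", z, rvec)
    # Keep only the sign: 1 if non-negative, else 0.
    return (products >= 0).astype(np.uint8)

def hamming(a, b):
    # Number of positions where two bit signatures differ.
    return int(np.count_nonzero(a != b))

rvec = make_random_vectors()
rng = np.random.default_rng(1)
clip = rng.standard_normal((N_MFCC, 1000))              # stand-in for one MFCC matrix
noisy = clip + 0.01 * rng.standard_normal(clip.shape)   # slightly perturbed copy
other = rng.standard_normal((N_MFCC, 1000))             # unrelated stand-in

sig_clip = signature(clip, rvec)
print(sig_clip)
print(hamming(sig_clip, signature(noisy, rvec)))
print(hamming(sig_clip, signature(other, rvec)))
```

Under this scheme, two inputs are compared by the Hamming distance between their 20-bit signatures, and the same rvec must be reused for every file, otherwise the bits are not comparable.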

Let me know if I am doing something wrong, or if you have a better idea that I should implement instead. I want to make sure my data is in workable condition before I start writing code for the LSH algorithm. Thanks in advance.

@JonNordby Well, my main issue is that I don't know how to retain such a large amount of information (20 MFCC rows), so the 20 random vectors are used to reduce the size of the MFCC arrays (the concept of random projection). — Muhammad Shamil Umar, Feb 15 '23 at 02:09
