Extract 3D array of melspectrogram with librosa

Question

When I extract features from a single audio file, I get a spectrogram array of shape (128, 87). When I extract from a list of 80 audio files, I get (80, 128), but I want an array like (80, 128, 87). Where am I going wrong in my code?

def get_features(y, sr=fs):
    # Pass the audio as a keyword argument and use the function's own sr parameter
    melspectrogram = librosa.feature.melspectrogram(y=y, sr=sr)
    feature_vector = np.mean(melspectrogram,1)
    return feature_vector

Creating a feature vector

# Load audio files, calculate features and create feature vectors
feature_vectors = []
sound_paths = []
for i,f in enumerate(files):
    print ("get %d of %d = %s"%(i+1, len(files), f))
    try:
        y, sr = librosa.load(f, sr=fs)
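        # Check the length before normalizing, so empty clips are skipped first
        if len(y) < 2:
            print("Error loading %s" % f)
            continue
        y /= np.abs(y).max()  # Peak-normalize to the range [-1, 1]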
        feat = get_features(y, sr)
        feature_vectors.append(feat)
        sound_paths.append(f)
    except Exception as e:
        print("Error loading %s. Error: %s" % (f,e))
        
print("Calculated %d feature vectors"%len(feature_vectors))

Standardization

# Scale features using Standard Scaler
scaler = StandardScaler()
scaled_feature_vectors = scaler.fit_transform(np.array(feature_vectors))
print("Feature vectors shape:",scaled_feature_vectors.shape)

Jon Nordby · Answer 1 · 2020-07-22T21:01:23.160

Your get_features function calls numpy.mean() along axis 1, which averages away the temporal variation of the audio clip and leaves only the frequency dimension. Remove that call and return the mel spectrogram itself, and you get a 3D array of shape (samples, frequency bins, time frames).
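As a minimal sketch of the difference, here are random arrays standing in for real mel spectrograms. The (128, 87) shape and the count of 80 clips come from the question; fake_melspectrogram is a hypothetical stand-in for the output of librosa.feature.melspectrogram, so no audio files are needed:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_melspectrogram(n_mels=128, n_frames=87):
    # Hypothetical stand-in for a mel spectrogram:
    # a non-negative array of shape (n_mels, n_frames).
    return rng.random((n_mels, n_frames))

clips = [fake_melspectrogram() for _ in range(80)]

# Averaging over axis 1 (time) leaves one value per mel band.
averaged = np.array([np.mean(m, axis=1) for m in clips])

# Keeping the full spectrogram preserves the time axis.
full = np.array(clips)

print(averaged.shape)  # (80, 128)
print(full.shape)      # (80, 128, 87)
```

This only stacks cleanly because every fake clip has the same number of frames.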

If your input audio clips have different lengths, you need an extra step to make all spectrograms the same length along the time axis. One approach is to use (shorter) fixed-length analysis windows. The other is to pad or crop the spectrogram.
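Both approaches can be sketched with plain NumPy. The helper names pad_or_crop and split_windows are hypothetical, the example shapes (128, 381) and (128, 394) are taken from the comments, and the window length of 100 frames is an arbitrary choice for illustration:

```python
import numpy as np

def pad_or_crop(spec, n_frames):
    """Return spec with exactly n_frames columns (time is axis 1)."""
    current = spec.shape[1]
    if current >= n_frames:
        # Too long: keep only the first n_frames columns.
        return spec[:, :n_frames]
    # Too short: append zero-valued frames at the end.
    pad_width = n_frames - current
    return np.pad(spec, ((0, 0), (0, pad_width)), mode="constant")

def split_windows(spec, n_frames):
    """Split spec into non-overlapping windows; the remainder is dropped."""
    n_windows = spec.shape[1] // n_frames
    return [spec[:, i * n_frames:(i + 1) * n_frames] for i in range(n_windows)]

# Stand-ins for spectrograms of clips with different durations.
specs = [np.ones((128, 381)), np.ones((128, 394)), np.ones((128, 300))]

# Approach 1: pad or crop every spectrogram to a common length, then stack.
batch = np.stack([pad_or_crop(s, 381) for s in specs])
print(batch.shape)  # (3, 128, 381)

# Approach 2: cut each spectrogram into fixed-length windows, then stack.
windows = [w for s in specs for w in split_windows(s, 100)]
window_batch = np.stack(windows)
print(window_batch.shape)  # (9, 128, 100)
```

Zero padding corresponds to silence for a power spectrogram. If the spectrograms are converted to decibels first, padding with the minimum value is probably a better fit. With the windowing approach, each window would normally inherit the label of the clip it came from.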

edited Jul 22 '20 at 21:01

answered Jul 20 '20 at 21:01

Jon Nordby


When I use **feature_vector = melspectrogram, 1** I get this error: **ValueError: setting an array element with a sequence.** When I use **feature_vector = melspectrogram** I get this error: **ValueError: could not broadcast input array from shape (128,381) into shape (128)**. Kindly help me out. – Muhidin Mubarak Jul 21 '20 at 13:54
The error comes from this line: **scaled_feature_vectors = scaler.fit_transform(np.array(feature_vectors))**, and I am using **feature_vector = melspectrogram**. When I print feat.shape, the arrays vary in shape: the first one is (128, 381), the second one is (128, 394). So I am wondering: to get a 3D array of, say, (80, 128, 381), can we truncate arrays like (128, 394) to (128, 381) so that they are uniform and can be stacked? Thanks in advance – Muhidin Mubarak Jul 22 '20 at 17:41
Yes, you need to take care of the length along the time dimension. I have updated my answer with some references for that – Jon Nordby Jul 22 '20 at 21:02
I was able to create the 3D array, and in fact used 600 audio files and got (600, 128, 100). I did not use scaler.fit_transform, but it gave me a poor result of 60% accuracy, which dropped to 20%. This was when I used a CNN. When I fed (600, 128) to fully connected layers, it gave almost 100%. Did removing np.mean in get_features affect the result, or what could be the problem? Thanks Jonnor – Muhidin Mubarak Jul 25 '20 at 17:48
Is the performance you are reporting measured on a validation/test set? What is the audio that you are classifying? Only on very simple tasks would one expect mean-summarized data to give near 100% – Jon Nordby Jul 25 '20 at 21:45
Validation. I am classifying musical instruments with 6 classes – Muhidin Mubarak Jul 26 '20 at 18:53
Ok. In that case your results may make sense: by taking the mean across time you basically get a measure of the frequency content of the sound, which might be a good predictor of the musical instrument. When you have the full spectrogram across time, the model needs to decipher the temporal patterns, and figure out which patterns indicate a type of instrument versus which are just about the notes being played etc – Jon Nordby Jul 26 '20 at 23:05
Is there any way to improve the accuracy of the full spectrogram method? – Muhidin Mubarak Jul 27 '20 at 16:47
Why is melspectrogram giving this error: **ComplexWarning: Casting complex values to real discards the imaginary part During handling of the above exception, another exception occurred:** – Muhidin Mubarak Jul 27 '20 at 17:20
