I am trying to perform sound classification using a neural network, and I would like to convert a 4-second audio file in .wav format to an image.
I would prefer to use the Librosa library. I would also like to know how to read this image and provide it as input to a CNN model.
I did find similar posts here, but they don't solve my issue.
This is what I have tried so far:
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
y, sr = librosa.load('36902-3-2-0.wav')
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
librosa.display.specshow(librosa.power_to_db(S, ref=np.max), fmax=8000)
plt.savefig('mel.png')
I get this image:
When I try to read the image using matplotlib.pyplot or cv2, all I get is an array filled with the value 255:
array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       ...,
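The printout is truncated, so it only shows the first and last few pixels of each row, and those may simply be the white margin around the plot. A minimal sketch of how this could be checked, using a synthetic array as a stand-in for mel.png (the canvas size, plot area and pixel value are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for the saved figure: a white canvas with a darker
# "plot area" in the middle (hypothetical sizes, not taken from mel.png).
img = np.full((480, 640, 3), 255, dtype=np.uint8)
img[60:420, 80:560] = 30

# The truncated repr only shows the first and last rows and columns,
# so it can look all-white even when the interior is not.
corners_white = bool((img[:3, :3] == 255).all() and (img[-3:, -3:] == 255).all())

# Count pixels where at least one colour channel differs from 255.
non_white_pixels = int((img != 255).any(axis=-1).sum())

print(corners_white)
print(non_white_pixels)
print(img.min(), img.max())
```

On the real file, the same checks could presumably be run on the array returned by cv2.imread('mel.png').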
Link to audio file: https://drive.google.com/file/d/1BBgOxKy2-JMOHa90DCeFVLgoA7pEblVg/view?usp=sharing
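For the CNN input part, skipping the PNG and feeding the spectrogram array directly might also be an option. A rough sketch of what I have in mind, assuming a channels-last input layout and using a random array in place of the output of librosa.power_to_db(S, ref=np.max); to_cnn_input is a hypothetical helper name and the (128, 173) shape is only an example:

```python
import numpy as np

def to_cnn_input(S_db):
    """Scale a 2-D dB spectrogram to [0, 1] and add batch and channel axes."""
    S_db = np.asarray(S_db, dtype=np.float32)
    lo, hi = S_db.min(), S_db.max()
    # Guard against a constant array (e.g. silence) to avoid dividing by zero.
    scaled = (S_db - lo) / (hi - lo) if hi > lo else np.zeros_like(S_db)
    # Shape (1, n_mels, n_frames, 1): channels-last layout, an assumption.
    return scaled[np.newaxis, :, :, np.newaxis]

# Random values in the dB range as a placeholder for a real mel spectrogram.
rng = np.random.default_rng(0)
S_db = rng.uniform(-80.0, 0.0, size=(128, 173))

x = to_cnn_input(S_db)
print(x.shape, x.dtype)
```

Whether the model expects channels-last or channels-first, and whether min-max scaling per clip is appropriate, depends on the framework and the training setup.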