How to convert .wav file into an image for neural network?

Question

I am trying to perform sound classification with a neural network and would like to convert a 4-second audio file in .wav format into an image.

I would prefer to use the librosa library. I would also like to know how to read this image back and provide it as input to a CNN model.

I found similar posts here, but they don't solve my issue.

This is what I have tried so far:

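import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
y, sr = librosa.load('36902-3-2-0.wav')
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)  # keyword arguments, as newer librosa releases require
librosa.display.specshow(librosa.power_to_db(S, ref=np.max), fmax=8000)
plt.savefig('mel.png')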

I get this image:

When I read the image back using matplotlib.pyplot or cv2, all I get is an array filled with the value 255:

array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       ...,

Link to audio file: https://drive.google.com/file/d/1BBgOxKy2-JMOHa90DCeFVLgoA7pEblVg/view?usp=sharing

If you don't want your question downvoted, follow this: https://stackoverflow.com/help/how-to-ask - zabop, Aug 06 '20 at 08:22
This is helpful for that: https://stackoverflow.com/help/minimal-reproducible-example - zabop, Aug 06 '20 at 08:23
I am trying to figure out how to add the audio file to my question. - Deep, Aug 06 '20 at 08:24
For example, if you are asking about a wav file, provide a wav file, or it is not reproducible. - zabop, Aug 06 '20 at 08:24
Also, read: https://meta.stackoverflow.com/questions/331598/dodging-downvotes-by-deletion-and-repost - zabop, Aug 06 '20 at 08:32
Added a link to the audio file. I hope anyone can now reproduce the same result. - Deep, Aug 06 '20 at 08:33

Mark Setchell · Accepted Answer · 2020-08-06T09:36:13.840

1

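That's perfectly normal: the values you printed are the white border that matplotlib leaves around the plot, and (255, 255, 255) is white. NumPy also abbreviates the printout of a large array to its first and last few rows and columns, which here are all border pixels.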

Try looking around coordinates 200,200:

print(array[200:210, 200:210])

array([[[ 96,  87, 235],
        [ 96,  87, 235],
        [ 96,  87, 235],
        [ 95,  90, 237],
        [ 95,  90, 237],
        ...
        ...
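
Rather than guessing coordinates, the extent of the plotted region can be located by searching for pixels that are not pure white. The following is a minimal sketch, assuming the image was read as a uint8 array (as cv2.imread returns; matplotlib may return floats in [0, 1] for PNG files, in which case white would be 1.0). The helper name nonwhite_bbox and the synthetic demo image are hypothetical stand-ins, not part of the original code.

```python
import numpy as np

def nonwhite_bbox(img, white=255):
    """Return (row_min, row_max, col_min, col_max) of the non-white pixels, or None."""
    # A pixel counts as non-white if any of its first three channels differs from `white`.
    mask = np.any(img[..., :3] != white, axis=-1)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None  # the image is entirely white
    return rows[0], rows[-1], cols[0], cols[-1]

# Synthetic stand-in for the saved figure: a white canvas with a coloured block inside.
demo = np.full((480, 640, 3), 255, dtype=np.uint8)
demo[60:420, 80:580] = (96, 87, 235)

bbox = nonwhite_bbox(demo)
r0, r1, c0, c1 = bbox
cropped = demo[r0:r1 + 1, c0:c1 + 1]  # the plot area without the white border
print(bbox, cropped.shape)
```

On a real saved figure the bounding box would also include axis ticks and labels if they were drawn, so this only removes the outer margin.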

The mean of the whole array also shows that it is not all white, since an all-white image would average exactly 255:

print(array.mean())

161.20984439300412
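
For feeding a CNN, one option that avoids the border altogether is to skip the plot and turn the dB-scaled spectrogram array itself into an image. The sketch below is only an illustration under stated assumptions: db_to_uint8 is a hypothetical helper, the random array stands in for the output of librosa.power_to_db(S, ref=np.max) from the question, and the channels-last input layout (batch, height, width, channels) is an assumption about the CNN framework in use.

```python
import numpy as np

def db_to_uint8(S_db):
    """Min-max scale a 2-D dB spectrogram to uint8, with low frequencies at the bottom."""
    S_db = np.asarray(S_db, dtype=np.float64)
    lo, hi = S_db.min(), S_db.max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on a constant array
    scaled = (S_db - lo) / span         # values now in [0, 1]
    img = np.round(scaled * 255).astype(np.uint8)
    return np.flipud(img)               # row 0 becomes the highest mel band

# Synthetic stand-in for the mel spectrogram in dB: 128 mel bands x 173 frames (illustrative sizes).
rng = np.random.default_rng(0)
S_db = rng.uniform(-80.0, 0.0, size=(128, 173))

img = db_to_uint8(S_db)  # 2-D greyscale image, no axes and no border

# Assumed CNN input layout: add batch and channel axes and rescale to [0, 1].
x = img.astype(np.float32)[np.newaxis, ..., np.newaxis] / 255.0
print(img.shape, img.dtype, x.shape)
```

The resulting array can be written to disk with any image library, or passed to the model directly without a round trip through a PNG file.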

edited Aug 06 '20 at 09:36

answered Aug 06 '20 at 08:59

Mark Setchell
