
I'm quite new to audio processing. I'm trying to use audio data to control a virtual lightbulb (implemented with a simple Python Turtle), on the assumption that each element in the NumPy array represents the amplitude of the audio at a specific time. However, the result turned out very different from what I expected, so I'm starting to wonder whether my assumption is correct.

Any help would be appreciated.
Thanks in advance, everyone.

Load the audio file using pydub:

import numpy as np
from pydub import AudioSegment

frame_rate = 16000  # target sample rate in Hz

# Load the WAV file and resample it to frame_rate
music = AudioSegment.from_wav("Music.wav").set_frame_rate(frame_rate)

Convert it to a NumPy array:

musicArr = np.array(music.get_array_of_samples())
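One thing worth checking at this point is the channel count. As far as I know, pydub's `get_array_of_samples()` returns the samples of a multi-channel file interleaved (left, right, left, right, and so on), so for stereo audio consecutive elements are not consecutive points in time. The sketch below is only an illustration: the helper names `split_channels` and `sample_times` are hypothetical, and the sample values are made up. With pydub, the channel count would come from `music.channels`.

```python
def split_channels(samples, channels):
    """Split an interleaved sample sequence into one list per channel."""
    return [list(samples[ch::channels]) for ch in range(channels)]


def sample_times(num_frames, frame_rate):
    """Time in seconds of each frame, assuming a constant frame rate."""
    return [i / frame_rate for i in range(num_frames)]


# Made-up interleaved stereo data: L, R, L, R, ...
interleaved = [11, -11, 12, -20, 13, -23]
left, right = split_channels(interleaved, channels=2)
times = sample_times(len(left), frame_rate=16000)

print(left)      # [11, 12, 13]
print(right)     # [-11, -20, -23]
print(times[1])  # 6.25e-05
```

The same slicing (`samples[ch::channels]`) should also work directly on a NumPy array such as `musicArr`. For a mono file, `channels` is 1 and the array is already one value per time step.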

Print the array:

print(musicArr)

The result looks something like this:

[ 11 -11  12 -20  13 -23  10 -24  13 -25  10 -19   7 -16   5  -4   8   2
  12   9  14  18  24  23  33  29  30  30  33  32  32  33  28  26  25  18
  24  15  21  10  21   2  12  -1  10  -8   1 -11   0 -10  -4  -6 -13  -1
 -19   1 -29   4 -31   6 -38   6 -41   4 -43   1 -47  -6 -48 -11 -49 -24
 -52 -27 -51 -28 -49 -33 -53 -35 -55 -33 -56 -36 -52 -37 -51 -36 -47 -33
 -45 -28 -44 -34 -44 -37 -44 -40 -43 -45 -43 -47 -42 -44 -42 -46 -40 -42
 -37 -31 -27 -28 -26 -23 -21 -15 -11 -17  -6 -14  -5 -12  -1  -9  -9  -6
  -6  -5  -9  -2 -12  -4 -15  -2 -18   1 -20  -1 -19  -1 -16  -3 -16  -9
 -13 -12  -6 -16  -8 -17  -5 -19  -1 -22  -3 -20   0 -18   3 -22   7 -23
   9 -24   8 -24   6 -25   8 -23   4 -23   2 -22   3 -21   8 -23   6 -24
   3 -24   0 -22  -1 -30  -4 -32  -3 -37  -8 -32 -12 -38 -20 -30 -16 -32
 -19 -32]
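The values swing above and below zero because each sample is an instantaneous signed value of the waveform, not a loudness. For something like a brightness control, one common approach is to take the RMS (root mean square) over short windows. The following is a minimal sketch under stated assumptions: signed 16-bit samples (full scale 32768), a single channel, a hypothetical helper named `rms_brightness`, and made-up test data. The window length is arbitrary.

```python
import math


def rms_brightness(samples, window, full_scale=32768):
    """Return one brightness value in [0, 1] per full window of samples.

    Each value is the RMS of the window divided by full_scale
    (32768 assumes signed 16-bit audio). A trailing partial window
    is dropped.
    """
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        # int() avoids overflow when the input holds numpy.int16 values
        mean_square = sum(int(s) * int(s) for s in chunk) / window
        levels.append(min(1.0, math.sqrt(mean_square) / full_scale))
    return levels


# Made-up data: a quiet window followed by a loud one
quiet = [10, -10, 10, -10]
loud = [16384, -16384, 16384, -16384]
levels = rms_brightness(quiet + loud, window=4)
print(levels)
```

On this scale, samples with a magnitude of around 50, like those in the printout, map to roughly 0.0015, which would be an almost dark light.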
SorawitC
  • You are correct in saying each element should be a sample value. In this case it appears to print the byte values interpreted as signed 8-bit ints, or the wav has a PCM bit-depth of 8 – fdcpp Jan 19 '22 at 17:39
  • What is the dtype of the numpy array? – fdcpp Jan 19 '22 at 17:40
  • Each element of the NumPy array is of type numpy.int16 – SorawitC Jan 20 '22 at 02:52
  • Then that looks like you’ve just got very quiet audio. – fdcpp Jan 20 '22 at 07:21
  • All the high values are near the end of the array. I don't know why it behaves like that. The source music should have vocals starting about 4 seconds from the beginning. – SorawitC Jan 21 '22 at 14:23
  • I'd sanity check this against an audio editor like Audacity before making any assumptions about what you think you should see – fdcpp Jan 21 '22 at 15:08
  • I have solved the problem now. The weird display happened because Python cannot run as fast as I expected. Anyway, thank you very much – SorawitC Jan 22 '22 at 13:06
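
If the drawing loop cannot keep up with 16000 updates per second, one option is to let wall-clock time pick the sample instead of stepping through every element. This is a minimal sketch, not the approach used in the question: the helper name `frame_at` is hypothetical, the audio length is made up, and `time.sleep` stands in for slow drawing code.

```python
import time


def frame_at(elapsed_seconds, frame_rate, num_frames):
    """Index of the frame that should be playing after elapsed_seconds.

    Returns None once the audio has ended.
    """
    index = int(elapsed_seconds * frame_rate)
    return index if 0 <= index < num_frames else None


frame_rate = 16000
num_frames = 3200            # made-up length: 0.2 s of audio

start = time.monotonic()
shown = []
while True:
    index = frame_at(time.monotonic() - start, frame_rate, num_frames)
    if index is None:
        break
    shown.append(index)      # a real loop would update the light here
    time.sleep(0.05)         # stand-in for slow drawing code

print(len(shown), "updates for", num_frames, "frames")
```

With this structure, a slow loop skips samples rather than falling behind the audio, so the light stays in step with the music at whatever update rate the drawing code can manage.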

0 Answers