
I want to make an audio spectrum analyzer in Python. I used the pyaudio library and I'm reading the stream from the microphone. For every read, I get 4410 values, which I convert to numbers using numpy and then draw onto a pygame screen. It looks like this: https://photos.google.com/share/AF1QipMCWVk1pR0dmrrsTlpE3gHQ9GTUV25MqwUxw4JuW8TrItkGkuU9X3ZpY2ZQ-RLHew?key=UE9Id19IU1dtSHZfUk43TjB3SWxFcVhRRTFYOWFB (the graph is upside down). The code I have for it is this:

import pyaudio, math, struct,pygame, numpy
pa = pyaudio.PyAudio()
#open audio stream
stream = pa.open(input_device_index=1,rate=44100,format=pyaudio.paInt16,channels=2,input=True)    

#read bytes from stream and convert to numbers
def get_data():
    data = stream.read(int(44100*0.05))
    s = numpy.frombuffer(data, numpy.int16)  # frombuffer replaces the deprecated fromstring; s is not used below
    return struct.unpack('h'*4410, data)



pygame.init()
screen = pygame.display.set_mode((4000,1000))

def redraw():
    data = get_data() 
    #draw every number as a bar onto pygame windows
    #last 4410 values are missing
    for x in range(4000):            
        val = data[x]           
        pygame.draw.rect(screen,(0,0,0),(x,0,1,1000),0)                      
        pygame.draw.rect(screen,(255,255,255),(x,0,1,val),0)


    pygame.display.update()
    pygame.event.clear()

while 1:    
    redraw()

Is there any fancy way to merge these 4410 values into just 15, so I can have the nice & cool green & red bars in reasonable-sized window, instead of this ugly thing that needs 3 screens?

Adam Ježek

1 Answer


Frequency vs Time domain

As written, your code draws a time-domain representation of the samples, whereas a spectrum analyser is a frequency-domain representation.

Time<->Frequency domain conversion can be achieved using the Discrete Fourier Transform. In practice, you will want to apply a window function to the data prior to the transform, which reduces spectral leakage from the block edges.
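As a minimal sketch of that step, assuming numpy's FFT routines and a Hann window (the window choice and the function name magnitude_spectrum are mine, not something the question prescribes), one block of mono samples can be turned into a magnitude spectrum as below. Note that the stream in the question is opened with channels=2, so its samples are interleaved left/right; you would pick one channel (e.g. data[::2]) or average the two first.

```python
import numpy as np

RATE = 44100  # sample rate used in the question

def magnitude_spectrum(samples, rate=RATE):
    """Return (bin frequencies in Hz, magnitudes) for one block of mono samples."""
    samples = np.asarray(samples, dtype=np.float64)
    window = np.hanning(len(samples))          # Hann window tapers the block edges
    spectrum = np.fft.rfft(samples * window)   # complex bins from 0 Hz up to rate/2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return freqs, np.abs(spectrum)             # magnitude combines real and imaginary parts

# Synthetic input instead of the microphone: a 1 kHz sine, 2205 samples (0.05 s)
t = np.arange(2205) / RATE
freqs, mags = magnitude_spectrum(np.sin(2 * np.pi * 1000 * t))
peak_hz = freqs[np.argmax(mags)]
```

With 2205 samples at 44100 Hz the bins are 20 Hz apart, so the strongest bin for this test tone should sit at or next to 1000 Hz.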

The output of the DFT is a series of equally-sized frequency bins, each containing a real and imaginary component. Spectrum analysers typically have bands with an equal perceptual width - that is to say, an equal number of octaves (or fractions of an octave). Thus, with octave-wide bands, each band will have twice as many frequency bins in it as the one before. Spreading 15 bands over the roughly 10 octaves of the audible range would equate to 2/3 octave per band.
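The grouping described above could be sketched like this. The band edges (20 Hz to 20 kHz), the use of the mean magnitude per band, and the name band_levels are all assumptions for illustration; summing or taking the peak per band would be equally reasonable choices.

```python
import numpy as np

def band_levels(freqs, mags, n_bands=15, f_lo=20.0, f_hi=20000.0):
    """Average DFT magnitudes into n_bands logarithmically spaced bands."""
    # Geometric spacing gives every band the same width in octaves
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    levels = np.zeros(n_bands)
    for i in range(n_bands):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        if in_band.any():                      # very narrow low bands may hold no bin
            levels[i] = mags[in_band].mean()
    return edges, levels

# Synthetic spectrum: 20 Hz bin spacing with a single spike at 1000 Hz
demo_freqs = np.arange(0.0, 22051.0, 20.0)
demo_mags = np.zeros_like(demo_freqs)
demo_mags[50] = 1.0                            # bin 50 is 1000 Hz
edges, levels = band_levels(demo_freqs, demo_mags)
loudest_band = int(np.argmax(levels))
```

Each of the 15 values in levels can then be drawn as one bar, usually after converting to a logarithmic (dB) scale so quiet bands remain visible.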

Explanation of graphical output

You have rendered time-domain samples, using one pixel horizontally for each sample, with the amplitude mapped directly to the Y-coordinate. As the amplitude range is -32768 <= x <= 32767, the vast majority of samples will be smaller or bigger than the range provided in the display, which is 0 <= y < 1000 - thus most samples will be clipped to 0 or 999.

You can correct this by scaling the samples to fit and biasing the result by 500, such that a sample value of 0 is rendered at a Y-coordinate of 500.
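A small sketch of that scale-and-bias mapping, assuming the 1000-pixel window height from the question and signed 16-bit samples (sample_to_y is a hypothetical helper name). Subtracting from the midline also puts positive samples above it, which deals with the upside-down graph.

```python
def sample_to_y(sample, height=1000):
    """Map a signed 16-bit sample to a pixel row, with 0 drawn at mid-height."""
    half = height // 2
    # int(sample) avoids fixed-width overflow if a numpy int16 is passed in
    y = half - int(int(sample) * half / 32768)
    return min(max(y, 0), height - 1)          # clamp to the visible rows

mid_row = sample_to_y(0)
top_row = sample_to_y(32767)
bottom_row = sample_to_y(-32768)
```

In the drawing loop, each sample could then be plotted as a line from row 500 to sample_to_y(val) rather than as a rectangle whose height is the raw sample value.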

marko