PyAudio Recording and Playing Back in Real Time

Question

I am trying to record audio from the microphone and then play that audio through the speakers. Eventually I want to modify the audio before playing it back, but I'm having trouble taking the input data and successfully playing it back through the speakers.

I'm using Int16 for the input stream and Float32 for the output stream. These were the only formats that produced any sound at all (albeit a demonic one).

First I tried simply putting the input data into the output stream. This outputs a demonic sound:

import pyaudio
import numpy as np
import struct

FORMATIN = pyaudio.paInt16
FORMATOUT = pyaudio.paFloat32
CHANNELS = 1
RATE = 44100
CHUNK = 1024


audio = pyaudio.PyAudio()

# start Recording
streamIn = audio.open(format=FORMATIN, channels=CHANNELS,
                      rate=RATE, input=True, input_device_index=0,
                      frames_per_buffer=CHUNK)
streamOut = audio.open(format=FORMATOUT, channels=CHANNELS,
                       rate=RATE, output=True,
                       frames_per_buffer=CHUNK)
print("recording...")


while True:
    in_data = streamIn.read(CHUNK)
    streamOut.write(in_data)

in_data is as follows when printed:

1\x00\x12\x00\x0f\x00\x05\x00\x14\x00\x1e\x00\x16\x00\x14\x00\x12\x00\x10\x00\x02\x00\xf7\xff\xf7\xff\xd4\xff\xde\xff\xf8\xff\xd3\xff\xe9\xff\x14\x00@\x00Z\x00\xb9\xfft\xff\xce\x00\x93\x01\xc2\xff\xe4\xfe\x93\x00d\x00\xca\xff\x94\x01V\x01\xc8\xffS\x00t\x00\xc4\xffi\x00\xaf\x01l\x00\xdb\xfeM\xffw\xffp\x01\xf5\xffr\xfc\x97\x00~\x02S\x00\x97\x00v\x00\x87\xfe\xb7\xfc\x81\xff\xf6\x00\xef\x00\xc4\x03\x84\x02\x99\xfd`\xfc\xe2\x01b\x03\xda\xfe\xc4\xff\xfd\x00:\x00\xc6\x00\xf1\xfcV\xfd\xf0\x02\xdc\xff&\xff\xa1\x02\xc7\xff\xf5\xfe\xa9\xfe\x99\xfa\x06\xfdo\x04\xaa\x02\x8f\xfe\xec\x00\x1b\xffZ\xfe;\x01t\xfe<\xffd\x02<\x02\x04\x02\xcd\xfd\xe8\xfd\xf3\x00i\xfcD\xfa\x86\xfe\xb3\x01\xea\x00$\x00q\x00\x03\x022\x00d\xf9\x14\xfa\x86\xfdQ\xfd\xc5\xfe\x81\x02\xc2\x02=\x01\xfc\x00\xe5\xfd\t\xff\x93\xff\x83\xffd\x00(\xfeQ\xffM\x01\xb1\x01\xde\xfdE\xfd\xfe\xff\x00\x00\x06\x00\x02\xffV\xff\xcd\xffJ\xff\xfb\xfc\x86\xfd^\x00\x8d\x00\x91\xff\xb6\xfe\xf7\x00\x95\x01E\x00\x1b\xff9\xfe8\xff\xa7\xff\xd4\xff\xdd\xff\xb0\x00\x97\x01\xe8\x00\xa7\xff\xd8\xfe\x89\xff\x0c\x00\x81\xff\x81\xfe\xd1\xfeN\x00\x1a\x01\xcb\x00\x19\x00\x90\x00`\x00\x93\xff5\xff\x9b\xff\\\x00\x08\x00\xc0\xff,\x00\xc0\x00\xba\x00\x83\x00\x0f\x00\xf5\xffY\x00\x19\

Then I tried changing in_data to Float32, but that did not work either:

in_data = np.frombuffer(in_data, np.float32)

I tried various clipping and packing of the data, none of which worked:

in_data = np.clip(in_data, -2**15+1, 2**15-1)
in_data = struct.pack('d' * 1024, *in_data)
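For comparison, the struct format character fixes how many bytes each sample occupies: 'h' is a 2-byte signed short (what paInt16 uses), 'f' is a 4-byte float (paFloat32), and 'd' is an 8-byte double, which neither stream uses. A quick check with struct.calcsize, using the '<' prefix (an assumption here) to get standard little-endian sizes without alignment padding:

```python
import struct

CHUNK = 1024  # frames per read, as in the question

# Bytes needed to pack CHUNK mono samples with each struct format code.
sizes = {code: struct.calcsize("<" + code * CHUNK) for code in ("h", "f", "d")}
print(sizes)  # {'h': 2048, 'f': 4096, 'd': 8192}

# Packing int16 samples back into paInt16-compatible bytes uses 'h', not 'd'.
samples = [49, 18, -9, -44]
packed = struct.pack("<%dh" % len(samples), *samples)
print(len(packed))  # 8 bytes for 4 samples
```

So packing 1024 samples with 'd' produces 8192 bytes, four times what a paInt16 stream expects and twice what a paFloat32 stream expects for the same number of frames.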

Does anyone know how to record audio from the microphone and then output it through speakers? Thank you.

score 1 · Answer 1 · answered May 04 '22 at 11:19

Set FORMATOUT = FORMATIN.

Currently, your code does the following:

44100 times per second, a frame is recorded.
Each frame is a 16-bit signed integer (16-bit LPCM), so it takes 2 bytes to encode. This is the FORMATIN = pyaudio.paInt16 setting you chose.
When 1024 frames have been recorded (this takes about 23 ms), they are returned as a bytes object in Python. It consists of 2048 bytes; you call this variable in_data.
Then you pass these 2048 bytes to the output device via the .write call.
The output device works in pyaudio.paFloat32, which means it believes each frame is 32 bits (4 bytes). It concludes that you have provided 2048/4 = 512 frames to play back. The output is also set to 44100 Hz, so playback takes about 12 ms. The values it plays back are a mess, since it interprets the integer bytes as floats. Both the sample width and the encoding are mismatched, and the sound from your speaker seems to come from tormented souls in purgatory.
Then the whole process repeats.
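The byte and timing arithmetic in these steps can be checked with NumPy. This sketch uses a synthetic 440 Hz tone as a stand-in for microphone input (an assumption; no audio device is needed) and then reinterprets the same bytes the way a paFloat32 stream would:

```python
import numpy as np

RATE = 44100
CHUNK = 1024

# Stand-in for one read from a paInt16 input stream: CHUNK frames, 2 bytes each.
t = np.arange(CHUNK) / RATE
samples = (np.sin(2 * np.pi * 440 * t) * 10000).astype(np.int16)
in_data = samples.tobytes()
print(len(in_data))                           # 2048
print(round(1000 * CHUNK / RATE, 1))          # 23.2 (ms recorded)

# A paFloat32 output stream reads the same bytes as 4-byte floats.
as_float = np.frombuffer(in_data, dtype=np.float32)
print(len(as_float))                          # 512
print(round(1000 * len(as_float) / RATE, 1))  # 11.6 (ms played back)
```

The 512 float values are built from pairs of unrelated int16 bit patterns, so they bear no resemblance to the original waveform.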

Matching the input and output formats should resolve these issues.
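A minimal sketch of the loop with matching formats, assuming the default input and output devices (so no device indexes are passed) and mono paInt16 at 44100 Hz. The passthrough helper is a hypothetical name; exception_on_overflow=False (a Stream.read parameter in PyAudio 0.2.11 and later) keeps an occasional input overflow from raising an error and stopping the loop:

```python
RATE = 44100
CHANNELS = 1
CHUNK = 1024


def passthrough(stream_in, stream_out, chunk=CHUNK, n_chunks=None):
    """Copy raw bytes from stream_in to stream_out, chunk by chunk.

    Both streams must be opened with the same format, channel count and
    rate, so the bytes can be written unchanged. n_chunks=None loops forever.
    """
    count = 0
    while n_chunks is None or count < n_chunks:
        data = stream_in.read(chunk, exception_on_overflow=False)
        # Any processing of `data` would go here, keeping the byte layout.
        stream_out.write(data)
        count += 1


def main():
    import pyaudio  # imported here so passthrough() can be tested without it

    fmt = pyaudio.paInt16  # the same format for both streams: FORMATOUT = FORMATIN
    audio = pyaudio.PyAudio()
    stream_in = audio.open(format=fmt, channels=CHANNELS, rate=RATE,
                           input=True, frames_per_buffer=CHUNK)
    stream_out = audio.open(format=fmt, channels=CHANNELS, rate=RATE,
                            output=True, frames_per_buffer=CHUNK)
    try:
        passthrough(stream_in, stream_out)
    except KeyboardInterrupt:
        pass  # stop with Ctrl+C
    finally:
        for stream in (stream_in, stream_out):
            stream.stop_stream()
            stream.close()
        audio.terminate()
```

Calling main() starts the pass-through; pressing Ctrl+C stops it and releases the streams.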

score -1 · Answer 2 · answered Nov 28 '20 at 10:33

Audio data in 16-bit signed integer format has values between -32768 and 32767. Data in float format (32-bit or 64-bit) is in the range -1.0 to 1.0.

I would recommend doing all processing in floating point in Python. So try in_data = in_data / 32768 before processing it or sending it to the output.
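Since stream.read returns raw bytes, the division only works after decoding them as int16 samples. A sketch of the round trip, assuming paInt16 input; the helper names and the "halve the volume" step are hypothetical, and truncation toward zero in astype is an accepted simplification here:

```python
import numpy as np


def int16_bytes_to_float(in_data):
    """Decode paInt16 bytes into float32 samples in [-1.0, 1.0)."""
    samples = np.frombuffer(in_data, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0


def float_to_int16_bytes(samples):
    """Encode float samples back to paInt16 bytes, clipping to the int16 range."""
    scaled = np.clip(samples * 32768.0, -32768, 32767)
    return scaled.astype(np.int16).tobytes()


# Stand-in for the bytes returned by streamIn.read(...)
in_data = np.array([49, 18, -9, -44, 32767, -32768], dtype=np.int16).tobytes()

x = int16_bytes_to_float(in_data)
y = 0.5 * x  # example processing step: halve the volume

out_int16 = float_to_int16_bytes(y)           # write this to a paInt16 stream
out_float32 = y.astype(np.float32).tobytes()  # or this to a paFloat32 stream
```

Either output works as long as it matches the format the output stream was opened with.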

Jon Nordby

I thought that's what np.frombuffer(in_data, np.float32) was doing? Either way, in_data is in a byte format, so that operation didn't work. Thank you for the comment btw – Andrew Pulver Nov 29 '20 at 02:01

In bytes format? What is the format of that data? 16 bit PCM? I didn't find much in the pyaudio docs... – Jon Nordby Nov 30 '20 at 08:05
I'm not sure what it's called. I put a sample of what in_data looks like when printed in the body of my post. Do you know what this is? And yes, the pyaudio docs are pretty terrible when it comes to specifics like this. – Andrew Pulver Dec 07 '20 at 02:12
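The printed in_data is a Python bytes object: printable bytes are shown as ASCII (the leading 1 is byte 0x31) and the rest as \x escapes. Read two bytes at a time as little-endian signed 16-bit integers (16-bit PCM, assuming the usual little-endian byte order of x86 and Apple Silicon Macs), the first 28 bytes of the sample decode like this:

```python
import numpy as np

# First 28 bytes of the printed in_data, copied as a bytes literal.
raw = b"1\x00\x12\x00\x0f\x00\x05\x00\x14\x00\x1e\x00\x16\x00\x14\x00\x12\x00\x10\x00\x02\x00\xf7\xff\xf7\xff\xd4\xff"

# '<i2' = little-endian signed 16-bit integer, one value per recorded frame
samples = np.frombuffer(raw, dtype="<i2")
print(samples.tolist())
# [49, 18, 15, 5, 20, 30, 22, 20, 18, 16, 2, -9, -9, -44]
```

Small values like these are consistent with a quiet microphone signal near silence.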

score -2 · Answer 3 · answered Nov 27 '20 at 03:35

If you are using Linux, you can put os.system("pactl load-module module-loopback latency_msec=1")

at the beginning of the script and os.system("/usr/bin/pulseaudio --kill") at the end.

Please tell me if it works now.
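A sketch of the same idea using subprocess, assuming PulseAudio (or PipeWire's PulseAudio compatibility layer) with pactl on the PATH. As a variation on this answer, it unloads only the loopback module, using the index that pactl load-module prints, instead of killing the whole sound server. The helper names and the injectable run parameter are hypothetical, added so the commands can be checked without PulseAudio:

```python
import subprocess


def load_loopback(latency_msec=1, run=subprocess.run):
    """Load module-loopback and return the module index that pactl prints."""
    result = run(
        ["pactl", "load-module", "module-loopback", f"latency_msec={latency_msec}"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def unload_loopback(module_index, run=subprocess.run):
    """Unload only the loopback module loaded by load_loopback()."""
    run(["pactl", "unload-module", str(module_index)], check=True)
```

Typical use is idx = load_loopback() at the start of the script and unload_loopback(idx) in a finally block at the end. This routes the default source to the default sink at the sound-server level, so it does not let Python modify the audio.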

eyal

I am not using Linux. Anything for Mac? – Andrew Pulver Nov 27 '20 at 05:43
You can try opening software that plays back what you record while the Python script is recording. – eyal Nov 27 '20 at 17:45
This is what I am trying to do: use Python to listen to the microphone and play what it hears through the speakers. Perhaps PyAudio isn't a good library for this? – Andrew Pulver Nov 27 '20 at 23:21
