4

I'm attempting to use pyaudio to make a voice masker. With the way I have it set up right now, the only thing I have to do is input the sound, change the pitch on the fly, and chunk it right back out. The first and last part are working, and I think I'm getting close to changing pitch... emphasis on the "think".

Unfortunately, I'm not too familiar with the type of data I'm working with or how exactly to manipulate it the way I want. I've gone through the audioop documentation and haven't found what I needed (though there are some things I could definitely use in there). I guess what I'm asking is...

How is the data formatted in these audio frames?

How can I change the pitch of a frame (if I can), or is it even close to working like that?

import pyaudio
import sys
import numpy as np
import wave
import audioop
import struct

chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 41000
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

stream = p.open(format = FORMAT,
                channels = CHANNELS,
                rate = RATE,
                input = True,
                output = True,
                frames_per_buffer = chunk)

swidth = 2

print "* recording"



while(True):

    data = stream.read(chunk)
    data = np.array(wave.struct.unpack("%dh"%(len(data)/swidth), data))*2

    data = np.fft.rfft(data)
    #MANipulation
    data = np.fft.irfft(data)



    stream.write(data3, chunk)




print "* done"

stream.stop_stream()
stream.close()
p.terminate()
Charles Sprayberry
Lebull on Wow

2 Answers

5

After the irfft line, and before the stream.write line, you need to convert the data back into 16-bit integers with a struct.pack call.

data = np.fft.irfft(data)
dataout = np.array(data*0.5, dtype='int16') #undo the *2 that was done at reading
chunkout = struct.pack("%dh"%(len(dataout)), *list(dataout)) #convert back to 16-bit data
stream.write(chunkout)
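
As for the data format: with paInt16 and one channel, the buffer is a run of signed 16-bit samples, two bytes each, in native byte order (which is what the "%dh" format string assumes). A minimal sketch of the same round trip using NumPy's buffer functions instead of struct, assuming mono 16-bit input; `bytes_to_samples` and `samples_to_bytes` are hypothetical helper names, not part of PyAudio or NumPy:

```python
import numpy as np

def bytes_to_samples(raw):
    """Interpret a raw mono paInt16 buffer as an array of float samples."""
    # np.int16 uses native byte order, matching struct's "h" format
    return np.frombuffer(raw, dtype=np.int16).astype(np.float64)

def samples_to_bytes(samples):
    """Round, clip to the int16 range, and pack back into raw bytes."""
    clipped = np.clip(np.rint(samples), -32768, 32767)
    return clipped.astype(np.int16).tobytes()

# Round trip through an FFT with no manipulation in between
raw_in = np.array([0, 1000, -1000, 32767, -32768, 5], dtype=np.int16).tobytes()
spectrum = np.fft.rfft(bytes_to_samples(raw_in))
raw_out = samples_to_bytes(np.fft.irfft(spectrum))
```

Clipping before the cast matters once the spectrum is actually modified, because values outside the int16 range would otherwise wrap around and sound like loud clicks. Note that irfft returns an even number of samples by default, which fits a chunk size of 1024.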
mtrw
  • Got it... it's perrrrfect. Thank you guys so much. – Lebull on Wow Jun 14 '11 at 00:50
  • This is great, super-helpful, thank you! Except, I think there's a typo - shouldn't the 3rd line be "chunkout = wave.struct.pack[...]"? – scubbo Sep 21 '14 at 15:01
  • @scubbo - thanks. I think it should be `struct.pack` not `wave.struct.pack`, but yeah, you're right. – mtrw Sep 21 '14 at 15:27
  • @mtrw How do you change the amount you want the pitch shift to be? (+1). – Neil Mar 28 '18 at 20:36
  • @Neil - this answer only deals with the data format part of the question. Pitch shift is a whole huge topic and I don't know much about it. You might want to ask on https://dsp.stackexchange.com. – mtrw Apr 10 '18 at 23:29
3

To change the pitch, you'll have to perform an FFT on a number of frames and then shift the data in frequency (move the data to different frequency bins) and perform an inverse FFT.
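
A minimal sketch of the bin-shifting idea, assuming a single block of real-valued samples and a whole number of bins `k`; `shift_bins` is a hypothetical helper and the 44100 Hz rate is only an example value. This moves every component by the same number of hertz (k times the bin spacing, which is the sample rate divided by the block length), so it is a frequency shift rather than a true musical pitch scaling, and processing blocks independently may produce audible artifacts at block boundaries.

```python
import numpy as np

def shift_bins(block, k):
    """Shift the spectrum of a real block up (k > 0) or down (k < 0) by k bins."""
    spectrum = np.fft.rfft(block)
    shifted = np.zeros_like(spectrum)
    if k > 0:
        shifted[k:] = spectrum[:-k]   # y[n] = x[n-k]
    elif k < 0:
        shifted[:k] = spectrum[-k:]   # y[n] = x[n+|k|]
    else:
        shifted[:] = spectrum
    return np.fft.irfft(shifted, n=len(block))

rate = 44100  # assumed sample rate for this example
n = 1024
t = np.arange(n) / rate
# A test tone placed exactly on bin 10 so the result is easy to inspect
tone = np.sin(2 * np.pi * (10 * rate / n) * t)
out = shift_bins(tone, 5)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(out))))
```

Here the tone moves from bin 10 to bin 15, which corresponds to a shift of 5 * 44100 / 1024 Hz.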

If you don't mind the sound fragment getting longer when lowering the pitch (or shorter when raising it), you could resample the audio instead. For instance, you could double each sample (insert a copy of each sample into the stream), which halves the playback speed and lowers the pitch by an octave. You can then improve the audio quality by replacing the plain duplication with a resampling algorithm that uses some sort of interpolation and/or filtering.
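
The resampling approach could be sketched with linear interpolation, assuming a block of mono samples already converted to numbers; `resample` is a hypothetical helper and `pitch_factor` is the desired frequency ratio (0.5 lowers the pitch by an octave and doubles the length). np.interp applies no low-pass filtering, so raising the pitch this way may alias.

```python
import numpy as np

def resample(samples, pitch_factor):
    """Stretch or squeeze samples so playback at the original rate changes pitch."""
    n_out = int(round(len(samples) / pitch_factor))
    # Fractional input positions to read for each output sample
    positions = np.arange(n_out) * pitch_factor
    # np.interp holds the last value for positions past the end of the input
    return np.interp(positions, np.arange(len(samples)), samples)

block = np.array([0.0, 10.0, 20.0, 30.0])
lower = resample(block, 0.5)   # 8 samples: twice as long, one octave down
higher = resample(block, 2.0)  # 2 samples: half as long, one octave up
```

Because the output length differs from the input length, a live read/write loop using this would drift out of step with the input, which is the trade-off mentioned above.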

Han
  • I tried performing an FFT and an IFFT immediately afterwards... it returns static. Are there still supposed to be imaginary components in the array? – Lebull on Wow Jun 13 '11 at 18:19
  • No, if you perform an FFT followed by an IFFT on a real signal, the result will be a real signal. – Han Jun 13 '11 at 19:17
  • Heh... well I guess the problem is going to be with the unpacking and not the FFT. I've updated the code to what I have now. – Lebull on Wow Jun 13 '11 at 20:03
  • Could you explain how you would go about the "move the data to different frequency bins" part? To change pitch, would you just scale all values within the data array by some constant? – bkr879 Jan 17 '17 at 21:01
  • If your FFT data is in an array x[1]...x[N], then you would move all elements up, y[n] = x[n-k], or down, y[n] = x[n+k]. The shift in frequency would be k*f/N, where f is the sample rate and N is the FFT length (the bin spacing is f/N). – Han Jan 18 '17 at 21:18