
I want to change the frequency of a voice recording by changing sample rate on Mac OS X.

This is a research project aimed at people who stutter. It's essential that the latency is very low – this is, for instance, why I'm not considering Fast Fourier Transforms. Instead, I want to collect samples at a rate of, say, 44kHz, then do one of two things:

1) Play the samples back twice as slowly (i.e. 22kHz). This will result in increasing asynchrony with the source. It would be useful if I can restart the sampling every 1 second or so to prevent the asynchrony from becoming too noticeable.

2) Play the samples back twice as quickly. Obviously, it's impossible to do this continuously (i.e. can't play back samples which haven't been collected yet). To get around this, I'm intending to gate the playback with a square wave. Samples will be played back twice as quickly as they were recorded during the peak of the square wave. Nothing will be heard (but samples will still be collected) during the trough of the square wave.
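The two modes above come down to plain sample-index arithmetic, independent of any audio API. Here is a hypothetical sketch (function names and the toy rate are mine, not from any library): mode 1 advances the read index at half the output clock and restarts it each window, so the asynchrony stays bounded; mode 2 records continuously and, during each gate peak, consumes two recorded samples per output tick, draining the backlog built up during the trough.

```python
def mode1_read_position(t, rate):
    """Half-speed playback: the read index advances at half the output
    clock, restarting at the top of each 1-second window so the lag
    behind the source never exceeds half a window of samples."""
    base = (t // rate) * rate          # start of the current window
    return base + (t - base) // 2      # half-rate advance within it

def mode2_simulate(seconds, rate):
    """Gated double-speed playback: record one sample per tick; while
    the square-wave gate is high, play back two recorded samples per
    tick, never overtaking the write index."""
    half = rate                        # half-period of the gate = 1 second
    write = read = 0
    for t in range(seconds * rate):
        write += 1                     # one new sample recorded per tick
        if (t // half) % 2 == 1:       # gate high: play at double speed
            read += 2
            assert read <= write       # can't play unrecorded samples
    return read, write
```

With a toy rate of 8 samples per "second", mode 1's lag peaks at rate // 2 (half a second) before each restart, and mode 2's backlog drains to exactly zero at the end of every gate period, which is what makes the gated scheme sustainable.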

I've prepared a PDF which describes the project in more detail here:

https://www.dropbox.com/s/8u3tz7d9hhxd3t9/Frequency%20shift%20techniques.pdf?dl=0

A friend has helped me with some of the programming for this using PortAudio. Unfortunately, we're getting very long latencies. I think this might be because PortAudio is working at too high a level. From the code, it looks to me as if PortAudio is buffering the incoming audio stream and then making alterations which are prima facie similar to the ones I've described above, but which are in fact operations on the buffered stream.

This isn't what I want at all. It's essential that the processing unit does as little as possible. Referring to the conditions (1) and (2) above, all the computer should do is to (1) play back the samples without any manipulation but twice as slowly; or (2) store the incoming samples then play them back twice as quickly. There should be no other processing whatsoever. I think this is the only way I'll get the very low latencies I'm looking for.
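The "store, then play back at a different rate" requirement only needs one piece of state: a circular buffer with a write index driven by the capture side and a read index driven by the playback side. This is a minimal hypothetical sketch of that structure (the class and method names are mine), not code for any particular audio API:

```python
class RingBuffer:
    """Minimal circular buffer: the capture side pushes each incoming
    sample; the playback side pops at whatever rate it likes. There is
    no other per-sample processing."""
    def __init__(self, capacity):
        self.buf = [0.0] * capacity
        self.write = 0                 # total samples captured
        self.read = 0                  # total samples played back

    def push(self, sample):            # called once per input sample
        self.buf[self.write % len(self.buf)] = sample
        self.write += 1

    def pop(self):                     # called by the playback side
        assert self.read < self.write, "no unplayed samples available"
        s = self.buf[self.read % len(self.buf)]
        self.read += 1
        return s
```

Half-speed playback (mode 1) is then just calling pop() on every other output tick; double-speed playback (mode 2) calls it twice per tick while the gate is open.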

Would it be better to do this directly in Core Audio for OS X, rather than using PortAudio? That would limit platform compatibility, but low latency matters much more to me than compatibility.

Am I likely to be able to do what I want using a mid-level service, such as Audio Units? Or would I need to write directly for a low-level service such as I/O Kit? How would I go about it?

m-ga
    Why does the latency matter if you're going to be off by a half second every second anyway? It would be much better if you synthesized the speech at a different pitch and did this in near real time. This is a common effect. Remember though that you're fighting physics. You can't know something's pitch until you have captured a few complete waveforms, and therefore you can't shift without delay. Also, there are plenty of $50-$100 effects boxes that do this for you without the need to write software. – Brad Apr 27 '15 at 19:46
  • I’m interested in real time response time of neurons in the brainstem. This is on the order of 1ms. Introducing a latency of 10ms or more by manipulating waveforms is no good. I could frequency shift without delay analogous to how a record player works. When you slow down the rotation speed of a record, the frequency drops. There is no delay whatever. Delta-sigma sound cards make analogue-digital conversions in 0.25ms, for a round trip of 0.5ms. What I’d like is for incoming samples to be played back more slowly. No other manipulations. Just record at one rate, and output at a lower rate. – m-ga Apr 28 '15 at 10:57
  • You will find no computer audio that can work at latencies that low. You need dedicated hardware. And I'm sorry to say but you're wrong about being able to lower the frequency with such small latency. You need a whole millisecond just for one wavelength at 1 kHz. If you drop that by an octave to 500 Hz, you need 2 milliseconds just for one wavelength. The brain does not respond to sound this quickly. You cannot perceive pitch in a single wavelength. And again, if you're slowing down the sample rate, what's the point? You build up delay very rapidly. – Brad Apr 28 '15 at 16:27
  • Thanks Brad – perhaps I didn’t explain myself very well. There’s more detail in the PDF: tinyurl.com/lhjguv8 I'm actually replicating an experiment carried out in 1987. It used exactly the process I describe: frequency shift at near zero latency by outputting samples twice as slowly as they were recorded. Sample rate was very low (12kHz) on what was then state-of-the-art hardware. Delta-sigma converters digitise analogue in 0.25ms. This offers a round trip of 0.5ms if the computer does nothing but convert analogue to digital and back again. Thus, latencies of under 1ms are achievable. – m-ga Apr 29 '15 at 08:42
  • Here’s a link to an article which describes humans detecting sound with an accuracy of 0.01ms: http://tinyurl.com/kt26kwy Although you’re quite right to say that you'd need a whole millisecond to record the information in a 1kHz wavelength (and almost certainly longer to perceive it), the brain can respond to sound far more quickly. This is only true for neurons in the brainstem, not those in the cortex. We don’t really know how brainstem neurons enable us to perceive pitch, and this is one of the things I’m trying to find out. – m-ga Apr 29 '15 at 08:51

1 Answer


It looks like the best thing for you would be something like Max/MSP or Pure Data. These let you avoid working with text-based languages and should let you rapidly prototype what you're looking to do. I/O Kit is far too low-level for what you're trying to do.

Since Max is not a text-based language, sharing the code itself is tricky on sites like Stack Overflow, so I've included a screengrab instead. You can copy and paste Max code as text, but it's ugly and inappropriate here.

Max/MSP method

Here's a quick description. The box that says rect~ 1 generates a square wave at 1 Hz. The snapshot~ box captures the values it spits out, and the if boxes check whether each value is greater than zero or less than zero (peak or trough). During a trough, the record~ box records the signal from the microphone box and stores it in a buffer. The groove~ box is a sampler: when it receives a bang from the if box, it plays back the audio in that buffer. The sig~ box controls the playback rate.
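Since the patch is graphical, here is a rough text transliteration of just its control logic (a hypothetical Python sketch of mine, not generated from the patch): a square-wave gate decides, per control tick, whether the system is in its record phase or its play phase, mirroring what rect~, snapshot~, and the two if boxes do.

```python
def gate(t, freq, sr):
    """Square-wave gate, like rect~ feeding snapshot~ and the if boxes:
    'play' on the positive half-cycle, 'record' on the negative one.
    t is the tick index, freq the gate frequency, sr the control rate."""
    phase = (t * freq / sr) % 1.0      # position within the current cycle
    return "play" if phase < 0.5 else "record"
```

In the patch itself, the "play" branch bangs groove~ (with sig~ setting the playback rate to 2) and the "record" branch enables record~.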

Also, you may not know this, but the PDF you're trying to share is unavailable.

One other thing: since latency matters to you, you should learn about measuring it with a click train. The idea is to send a signal consisting of a single 1 followed by zeros through your system and time how long it takes for that value to come out the other end.
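That measurement can be sketched like this (a hypothetical stand-in of mine: in a real test, `process` would be the loop through your actual hardware, and the toy delay line below merely simulates it). Feed in an impulse and count output samples until it reappears; the count divided by the sample rate is the round-trip latency.

```python
def measure_latency_samples(process, n=1024):
    """Send an impulse through `process` (a function mapping one input
    sample to one output sample) and return the delay, in samples,
    before the impulse emerges; None if it never does within n ticks."""
    for i in range(n):
        out = process(1.0 if i == 0 else 0.0)
        if out != 0.0:
            return i
    return None

def make_delay(k):
    """Toy k-sample delay line standing in for the audio chain."""
    line = [0.0] * k
    def process(x):
        line.append(x)
        return line.pop(0)
    return process
```

At 44.1 kHz, a measured delay of about 22 samples would correspond to the ~0.5 ms round trip you're hoping for.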

PicnicTripper
  • Thanks, I'll check out Max and Pure Data. Your diagram describes what I want to do. My only concern is whether that is actually what the computer will do, or whether it is an abstracted description that produces identical results through some kind of simulation which isn't shown. If the latter, I'll get latencies longer than I want. – m-ga Apr 28 '15 at 16:36