Programmatically change the speed of an audio file in real-time

Question

Environment

Hardware: Raspberry Pi x
O.S.: Raspbian Jessie Lite
Language: Qt5 / C++

Goal

Play an audio file (WAV or, better, MP3) while changing its speed smoothly and continuously. The pitch should change according to the speed (playback rate). My application updates, several times per second, a variable that contains the desired speed, e.g. 1.0 = normal speed. The required range is about 0.2 to 3.0, with a resolution of 0.01.

The audio is likely music; expected format: mono, 16-bit, 11.025 kHz. There are no specific latency constraints: below 500 ms is acceptable.

Some thoughts

QMediaPlayer in QtMultimedia has the playbackRate property, which should do exactly this. Unfortunately, I have never been able to make QtMultimedia work on my systems.

It's also OK to use an external player and send data through pipes or any other IPC.

How would you achieve this?

Music or speech (or something else) ? (Solutions tend to be very different for speech versus music.) And presumably you want the pitch to be unaffected ? — Paul R, Jun 29 '17 at 07:26
MP3 is a non-issue. You need to decode the MP3 to LPCM anyway. No need for a player, the Pi works with Alsa. But you're going to have to implement the audio conversion yourself, as these are pretty extreme requirements. Also, resolution of 0.01 ? Who came up with that requirement? And where's the latency requirement? — MSalters, Jun 29 '17 at 07:50
The audio could be either speech or music, but most of the time music. The pitch MUST change! I hope the word "speed" has the right meaning here. The resolution is so small (1% is not so small by the way) to change the speed smoothly. Latency: below 500 ms is acceptable. — Mark, Jun 29 '17 at 10:46
Well, if the pitch shift isn't necessary, things are way easier. The "dumb" algorithm is to simply insert extra samples (interpolated) if the speed is lower than 1.0, and drop samples if higher. E.g. for speed 2.0, drop every second sample. For speed 1.01, drop every 100th sample. Quality is rather bad, but at least this allows you to validate the technical architecture. — MSalters, Jun 29 '17 at 10:56
@Mark: OK, in that case it's very simple then - you're just varying the playback rate - this can be achieved by resampling the audio stream, with or without interpolation (depending on what your quality requirements are). [I just noticed that MSalters has already suggested this, without the interpolation.] — Paul R, Jun 29 '17 at 11:07
I've done this in Java. The code is posted, and I can provide a link if it might be useful as a pseudo-code guide. Have also written a real-time virtual theremin that works nicely. The main thing is that the code assumes you have access to the individual PCM samples. Real time pitch changing is very smooth, from 1/8th to 8 times speed. If you think this might be helpful let me know and I will post more on an "Answer" space. I'm currently trying to learn more about how C++ handles sound, studying Unreal's source code. — Phil Freihofner, Jun 30 '17 at 02:00
@PhilFreihofner Yes, please post some more information as an answer. I don't think I can use it directly in Java, but perhaps I could try to implement your solution in C++ and then come back here to link the source code. — Mark, Jun 30 '17 at 18:37
I can recommend PortAudio as C++ audio library. It is well documented and even has [information about how to use it on RaspberryPi](https://app.assembla.com/wiki/show/portaudio/Platforms_RaspberryPi). — Spunc, Jul 03 '17 at 21:51

score 2 · Answer 1 · answered Jul 01 '17 at 07:02

I don't know how much of this translates to C++. The work I did on this problem uses Java. Still, something of the algorithm should be of help.

Example data (made up):

sample    value
0          0.0
1          0.3
2          0.5
3          0.6
4          0.2
5         -0.1
6         -0.4

With normal speed, we send the output line a series of values where the sample number increments by 1 per output frame.

If we were going slower, say half speed, we should output twice as many values before reaching the same point in the media data. In other words, we need to include, in our output, values that are at the non-existent, intermediate sample frame locations 0.5, 1.5, 2.5, ...

To do this, it turns out that linear interpolation works quite well for audio. It is possible to use a more sophisticated curve fitting algorithm but the increase in fidelity is not considered to be worth the trouble.

So, we end up with a stream as follows (for half speed):

sample    value
0          0.0
0.5        0.15
1          0.3
1.5        0.4
2          0.5
2.5        0.55
3          0.6
etc.

If you want to play back 3/4 speed, then the positions and values used in the output would be this:

sample    value
0          0.0
0.75       0.225
1.5        0.4
2.25       0.525
3          0.6
3.75       0.3
etc.

I code this via a "cursor" that is incremented each sample frame, where the increment amount determines the "speed" of the playback. The cursor points into an array, like an integer index would, but instead, is a float (or double). If there is a fractional part to the cursor's value, the fraction is used to interpolate between sample values pointed to by the integer part and the integer part plus one.

For example, if the cursor was 6.25, and the value of soundData[6] was A and the value of soundData[6+1] was B, the sound value would be:

audioValue = A * 0.75 + B * 0.25
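
A C++ version of the cursor idea might look like the following minimal sketch. The names `sampleAt` and `renderAtSpeed` are illustrative and are not taken from AudioCue. It assumes the samples have already been decoded into normalized doubles, and it uses a `double` cursor because a `float` cursor loses fractional resolution as the index grows.

```cpp
#include <cstddef>
#include <vector>

// Linearly interpolated read at a fractional position.
// Assumes cursor >= 0 and cursor + 1 < data.size().
double sampleAt(const std::vector<double>& data, double cursor)
{
    const std::size_t i = static_cast<std::size_t>(cursor);  // integer part
    const double frac = cursor - static_cast<double>(i);     // fractional part
    return data[i] * (1.0 - frac) + data[i + 1] * frac;
}

// Renders a whole clip at a fixed speed (hypothetical helper for illustration).
// speed < 1.0 produces more output frames (slower, lower pitch);
// speed > 1.0 produces fewer (faster, higher pitch).
std::vector<double> renderAtSpeed(const std::vector<double>& data, double speed)
{
    std::vector<double> out;
    if (data.size() < 2 || speed <= 0.0)
        return out;
    const double last = static_cast<double>(data.size() - 1);
    for (double cursor = 0.0; cursor < last; cursor += speed)
        out.push_back(sampleAt(data, cursor));
    return out;
}
```

With the example data above, `renderAtSpeed(data, 0.75)` starts with 0.0, 0.225, 0.4, 0.525, 0.6. In a real-time player the loop body would run once per output frame, with `speed` re-read each time instead of being fixed.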

The degree of precision with which you can define your speed increment is quite high. I think Java's floats are considered sufficient for this purpose.

As for keeping a dynamically changing speed increment smooth, I am spreading out the changes to new speeds over a series of 4096 steps (roughly 1/10th of a second, at 44100 fps). Change requests are often asynchronous, e.g., from a GUI, and are spread out over time in a somewhat unpredictable way. The smoothing algorithm should be able to recalculate and update itself with each new speed request.
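
One way to sketch that smoothing in C++ is a small linear ramp that restarts from the current value whenever a new request arrives. This is an assumption-laden illustration, not the code in AudioCue: the class name `SpeedRamp` and its methods are hypothetical, the ramp is assumed to be linear, and the sketch is not thread-safe, so a request coming from a GUI thread would need to be handed to the audio thread (or guarded) before calling `setTarget`.

```cpp
#include <cstddef>

// Spreads each speed change over a fixed number of output frames.
class SpeedRamp {
public:
    explicit SpeedRamp(double initial = 1.0, std::size_t rampSteps = 4096)
        : current_(initial), target_(initial), step_(0.0),
          stepsLeft_(0), rampSteps_(rampSteps == 0 ? 1 : rampSteps) {}

    // Call when a new speed request arrives. The ramp restarts from the
    // speed currently in effect, so requests may arrive mid-ramp.
    void setTarget(double speed)
    {
        target_ = speed;
        stepsLeft_ = rampSteps_;
        step_ = (target_ - current_) / static_cast<double>(rampSteps_);
    }

    // Call once per output frame; returns the cursor increment to use.
    double next()
    {
        if (stepsLeft_ > 0) {
            current_ += step_;
            if (--stepsLeft_ == 0)
                current_ = target_;  // snap to the target to drop rounding drift
        }
        return current_;
    }

private:
    double current_;
    double target_;
    double step_;
    std::size_t stepsLeft_;
    std::size_t rampSteps_;
};
```

Per output frame, the player would then advance its cursor by `ramp.next()` rather than by a fixed speed value.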

Following is a link that demonstrates both strategies, where a sound's playback speed is altered in real time via a slider control.

SlidersTest.jar

This is a runnable copy of the jar file that also contains the source code, and executes via Java 8. You can also rename the file SlidersTest.zip and then drill in to view the source code, in context.

The source files can also be reached directly from the following two sections of a page I posted for this code, which I recently wrote and made open source: see AudioCue.java and SlidersTest.java.

AudioCue.java is a long file. The relevant parts are in the inner class at the end of the file, class AudioCuePlayer; for the smoothing algorithm, check the setter method setSpeed, which is about three-quarters of the way down. Sorry, I don't have line numbers.

Thanks! The interpolation algorithm is easily translatable to C++ - now I have to figure out how to send audio samples. — Mark, Jul 01 '17 at 11:01
That's where I get a bit mystified, too. Am looking forward to seeing what you figure out. My first look is that audio isn't implemented, per se. Instead, we have to use C++ low-level lines and such to interface with one of many different audio systems that depend on what libraries have been installed. I could easily be wrong though, being a newbie to C++. — Phil Freihofner, Jul 02 '17 at 01:31
