How to use linear predictive coding to compress voice diphone samples?

Question

I'm working on an experimental diphone speech synthesizer for my native language, which lacks a good speech synthesizer for blind people.

The problem is that a recorded diphone library can get very large (hundreds of megabytes, as in the best speech synthesizers out there).

I have seen a few high-quality diphone synthesizers with small voice sample libraries. Papers about speech synthesis, and specifically about those smaller synthesizers, say that they used LPC (linear predictive coding) to make their voice sample libraries much smaller, and that LPC also gives them the additional benefit of easier pitch control when assembling speech from voice samples.
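A minimal sketch of that pitch-control idea, in plain Python for brevity: the LPC coefficients define an all-pole filter that shapes the spectral envelope, and the spacing of the impulses driving that filter sets the pitch. The function name synthesize, the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and the 8 kHz sample rate mentioned below are assumptions for illustration, not taken from the papers or from VoiceCodec.

```python
def synthesize(coefs, pitch_period, gain, n_samples):
    """Drive the all-pole filter 1/A(z) with an impulse train.

    coefs: [a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p (assumed convention).
    pitch_period: samples between excitation pulses; it sets the pitch,
    while coefs (the spectral envelope) stay untouched.
    """
    out = []
    for n in range(n_samples):
        y = gain if n % pitch_period == 0 else 0.0  # impulse-train excitation
        for k, a in enumerate(coefs, start=1):
            if n - k >= 0:
                y -= a * out[n - k]  # all-pole feedback through past outputs
        out.append(y)
    return out
```

Calling it with the same coefs and pitch_period=80 or pitch_period=64 gives 100 Hz or 125 Hz at an assumed 8 kHz sample rate; only the excitation changes, so the spectral envelope stays the same.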

Unfortunately, I could not find any beginner-level tutorials on using LPC to compress voice samples. All the materials I managed to find are full of university-level math. I don't think I need to study all of that just to use LPC: if I can use an FFT library to generate frequency graphs in my software, I should be able to use an LPC library as a similar "black box", right?

The only more or less "production ready" code I managed to find seems to be this one: https://github.com/longluo/VoiceCodec/tree/master/src

It has four LPC-related folders: openlpc, lpc, lpc10 and celp. Their header files declare just a few functions, but unfortunately there are no unit tests or tutorials demonstrating how to use them for voice sample compression/decompression.

Could someone experienced in DSP please take a look and explain those functions? For example, the ones in this file: https://github.com/longluo/VoiceCodec/blob/master/src/openlpc/openlpc.h

I would like to see a simple demo that feeds raw PCM wave bytes (in what format?) into the LPC encoder for compression, and later feeds the compressed data back into the LPC decoder together with any additional parameters (what are their values, and how do they affect the decoded result?).

What is the difference between openlpc, lpc, lpc10 and celp, and which one is most appropriate for my purpose?

I'm also open to other LPC compression/decompression solutions, in case you can suggest something better and more thoroughly documented than the VoiceCodec project code above.

Note: please do not explain the internal workings of LPC; in my case that would be like explaining fuel chemistry to someone who just wants to learn how to drive a car.
