I am using a microphone as the input for real-time audio. How do I extract the phoneme currently being spoken from that audio? I need it for lip-syncing 2D characters.
Basically, my approach would be to:
- Capture the real-time audio from a microphone
- Detect the phoneme currently being pronounced in that audio
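To make the two steps above concrete, here is a rough sketch of the loop I have in mind. It is only an illustration under assumptions: the microphone is faked with synthetic 16-bit PCM so the code runs without audio hardware, the sample rate and frame length are values I picked, and `placeholder_detect_phoneme` is a hypothetical stand-in (a simple loudness check) for the phoneme recognizer I am actually looking for. All function names are mine and do not come from any library.

```python
import math
import struct
from typing import Callable, Iterable, Iterator, List, Tuple

SAMPLE_RATE = 16000  # assumed sample rate in Hz
FRAME_MS = 20        # assumed frame length in milliseconds
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # samples per frame


def frames_from_pcm(pcm: bytes, frame_samples: int = FRAME_SAMPLES) -> Iterator[List[int]]:
    """Split 16-bit little-endian mono PCM into fixed-size frames of samples."""
    frame_bytes = frame_samples * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        chunk = pcm[start:start + frame_bytes]
        yield list(struct.unpack("<%dh" % frame_samples, chunk))


def rms(frame: List[int]) -> float:
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def placeholder_detect_phoneme(frame: List[int], silence_threshold: float = 500.0) -> str:
    """Hypothetical stand-in for a real phoneme recognizer.

    It only separates silence ("SIL") from non-silence ("?"); a real
    recognizer would return an actual phoneme label here.
    """
    return "SIL" if rms(frame) < silence_threshold else "?"


def lipsync_events(
    frames: Iterable[List[int]],
    detect: Callable[[List[int]], str],
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, label) each time the detected label changes."""
    previous = None
    for index, frame in enumerate(frames):
        label = detect(frame)
        if label != previous:
            yield index, label
            previous = label


def synthetic_pcm() -> bytes:
    """Fake microphone input: 2 silent frames, 3 frames of a 220 Hz tone, 1 silent frame."""
    samples = [0] * (2 * FRAME_SAMPLES)
    samples += [
        int(8000 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE))
        for n in range(3 * FRAME_SAMPLES)
    ]
    samples += [0] * FRAME_SAMPLES
    return struct.pack("<%dh" % len(samples), *samples)


if __name__ == "__main__":
    frames = frames_from_pcm(synthetic_pcm())
    for index, label in lipsync_events(frames, placeholder_detect_phoneme):
        print(index, label)
```

In a real setup the synthetic bytes would be replaced by chunks read from the microphone, and each change event would switch the character's mouth shape. The missing piece is the function that maps a frame (or a short window of frames) to a phoneme.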
I have searched extensively for an example or library that solves this kind of problem, but most of the libraries I found don't seem to output phonemes from audio.
There is a paper that explains how its authors used machine learning to solve this, but it does not include any code or a tutorial on how to do it: https://www.arxiv-vanity.com/papers/1910.08685/
There is also a cool speech recognition tool called Pocketsphinx, but I have not yet been able to find an example of using it for phoneme recognition.