I am building an app that captures user audio and analyzes disfluency in a reader's speech, so it is important for me to capture all forms of disfluency.
I noticed that Google's Cloud Speech-to-Text API automatically removes disfluencies from the transcript. For example:
"so uhh, I will probably do that umm probably next week"
gets transcribed as:
"so I will probably do that probably next week"
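To make the loss concrete, here is a minimal sketch of how the dropped words could be listed, assuming a reference transcript of what was actually said is available. The helper name `dropped_tokens` is hypothetical, and it only uses Python's standard `difflib`, not any Speech-to-Text call:

```python
import difflib


def dropped_tokens(spoken: str, transcribed: str) -> list[str]:
    """Return words that appear in `spoken` but are missing from `transcribed`."""
    # Strip commas so "uhh," and "uhh" compare as the same word.
    spoken_words = spoken.replace(",", "").split()
    transcribed_words = transcribed.replace(",", "").split()

    matcher = difflib.SequenceMatcher(
        a=spoken_words, b=transcribed_words, autojunk=False
    )
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean spoken words did not survive as-is.
        if tag in ("delete", "replace"):
            dropped.extend(spoken_words[i1:i2])
    return dropped


spoken = "so uhh, I will probably do that umm probably next week"
transcribed = "so I will probably do that probably next week"
print(dropped_tokens(spoken, transcribed))  # ['uhh', 'umm']
```

This only shows which words went missing; it does not recover them from the audio.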
Is there a way to keep the uhhs and umms?