
I am building an app that captures user audio and analyzes disfluency in a reader's speech, so it is important for me to know all forms of disfluency.

I noticed that Google's Cloud Speech-to-Text API automatically removes disfluencies from speech. For example:

"so uhh, I will probably do that umm probably next week"

Gets transcribed to:

"so I will probably do that probably next week"

Is there a way to keep the uhhs and umms?

AspiringMat
  • Hello. Did you find any solution? – Liam Park Oct 27 '20 at 00:24
  • @LouisBelmont I reached out to Google for help, but unfortunately it seemed that disfluency removal was part of their trained model. – AspiringMat Oct 28 '20 at 06:02
  • I also did not find anything like that for Google Speech. The closest I found was IBM Watson, which has a hesitation and disfluency marker that appears when the smart formatting option is disabled, but I have not yet been able to test it. – Liam Park Oct 29 '20 at 12:09
  • The process described in https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2836.pdf could be useful. They use the Google Cloud API to get a transcript, then use IBM Watson coupled with the Gentle forced aligner to get disfluencies, which are then combined with the Google transcript. – Smokesick Mar 16 '21 at 17:12
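
The combining step mentioned in the last comment could be sketched roughly as below. This is not the paper's implementation: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for forced alignment, and it assumes you already have two plain-text transcripts, a clean one (for example from Google) and a verbatim one that still contains fillers or hesitation markers. The function name `merge_disfluencies` and the `FILLERS` set (including the `%HESITATION` token) are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Assumption: tokens treated as disfluencies, compared in lowercase.
# Adjust this set to whatever markers your verbatim transcript contains.
FILLERS = {"uh", "uhh", "um", "umm", "%hesitation"}


def is_filler(token: str) -> bool:
    """Return True if the token, minus trailing punctuation, is a known filler."""
    return token.strip(",.").lower() in FILLERS


def merge_disfluencies(clean: str, verbatim: str) -> str:
    """Insert fillers found only in `verbatim` into the `clean` transcript.

    Words come from `clean`; only filler tokens are copied from `verbatim`.
    """
    clean_tokens = clean.split()
    verbatim_tokens = verbatim.split()
    matcher = SequenceMatcher(a=clean_tokens, b=verbatim_tokens, autojunk=False)

    merged = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        # Keep every word of the clean transcript (empty slice for "insert").
        merged.extend(clean_tokens[a0:a1])
        if tag in ("insert", "replace"):
            # Copy over only the fillers the clean transcript dropped.
            # For "replace", placing them after the clean words is a
            # simplification; true positions would need timestamps.
            merged.extend(t for t in verbatim_tokens[b0:b1] if is_filler(t))
    return " ".join(merged)


if __name__ == "__main__":
    clean = "so I will probably do that probably next week"
    verbatim = "so uhh I will probably do that umm probably next week"
    print(merge_disfluencies(clean, verbatim))
```

Token-level alignment like this only recovers where fillers sit relative to the words; measuring pause lengths or filler durations would still need word timestamps from an aligner.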

0 Answers