Mozilla DeepSpeech STT suddenly can't spell

Question

I am using DeepSpeech for speech-to-text. Up to 0.8.1, when I ran transcriptions like:

import subprocess

byte_encoding = subprocess.check_output(
    "deepspeech --model deepspeech-0.8.1-models.pbmm --scorer deepspeech-0.8.1-models.scorer --audio audio/2830-3980-0043.wav", shell=True)
transcription = byte_encoding.decode("utf-8").rstrip("\n")

I would get back results that were pretty good. But since 0.8.2, where the scorer argument was removed, my results are rife with misspellings, which makes me think I am now getting a character-level model where I used to get a word-level one. The errors look as if the model isn't correctly specified somehow.

Now when I call:

byte_encoding = subprocess.check_output(
    ['deepspeech', '--model', 'deepspeech-0.8.2-models.pbmm', '--audio', myfile])
transcription = byte_encoding.decode("utf-8").rstrip("\n")

I now see errors like

endless -> "endules"
service -> "servic"
legacy -> "legaci"
earning -> "erting"
before -> "befir"

I'm not 100% sure that it is related to removing the scorer from the API, but it is one thing I see changing between releases, and the documentation suggested accuracy improvements in particular.

Olaf · Answer 1 · 2020-11-04T13:05:55.573

Short: The scorer matches letter output from the audio to actual words. You shouldn't leave it out.

Long: If you leave out the scorer argument, you won't get real-world sentences back, because the scorer is what matches the output of the acoustic model to the words and word combinations present in the textual language model that is part of the scorer. Bear in mind that each scorer also has specific lm_alpha and lm_beta values that make the search even more accurate.

The 0.8.2 version should be able to take the scorer argument. Otherwise, update to 0.9.0, which has it as well. Maybe your environment has changed in some way; I would start in a new directory and a fresh venv.
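To check whether the missing scorer explains the misspellings, the subprocess call from the question can be repeated with `--scorer` restored. This is a minimal sketch, not a definitive fix: the helper names `build_deepspeech_cmd` and `transcribe` are made up for illustration, and the scorer filename in the usage comment is an assumption about what you downloaded alongside the 0.8.2 model.

```python
import subprocess


def build_deepspeech_cmd(model, audio, scorer=None):
    """Build the argument list for the deepspeech command-line client."""
    cmd = ["deepspeech", "--model", model]
    if scorer is not None:
        # Without --scorer the client decodes without the external scorer.
        cmd += ["--scorer", scorer]
    cmd += ["--audio", audio]
    return cmd


def transcribe(model, audio, scorer=None):
    """Run the client and return the transcription as a plain string."""
    output = subprocess.check_output(build_deepspeech_cmd(model, audio, scorer))
    return output.decode("utf-8").rstrip("\n")


# Hypothetical usage (file names are assumptions, adjust to your download):
# text = transcribe("deepspeech-0.8.2-models.pbmm",
#                   "audio/2830-3980-0043.wav",
#                   scorer="deepspeech-0.8.2-models.scorer")
```

Passing the arguments as a list avoids `shell=True` and makes it easy to compare output with and without the scorer on the same audio file.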

Assuming you are using Python, you could add this to your code:

ds.enableExternalScorer(args.scorer)
ds.setScorerAlphaBeta(args.lm_alpha, args.lm_beta)

And check the example script.
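For context, the two calls above could be wrapped roughly as follows when using the Python package directly instead of the command-line client. This is a sketch under assumptions: `configure_scorer` and `load_model` are hypothetical helper names, and skipping `setScorerAlphaBeta` when no values are given relies on my understanding that the scorer package carries its own default weights.

```python
def configure_scorer(ds, scorer_path, lm_alpha=None, lm_beta=None):
    """Attach an external scorer to a DeepSpeech Model-like object."""
    ds.enableExternalScorer(scorer_path)
    # Override the language-model weights only when both are supplied;
    # otherwise whatever defaults the scorer provides stay in effect.
    if lm_alpha is not None and lm_beta is not None:
        ds.setScorerAlphaBeta(lm_alpha, lm_beta)
    return ds


def load_model(model_path, scorer_path, lm_alpha=None, lm_beta=None):
    """Create a DeepSpeech model and enable the scorer on it."""
    # Imported lazily so this module loads even without deepspeech installed.
    from deepspeech import Model

    return configure_scorer(Model(model_path), scorer_path, lm_alpha, lm_beta)
```

With a model loaded this way, `ds.stt(audio)` expects the audio as a 16-bit integer NumPy array at the model's sample rate; the example script shows how the WAV file is read into that form.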
