Google Speech Algorithm. Testing beta features

Question

On the Google Speech-to-Text page (https://cloud.google.com/speech-to-text) there is a "Demo" section where you can upload a file and check the results. Using the beta features there I got better results, but I can't get similar results for the same video file using the @google-cloud/speech library.

This is the configuration shown in the Demo section:

{
  "audio": {
    "content": "/* Your audio */"
  },
  "config": {
    "audioChannelCount": 2,
    "enableAutomaticPunctuation": true,
    "enableSeparateRecognitionPerChannel": true,
    "encoding": "LINEAR16",
    "languageCode": "en-US",
    "model": "default"
  }
}

The Demo shows its best results under the Video model tab, so I assumed the default model should be replaced with video, but that doesn't help either.

This is the code of the test function:

const fs = require('fs');
const speech = require('@google-cloud/speech').v1p1beta1;

// The lines below run inside an async test function.
const client = new speech.SpeechClient({});
const file = fs.readFileSync('filePath');
const audioBytes = file.toString('base64');

const config = {
    languageCode: 'en-US',
    sampleRateHertz: 16000,
    encoding: 'LINEAR16',
    audioChannelCount: 2,
    enableAutomaticPunctuation: true,
    enableSeparateRecognitionPerChannel: true,
    model: 'video',
};

const request = { audio: { content: audioBytes }, config };
// longRunningRecognize resolves to an array; its first element is the operation.
const [operation] = await client.longRunningRecognize(request);
const data = await operation.promise();
return data;

Does anyone know what could make the difference? Thanks

score 0 · Answer 1 · answered Jan 21 '21 at 14:15

Yes. Using v1p1beta1, set the model to 'video' instead of 'default'.

The last time I used this, a similar config gave me the best results, but I added a few things you don't have in your config:

useEnhanced: true
speechContexts (I gave the recognizer some expected phrases)
metadata

You can find the documentation for all of these in the RecognitionConfig reference for v1p1beta1.

But I believe that useEnhanced: true is what will make the difference in your case. It asks the recognizer to use the enhanced variant of the selected model when one exists, and otherwise falls back to the standard variant.
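
As a rough sketch of what that could look like, here is the question's config extended with the three extra fields. The helper name buildEnhancedConfig, the hint phrases, and the metadata values are placeholders I made up for illustration, so swap in whatever matches your audio. Note that speechContexts is a list of objects, each with a phrases array.

```javascript
// Sketch only: buildEnhancedConfig is a hypothetical helper, not part of
// @google-cloud/speech. It returns a plain RecognitionConfig-shaped object.
function buildEnhancedConfig(baseConfig, phrases) {
  return {
    ...baseConfig,
    model: 'video',
    // Ask for the enhanced variant of the selected model, if one exists.
    useEnhanced: true,
    // speechContexts is a list; each entry carries phrases the recognizer
    // should expect. The phrases passed in here are illustrative assumptions.
    speechContexts: phrases.length > 0 ? [{ phrases }] : [],
    // Illustrative metadata values; pick the ones that describe your audio.
    metadata: {
      interactionType: 'DISCUSSION',
      originalMediaType: 'VIDEO',
    },
  };
}

// Same base settings as in the question.
const baseConfig = {
  languageCode: 'en-US',
  sampleRateHertz: 16000,
  encoding: 'LINEAR16',
  audioChannelCount: 2,
  enableAutomaticPunctuation: true,
  enableSeparateRecognitionPerChannel: true,
};

const config = buildEnhancedConfig(baseConfig, ['beta features', 'speech to text']);
console.log(JSON.stringify(config, null, 2));
```

The resulting object can be passed as config in the request to client.longRunningRecognize, exactly as in the question's code.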

Good luck!
