On the Google Speech-to-Text page (https://cloud.google.com/speech-to-text) there is a "Demo" section where you can upload a file and check the results. Using the beta features I was able to get better results there, but I can't get comparable results for the same video file using the @google-cloud/speech library.
This is the configuration the Demo section shows:
{
  "audio": {
    "content": "/* Your audio */"
  },
  "config": {
    "audioChannelCount": 2,
    "enableAutomaticPunctuation": true,
    "enableSeparateRecognitionPerChannel": true,
    "encoding": "LINEAR16",
    "languageCode": "en-US",
    "model": "default"
  }
}
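For comparison, the same JSON can be posted straight to the REST endpoint the demo exercises. This is only a minimal sketch, assuming Node 18+ for the built-in fetch and an API key in GOOGLE_API_KEY (both are assumptions on my side, not part of my real setup); note that the synchronous speech:recognize method only accepts about a minute of audio, so a full video would need speech:longrunningrecognize instead:

// Minimal sketch: send the demo's JSON to the v1p1beta1 REST endpoint.
// GOOGLE_API_KEY and the base64 placeholder are assumptions for illustration.
const request = {
  audio: { content: '/* base64-encoded audio */' },
  config: {
    audioChannelCount: 2,
    enableAutomaticPunctuation: true,
    enableSeparateRecognitionPerChannel: true,
    encoding: 'LINEAR16',
    languageCode: 'en-US',
    model: 'default',
  },
};

(async () => {
  const res = await fetch(
    `https://speech.googleapis.com/v1p1beta1/speech:recognize?key=${process.env.GOOGLE_API_KEY}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(request),
    }
  );
  console.log(JSON.stringify(await res.json(), null, 2));
})();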
The best results are shown under the Video model tab, so I assume the default model should be replaced with video, but that doesn't help either.
This is the code of the test function:
const fs = require('fs');
const speech = require('@google-cloud/speech').v1p1beta1;

async function transcribe(filePath) {
  const client = new speech.SpeechClient();
  // Read the file and inline it as base64 in the request's `content` field.
  const file = fs.readFileSync(filePath);
  const audioBytes = file.toString('base64');
  const config = {
    languageCode: 'en-US',
    sampleRateHertz: 16000,
    encoding: 'LINEAR16',
    audioChannelCount: 2,
    enableAutomaticPunctuation: true,
    enableSeparateRecognitionPerChannel: true,
    model: 'video',
  };
  const request = { audio: { content: audioBytes }, config };
  // longRunningRecognize resolves to [operation]; awaiting the operation's
  // promise resolves to [response, metadata, finalApiResponse].
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();
  return response;
}
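For reference, here is a minimal sketch of how I invoke the function above and read the response (the path is a placeholder; channelTag is populated because enableSeparateRecognitionPerChannel is set):

(async () => {
  const response = await transcribe('/path/to/audio.wav'); // placeholder path
  for (const result of response.results) {
    // Each result carries the channel it was recognized from.
    console.log(`Channel ${result.channelTag}: ${result.alternatives[0].transcript}`);
  }
})();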
Does anyone know what could make the difference? Thanks.