
Can anyone please help me find the official documentation for the pod used in this example: https://github.com/GoogleCloudPlatform/ios-docs-samples/tree/master/speech/Swift/Speech-gRPC-Streaming

Moreover, I am working on an iOS app that uses Google Speech-to-Text with the streaming approach. The example doesn't demonstrate how to pass metadata, so maybe the official documentation has some guidance on how to pass metadata while initializing. Here is the full config I want to feed:

{
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US",
    "maxAlternatives": 30,
    "metadata": {
        "interactionType": "VOICE_SEARCH",
        "recordingDeviceType": "SMARTPHONE",
        "microphoneDistance": "NEARFIELD",
        "originalMediaType": "AUDIO",
        "recordingDeviceName": "iPhone",
        "audioTopic": "Quran surah and ayah search"
    },
    "speechContexts": [
        { "phrases": ["mumtahinah"], "boost": 2 },
        { "phrases": ["Hujrat"], "boost": 2 },
        { "phrases": ["taubah"], "boost": 2 },
        { "phrases": ["fajar"], "boost": 2 }
    ]
}

Here is my current code:

import Foundation
import googleapis

let API_KEY : String = "YOUR_API_KEY"
let HOST = "speech.googleapis.com"

typealias SpeechRecognitionCompletionHandler = (StreamingRecognizeResponse?, NSError?) -> Void

class SpeechRecognitionService {
  var sampleRate: Int = 16000
  private var streaming = false

  private var client: Speech!
  private var writer: GRXBufferedPipe!
  private var call: GRPCProtoCall!

  static let sharedInstance = SpeechRecognitionService()

  func streamAudioData(_ audioData: NSData, completion: @escaping SpeechRecognitionCompletionHandler) {
    if (!streaming) {
      // if we aren't already streaming, set up a gRPC connection
      client = Speech(host:HOST)
      writer = GRXBufferedPipe()
      call = client.rpcToStreamingRecognize(withRequestsWriter: writer,
                                            eventHandler: { (done, response, error) in
                                              completion(response, error as? NSError)
                                            })
      // authenticate using an API key obtained from the Google Cloud Console
      call.requestHeaders.setObject(NSString(string:API_KEY),
                                    forKey:NSString(string:"X-Goog-Api-Key"))
      // if the API key has a bundle ID restriction, specify the bundle ID like this
      call.requestHeaders.setObject(NSString(string:Bundle.main.bundleIdentifier!),
                                    forKey:NSString(string:"X-Ios-Bundle-Identifier"))

      print("HEADERS:\(call.requestHeaders)")

      call.start()
      streaming = true

      // send an initial request message to configure the service
      let recognitionConfig = RecognitionConfig()
      recognitionConfig.encoding = .linear16
      recognitionConfig.sampleRateHertz = Int32(sampleRate)
      recognitionConfig.languageCode = "en-US"
      recognitionConfig.maxAlternatives = 30
      recognitionConfig.enableWordTimeOffsets = true

      let streamingRecognitionConfig = StreamingRecognitionConfig()
      streamingRecognitionConfig.config = recognitionConfig
      streamingRecognitionConfig.singleUtterance = false
      streamingRecognitionConfig.interimResults = true

      let streamingRecognizeRequest = StreamingRecognizeRequest()
      streamingRecognizeRequest.streamingConfig = streamingRecognitionConfig

      writer.writeValue(streamingRecognizeRequest)
    }

    // send a request message containing the audio data
    let streamingRecognizeRequest = StreamingRecognizeRequest()
    streamingRecognizeRequest.audioContent = audioData as Data
    writer.writeValue(streamingRecognizeRequest)
  }

  func stopStreaming() {
    if (!streaming) {
      return
    }
    writer.finishWithError(nil)
    streaming = false
  }

  func isStreaming() -> Bool {
    return streaming
  }

}

1 Answer


Google Speech-to-Text is not straightforward to set up. There is a slight tweak to perform when using CocoaPods, because you need to add the Google dependencies, which you can download from the GitHub link in your question. Also, for the complete tutorial, go through this article.
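For reference, the sample project wires the dependency in through a local podspec. A rough sketch, assuming you copy googleapis.podspec (and the files it references) from the sample repo into the directory that holds your Podfile; 'YourApp' is a placeholder target name:

 # Podfile sketch: the googleapis pod is resolved from the local podspec,
 # which generates the Speech gRPC client from the protos.
 platform :ios, '10.0'

 target 'YourApp' do
   pod 'googleapis', :path => '.'
 end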

For your problem, take all the phrases and their boost values and build a SpeechContext object from them. Let's call it mySpeechContext.

You need to pass this context to the speechContextsArray property of the RecognitionConfig.
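Building mySpeechContext might look like the sketch below. Note that phrasesArray and boost are my guesses at the generated property names, based on how the protobuf Objective-C generator names repeated and scalar fields; verify them against the headers your pod generates, and note that boost requires a proto version that includes speech adaptation:

 // Inside streamAudioData, before building the StreamingRecognitionConfig.
 // Property names are assumptions from the protobuf ObjC generator's
 // conventions; check your generated headers.
 let mySpeechContext = SpeechContext()
 mySpeechContext.phrasesArray = NSMutableArray(array: ["mumtahinah", "Hujrat", "taubah", "fajar"])
 mySpeechContext.boost = 2.0  // needs a proto with speech adaptation boost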

In your code just add this line:

 recognitionConfig.speechContextsArray = NSMutableArray(array: [mySpeechContext])

This will send your entire speech context to the Google Speech service. Whenever you use the speech-to-text service with your key, recognition will be biased towards these phrases, giving more boost and confidence to these words.
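The metadata block from your config can be attached the same way, through the metadata property of RecognitionConfig. A sketch, assuming the pod generates a RecognitionMetadata class whose enum cases import into Swift the same way .linear16 does in your code; verify the exact class and case names in your generated code:

 // Assumed Swift imports of the generated enums, e.g.
 // RecognitionMetadata_InteractionType_VoiceSearch -> .voiceSearch
 let metadata = RecognitionMetadata()
 metadata.interactionType = .voiceSearch
 metadata.recordingDeviceType = .smartphone
 metadata.microphoneDistance = .nearfield
 metadata.originalMediaType = .audio
 metadata.recordingDeviceName = "iPhone"
 metadata.audioTopic = "Quran surah and ayah search"
 recognitionConfig.metadata = metadata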
