Watson [speech to text]: raw audio does not work through websockets but works through http interface

Question

I have a raw audio file that I would like to transcribe using the Watson Speech to Text service. I am using the default example provided in the SDK repo.

Transcription works when I use the HTTP interface.

// working code
package main

import (
    "os"

    "github.com/IBM/go-sdk-core/v5/core"
    "github.com/watson-developer-cloud/go-sdk/v2/speechtotextv1"
)

func main() {
    // Instantiate the Watson Speech To Text service
    authenticator := &core.IamAuthenticator{
        ApiKey: "YOUR API KEY",
    }
    service, serviceErr := speechtotextv1.
        NewSpeechToTextV1(&speechtotextv1.SpeechToTextV1Options{
            URL:           "YOUR SERVICE URL",
            Authenticator: authenticator,
        })

    // Check successful instantiation
    if serviceErr != nil {
        panic(serviceErr)
    }

    // Open the raw (mu-law) audio file to recognize
    audio, audioErr := os.Open("/opt/audioRaw.raw")
    if audioErr != nil {
        panic(audioErr)
    }

    // Create a new RecognizeOptions for raw mu-law audio (8 kHz, mono)
    recognizeOptions := service.
        NewRecognizeOptions(audio).
        SetContentType("audio/mulaw;rate=8000;channels=1").SetModel("en-US_NarrowbandModel")

    // Call the speechToText Recognize method
    recognizeResult, _, responseErr := service.Recognize(recognizeOptions)

    // Check successful call
    if responseErr != nil {
        panic(responseErr)
    }

    // Print the result if one was returned
    if recognizeResult != nil {
        core.PrettyPrint(recognizeResult, "Recognize")
    }
}

But if I use the websocket interface for the same raw audio file, it does not work. While debugging, I see it panic with the error below.

 "error": "unable to transcode data stream application/octet-stream -> audio/l16 "

// Does not work

package main

import (
    "encoding/json"
    "fmt"
    "os"

    "github.com/IBM/go-sdk-core/v5/core"
    "github.com/watson-developer-cloud/go-sdk/v2/speechtotextv1"
)

func main() {
    // Instantiate the Watson Speech To Text service
    authenticator := &core.IamAuthenticator{
        ApiKey: "YOUR API KEY",
    }
    service, serviceErr := speechtotextv1.
        NewSpeechToTextV1(&speechtotextv1.SpeechToTextV1Options{
            URL:           "YOUR SERVICE URL",
            Authenticator: authenticator,
        })

    // Check successful instantiation
    if serviceErr != nil {
        panic(serviceErr)
    }

    // Open the raw (mu-law) audio file to recognize
    audio, audioErr := os.Open("/opt/audioRaw.raw")
    if audioErr != nil {
        panic(audioErr)
    }
    // callback can have `OnOpen`, `OnData`, `OnClose` and `OnError` functions
    callback := myCallBack{}

    recognizeUsingWebsocketOptions := service.
        NewRecognizeUsingWebsocketOptions(audio, "audio/mulaw;rate=8000;channels=1")

    recognizeUsingWebsocketOptions.
        SetModel("en-US_NarrowbandModel").
        SetWordConfidence(true).
        SetSpeakerLabels(true).
        SetTimestamps(true)

    service.RecognizeUsingWebsocket(recognizeUsingWebsocketOptions, callback)
}

type myCallBack struct{}

func (cb myCallBack) OnOpen() {
    fmt.Println("Handshake successful")
}

func (cb myCallBack) OnClose() {
    fmt.Println("Closing connection")
}

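func (cb myCallBack) OnData(resp *core.DetailedResponse) {
    var speechResults speechtotextv1.SpeechRecognitionResults
    result, ok := resp.GetResult().([]byte)
    if !ok {
        return // unexpected payload type; nothing to decode
    }
    if err := json.Unmarshal(result, &speechResults); err != nil {
        fmt.Println("failed to decode result:", err)
        return
    }
    core.PrettyPrint(speechResults, "Recognized audio: ")
}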

func (cb myCallBack) OnError(err error) {
    panic(err)
}

Could someone please help me figure out why raw audio does not work through the websocket interface?

Any pointers are much appreciated.

Ryan

@Deleplace Sorry about missing it. The error I get through websockets is in my question, and here it is again: "error": "unable to transcode data stream application/octet-stream -> audio/l16 " — ryan embgrets, Aug 06 '21 at 15:47
? https://github.com/watson-developer-cloud/python-sdk/issues/169 https://stackoverflow.com/questions/43340717/watson-speech-to-text-unable-to-transcode-data-stream-audio-wav — , Aug 07 '21 at 06:03
can you give it a shot with the mp3 files like the example at https://github.com/watson-developer-cloud/go-sdk/blob/master/v2/examples/speechtotextv1/speech_to_text_v1.go ? — , Aug 07 '21 at 06:12
Is the content type syntax correct? I wanted to check, but I found a broken link: https://cloud.ibm.com/docs/speech-to-text-icp?topic=speech-to-text-icp-audio-formats — , Aug 07 '21 at 06:14
https://cloud.ibm.com/docs/speech-to-text-icp?topic=speech-to-text-icp-audio-formats#mulaw Maybe including the number of channels broke the parsing of the content type? — , Aug 07 '21 at 06:24
It looks like for µ-law you might try passing audio/basic: https://cloud.ibm.com/docs/speech-to-text-icp?topic=speech-to-text-icp-audio-formats#basic — , Aug 07 '21 at 06:27
@mh-cbon Thanks for the reply. The same file works through the HTTP interface but not with websockets. I tried the provided example mp3 file through the websocket interface, and that works. — ryan embgrets, Aug 09 '21 at 11:23
@mh-cbon I have created an issue on their GitHub repo, but no one seems to care about it so far: https://github.com/watson-developer-cloud/go-sdk/issues/109 You can also check the file attached to the issue. If you play that file in Audacity you get the correct audio, so the file format is correct. — ryan embgrets, Aug 09 '21 at 11:24
@mh-cbon I have tried the audio/basic format too, but I got the same error. — ryan embgrets, Aug 09 '21 at 11:26
Is it Chinese speech? I tried to play it with `vlc --demux=rawaud --rawaud-channels 1 --rawaud-samplerate 8000 ~/Téléchargements/*.raw` and the quality is really low. Sorry, I cannot help any further. I would only suggest that you keep using the regular HTTP interface for now. — , Aug 09 '21 at 11:30
@mh-cbon No, it is the test mp3 file converted to raw audio. It is quite strange that the HTTP interface works with the same details. I have been playing it with Audacity, which plays it just fine. Here is a link to a video if you want to check: https://youtu.be/6z6IKKGH5PA Anyway, I really appreciate your time and effort in helping me with this. Thanks again. — ryan embgrets, Aug 09 '21 at 15:55
Something else I can suggest is to hack the SDK and compare the request it sends to the websocket service with what the Node or Python version sends. But yeah, this is not super useful. — , Aug 09 '21 at 16:46
Yes, I will have a look at the Python and Node versions and see if I get the same issue there too. Thanks @mh-cbon for everything you did here to help me. — ryan embgrets, Aug 10 '21 at 10:46
