
I use Python WebSockets, via the websocket-client library, to perform live speech recognition with Watson ASR. This solution worked until about a month ago, when it stopped: the connection never even completes a handshake. I haven't changed the code (below). A colleague using a different account has the same problem, so we don't believe anything is wrong with our accounts. I've contacted IBM about this, but since there is no handshake they have no way to trace whether something is wrong on their side. The WebSocket code is shown below.

import websocket
(...)
ws = websocket.WebSocketApp(
   self.api_url,
   header=headers,
   on_message=self.on_message,
   on_error=self.on_error,
   on_close=self.on_close,
   on_open=self.on_open
)

Here the URL is 'wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize', headers holds the authorization token, and the remaining arguments are the callback methods. At the moment, this call runs and waits until the connection times out. Is anyone else seeing this problem when running live ASR with Watson in Python using the websocket-client library?

ZeDavid
  • Can I know what you are passing in the headers and how? – Vidyasagar Machupalli Dec 04 '18 at 06:05
  • Yes, `headers = {'X-Watson-Authorization-Token': self.token}`, where the token is obtained like this: `authorization = AuthorizationV1(username=credentials['username'], password=credentials['password']) self.token = authorization.get_token(url=api_base_url)` – ZeDavid Dec 04 '18 at 10:04

2 Answers


@zedavid Over a month ago we switched to IAM, so the username and password were replaced with an IAM apikey. You should migrate your Cloud Foundry Speech to Text instance to IAM. There is a Migration page that will help you understand more about this. You can also create a new Speech to Text instance, which will be a resource-controlled instance by default.

Once you have the new instance you will need to get an access_token which is similar to the token in Cloud Foundry. The access_token will be used to authorize your request.
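If you do want to keep your own WebSocket code, the token exchange could look roughly like the sketch below. It is a minimal sketch, not the SDK's implementation: the endpoint in `IAM_TOKEN_URL` is an assumption (check the IAM documentation for your account), and the helper names `build_iam_token_request`, `fetch_access_token` and `websocket_headers` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the IAM token endpoint. Verify it against the IAM docs for your account.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_iam_token_request(apikey, url=IAM_TOKEN_URL):
    """Build (but do not send) the POST that trades an IAM apikey for an access_token."""
    body = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": apikey,
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_access_token(apikey):
    """Send the request and return the access_token field of the JSON reply."""
    request = build_iam_token_request(apikey)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["access_token"]


def websocket_headers(access_token):
    """Headers to pass as header= to websocket.WebSocketApp instead of the old token header."""
    return {"Authorization": "Bearer " + access_token}
```

In the question's code this would replace the `X-Watson-Authorization-Token` header, e.g. `headers = websocket_headers(fetch_access_token('YOUR APIKEY'))`. Access tokens expire, so a long-running client has to fetch a fresh one periodically.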

Finally, we recently released support for Speech to Text and Text to Speech in the Python SDK. I encourage you to use that rather than writing the code for the token exchange and WebSocket connection management yourself.

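# Imports needed by this snippet; the module paths assume the
# watson_developer_cloud package (Python SDK v2.x).
import threading
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1
from watson_developer_cloud.websocket import RecognizeCallback, AudioSource

service = SpeechToTextV1(
    iam_apikey='YOUR APIKEY',
    url='https://stream.watsonplatform.net/speech-to-text/api')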

# Example using websockets
class MyRecognizeCallback(RecognizeCallback):
    def __init__(self):
        RecognizeCallback.__init__(self)

    def on_transcription(self, transcript):
        print(transcript)

    def on_connected(self):
        print('Connection was successful')

    def on_error(self, error):
        print('Error received: {}'.format(error))

    def on_inactivity_timeout(self, error):
        print('Inactivity timeout: {}'.format(error))

    def on_listening(self):
        print('Service is listening')

    def on_hypothesis(self, hypothesis):
        print(hypothesis)

    def on_data(self, data):
        print(data)

# Example using threads in a non-blocking way
mycallback = MyRecognizeCallback()
audio_file = open(join(dirname(__file__), '../resources/speech.wav'), 'rb')
audio_source = AudioSource(audio_file)
recognize_thread = threading.Thread(
    target=service.recognize_using_websocket,
    args=(audio_source, "audio/l16; rate=44100", mycallback))
recognize_thread.start()
German Attanasio
  • Thanks German! Does the piece of code that you provided allow live speech recognition? I have the impression that it is recognising from a file. – ZeDavid Dec 06 '18 at 16:12
  • It's recognizing from a file but there are other examples in the SDK where you can see how to listen to the microphone using PyAudio – German Attanasio Dec 06 '18 at 17:19
  • As @GermanAttanasio rightly mentioned, there is this example which does live speech recognition - https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/microphone-speech-to-text.py. I just verified it, and it works, providing interim results. – Vidyasagar Machupalli Dec 07 '18 at 07:07
  • Thanks for this piece of code and the link to the SDK github. It is now working! – ZeDavid Dec 07 '18 at 16:46

Thanks for the headers information. Here's how it worked for me.

I am using websocket-client 0.54.0, which is currently the latest version. I generated a token using

curl -u <USERNAME>:<PASSWORD>  "https://stream.watsonplatform.net/authorization/api/v1/token?url=https://stream.watsonplatform.net/speech-to-text/api"
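For reference, the same request can be built from Python with the standard library. This is only a sketch of what the curl command does: the two URLs are the ones from the command above, the helper names are mine, and I am assuming the endpoint returns the token as the plain response body.

```python
import base64
import urllib.parse
import urllib.request

AUTH_URL = "https://stream.watsonplatform.net/authorization/api/v1/token"
STT_URL = "https://stream.watsonplatform.net/speech-to-text/api"


def build_token_request(username, password):
    """Build (but do not send) the GET request that `curl -u user:pass` would issue."""
    query = urllib.parse.urlencode({"url": STT_URL})
    credentials = "{}:{}".format(username, password).encode("utf-8")
    request = urllib.request.Request(AUTH_URL + "?" + query)
    # curl -u sends HTTP Basic auth: base64("username:password")
    request.add_header(
        "Authorization",
        "Basic " + base64.b64encode(credentials).decode("ascii"))
    return request


def fetch_token(username, password):
    """Send the request; assumes the token comes back as the raw response body."""
    request = build_token_request(username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```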

Using the returned token in the code below, I was able to complete the handshake:

import websocket

try:
    import thread
except ImportError:
    import _thread as thread
import time
import json


def on_message(ws, message):
    print(message)


def on_error(ws, error):
    print(error)


def on_close(ws):
    print("### closed ###")

def on_open(ws):
    def run(*args):
        for i in range(3):
            time.sleep(1)
            ws.send("Hello %d" % i)
        time.sleep(1)
        ws.close()
        print("thread terminating...")

    thread.start_new_thread(run, ())


if __name__ == "__main__":
    # headers["Authorization"] = "Basic " + base64.b64encode(auth.encode()).decode('utf-8')
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize",
                                on_message=on_message,
                                on_error=on_error,
                                on_close=on_close,
                                header={
                                    "X-Watson-Authorization-Token": "<TOKEN>"})
    ws.on_open = on_open
    ws.run_forever()

Response:

--- request header ---
GET /speech-to-text/api/v1/recognize HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Host: stream.watsonplatform.net
Origin: http://stream.watsonplatform.net
Sec-WebSocket-Key: Yuack3TM04/MPePJzvH8bA==
Sec-WebSocket-Version: 13
X-Watson-Authorization-Token: <TOKEN>


-----------------------
--- response header ---
HTTP/1.1 101 Switching Protocols
Date: Tue, 04 Dec 2018 12:13:57 GMT
Content-Type: application/octet-stream
Connection: upgrade
Upgrade: websocket
Sec-Websocket-Accept: 4te/E4t9+T8pBtxabmxrvPZfPfI=
x-global-transaction-id: a83c91fd1d100ff0cb2a6f50a7690694
X-DP-Watson-Tran-ID: a83c91fd1d100ff0cb2a6f50a7690694
-----------------------
send: b'\x81\x87\x9fd\xd9\xae\xd7\x01\xb5\xc2\xf0D\xe9'
Connection is already closed.
### closed ###

Process finished with exit code 0
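The trace shows the handshake succeeding and the connection closing right after the first text frame. I can't confirm the cause from the trace alone, but one plausible explanation is that "Hello 0" is not a message the recognize endpoint understands: as far as I know, it expects a JSON "start" message, then binary audio frames, then a JSON "stop" message. A minimal sketch of that message sequence, with hypothetical helper names and an assumed audio format:

```python
import json

# Same numeric value as websocket.ABNF.OPCODE_BINARY in websocket-client.
OPCODE_BINARY = 0x2


def start_message(content_type="audio/l16;rate=44100", interim_results=True):
    """JSON text frame that opens a recognition request (format is an assumption)."""
    return json.dumps({
        "action": "start",
        "content-type": content_type,
        "interim_results": interim_results,
    })


def stop_message():
    """JSON text frame that tells the service no more audio is coming."""
    return json.dumps({"action": "stop"})


def send_audio(ws, chunks):
    """Send start, then each audio chunk as a binary frame, then stop."""
    ws.send(start_message())
    for chunk in chunks:
        ws.send(chunk, opcode=OPCODE_BINARY)
    ws.send(stop_message())
```

Calling `send_audio(ws, chunks)` from the thread started in `on_open`, in place of the `ws.send("Hello %d" % i)` loop, would be the way to try this out.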

According to RFC 6455, the server should respond with 101 Switching Protocols, which matches the response header above. Quoting the RFC:

The handshake from the server looks as follows:

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    Sec-WebSocket-Protocol: chat
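To check that a server's handshake reply is genuine, you can recompute the Sec-WebSocket-Accept value yourself. RFC 6455 defines it as the base64-encoded SHA-1 of the client's Sec-WebSocket-Key concatenated with a fixed GUID. A small sketch (the function name is mine):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value a server must send for this key."""
    digest = hashlib.sha1(
        (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Key from the RFC's own example handshake:
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

websocket-client performs this check internally, so this is only useful for reading traces like the one above by hand.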

Additionally, when I use ws:// instead of wss://, I run into the operation timeout issue.

Update: Example with Live Speech Recognition - https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/microphone-speech-to-text.py
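As I read the linked example, the microphone callback pushes raw audio buffers into a queue and the SDK consumes them from there. The sketch below shows only that buffering pattern, using the standard library; it leaves out PyAudio and the SDK entirely, and the names and sizes are hypothetical. Unlike a blocking put, it drops chunks when the consumer falls behind, which is a design choice of this sketch.

```python
import queue

CHUNK = 1024              # bytes per captured buffer (illustrative value)
MAX_BUFFERED_CHUNKS = 10  # upper bound on audio waiting to be sent


def make_buffer():
    """Bounded queue shared by the capture callback and the sender."""
    return queue.Queue(maxsize=MAX_BUFFERED_CHUNKS)


def on_audio(buffer, in_data):
    """Call once per captured chunk; returns False if the chunk was dropped."""
    try:
        buffer.put_nowait(in_data)
        return True
    except queue.Full:
        # The sender is behind; dropping keeps the capture callback from blocking.
        return False


def drain(buffer):
    """Return every chunk currently buffered, oldest first."""
    chunks = []
    while True:
        try:
            chunks.append(buffer.get_nowait())
        except queue.Empty:
            return chunks
```

In a real client, `on_audio` would be called from the audio library's capture callback and the queue would be handed to whatever sends frames over the WebSocket.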

Vidyasagar Machupalli