
I am having trouble using the session-based speech recognition interface. Specifically, I am trying to split a longer audio stream into multiple chunks, upload them one at a time, and receive the complete parsed text at the end (as opposed to streaming the chunked audio from a single source).

IBM Watson offers both stateless and stateful interfaces to speech recognition. The more common stateless protocol accepts a (chunked) audio stream and returns the parsed content on completion. The session-based approach lets the client establish a persistent session, upload the audio as multiple chunks using multipart, and query for the results, which can be very useful for processing long streams or microphone input.

I was able to find some tutorials and discussions but none of the examples seem to work (likely out of date, as the interface is evolving rapidly).

Here's a representative sample. The following POST will create a session:

curl -X POST -u "user:password" -H "Content-Type: application/json" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions --verbose -d ""

The next one should then submit a portion of the audio data to the recognize service, using the endpoints returned by the previous command:

curl -k -X POST -u "user:password" \
-H "content-type: audio/flac" --data-binary @temp.2.flac -H "Transfer-encoding: chunked" \
--cookie "SESSIONID=65097570295a0eccd15fd6dba326487416634371; Secure" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/recognize --verbose

Finally, this command should return the results:

curl -k -X GET -u "user:password" \
--cookie "SESSIONID=65097570295a0eccd15fd6dba326487416634371; Secure" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/observe_result --verbose

The first command completes without any issues, returning an HTTP 201 Created status along with reasonable-looking endpoints, which are used (together with the SESSIONID cookie) for the subsequent calls:

  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/recognize",
  "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/recognize",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/observe_result",
  "session_id": "65097570295a0eccd15fd6dba3264874",
  "new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874"

However, both the second and third commands fail with HTTP 404 and the error "Session does not exist."

Any curl or Java pointers or examples would be greatly appreciated.

dimo414
Hi Robert, please do not use sessions for this; WebSockets are the right choice for streaming to Watson STT. Please take a look at my answer here: http://stackoverflow.com/questions/37232560/stream-audio-from-mic-to-ibm-watson-speechtotext-web-service-using-java-sdk/38231774#38231774 – Daniel Bolanos Jul 06 '16 at 18:59

2 Answers

Robert,

I was just made aware of this post; sorry for the delay. I'm not sure how you're issuing the commands, but the session may have timed out before the subsequent calls. If the default 30-second session timeout expires before those calls arrive, the service returns a 404 with the indicated message. It could also be an issue with how you're providing the cookie, as the previous user indicates. I've run into the session timeout issue myself, so it could well be the culprit.
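One way to rule the timeout in or out is to record when the session was created and check the elapsed time before each follow-up call. The sketch below is only an illustration: `SESSION_TIMEOUT` and `session_is_fresh` are hypothetical names, and it assumes the 30 seconds are counted from session creation, which may not match how the service actually measures it.

```shell
#!/bin/sh
# Assumed default timeout in seconds (taken from the explanation above).
SESSION_TIMEOUT=30

# Hypothetical helper: succeeds if a session created at $1 (epoch seconds)
# is still inside the timeout window at time $2 (epoch seconds, default: now).
session_is_fresh() {
  created=$1
  now=${2:-$(date +%s)}
  [ $((now - created)) -lt "$SESSION_TIMEOUT" ]
}

# Record this immediately after the POST that creates the session.
created_at=$(date +%s)

if session_is_fresh "$created_at"; then
  echo "session should still be valid"
else
  echo "session has probably expired; create a new one"
fi
```

If the check fails right before the recognize or observe_result call, the 404 is most likely the timeout rather than the cookie.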

Jeff

I wrote a Gist that uses curl commands to recognize a PCM file. In your case you just need to change the audio format and point to your file.
See https://gist.github.com/germanattanasio/ae26dc0144f229ad913a

When dealing with cookies it's always good to save them in a file and then use that file in the subsequent request.

For example

curl -X POST -u "user:password" -H "Content-Type: application/json" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions \
--verbose -d ""

could be written as:

curl -X POST -b cookies.txt -c cookies.txt -u "$USERNAME:$PASSWORD" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions" \
-d ""

The result will be the same and cookies.txt will have the SESSIONID.
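To confirm that the cookie was actually saved, you can read it back out of the jar. This is a rough sketch: `cookie_value` is a hypothetical helper, it assumes curl's usual tab-separated Netscape cookie file layout (name in field 6, value in field 7), and the jar contents below are made-up sample data rather than real output from the service.

```shell
#!/bin/sh
# Hypothetical helper: print the value of cookie $1 stored in cookie jar $2.
# Comment lines in the jar contain no tabs, so they never match.
cookie_value() {
  awk -F '\t' -v name="$1" '$6 == name { print $7 }' "$2"
}

# Build a sample jar so the example runs without contacting the service.
jar=$(mktemp)
printf '# Netscape HTTP Cookie File\n' > "$jar"
printf 'stream.watsonplatform.net\tFALSE\t/speech-to-text/api\tTRUE\t0\tSESSIONID\texample-cookie-value\n' >> "$jar"

cookie_value SESSIONID "$jar"   # prints: example-cookie-value
rm -f "$jar"
```

With a real jar, `cookie_value SESSIONID cookies.txt` printing nothing would suggest the first request never stored the cookie.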

Then you can use:

curl -X POST -b cookies.txt -c cookies.txt -u "$USERNAME:$PASSWORD" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/$SESSION_ID/recognize?continuous=true" \
--header "Content-Type: audio/flac" --header "Transfer-Encoding: chunked" \
--data-binary @temp.2.flac

Make sure $SESSION_ID is updated with the value you get in the first curl command.
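If you would rather not copy the value by hand, something along these lines could pull it out of the response. It is a sketch, not a robust JSON parser: `extract_session_id` is a hypothetical helper that relies on sed and assumes the `session_id` key and its value sit on one line, as in the response shown in the question. The sample response here is copied from the question so the snippet runs offline.

```shell
#!/bin/sh
# Hypothetical helper: read a JSON response on stdin and print the
# first "session_id" value found.
extract_session_id() {
  sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response fragment taken from the question.
response='{
  "session_id": "65097570295a0eccd15fd6dba3264874",
  "new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874"
}'

SESSION_ID=$(printf '%s\n' "$response" | extract_session_id)
echo "$SESSION_ID"   # prints: 65097570295a0eccd15fd6dba3264874
```

In a real run, `response` would instead hold the output of the first curl command, for example by wrapping that command in `$(...)`.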

German Attanasio