I am having trouble using the session-based speech recognition interface. Specifically, I am trying to split a longer audio stream into multiple chunks, upload them one at a time, and receive the complete parsed text at the end (as opposed to streaming the chunked audio from a single source).
IBM Watson offers both stateless and stateful (session-based) interfaces to speech recognition. The more common stateless interface accepts a (chunked) audio stream and returns the parsed content on completion. The session-based interface lets the client establish a persistent session, upload the audio as multiple chunks using multi-part, and then query for the results, which is useful for processing long streams or microphone input.
I was able to find some tutorials and discussions, but none of the examples seem to work (likely because they are out of date, as the interface is evolving rapidly).
Here's a representative sample. The following POST will create a session:
curl -X POST -u "user:password" -H "Content-Type: application/json" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions -verbose -d ""
The next command should then submit a portion of the audio data to the recognize service, using the endpoints returned by the previous command:
curl -k -X POST -u "user:password" \
-H "content-type: audio/flac" --data-binary @temp.2.flac -H "Transfer-encoding: chunked" \
--cookie "SESSIONID=65097570295a0eccd15fd6dba326487416634371; Secure" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/recognize -verbose
Finally, this command should return the results:
curl -k -X GET -u "user:password" \
--cookie "SESSIONID=65097570295a0eccd15fd6dba326487416634371; Secure" \
https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/observe_result -verbose
The first command completes without any issues. It returns an HTTP 201 Created status along with reasonable-looking endpoints, which are used (together with the SESSIONID cookie) for the subsequent calls:
"recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/recognize",
"recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/recognize",
"observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874/observe_result",
"session_id": "65097570295a0eccd15fd6dba3264874",
"new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/65097570295a0eccd15fd6dba3264874"
However, both the second and third commands fail with HTTP 404 and the error "Session does not exist."
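For comparison, the same three calls can be sketched with curl's cookie jar (`--cookie-jar` to store, `--cookie` to replay) instead of a hand-copied cookie string. This is only a sketch: it assumes the create-session response sets the cookie that the later calls need, and whether it avoids the 404 is not established here. The `json_field` helper, the `cookies.txt` file name, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumption: the create-session response sets the cookie that
# the later calls must send back; curl's cookie jar stores and replays it.
CURL=${CURL:-curl}            # overridable, e.g. CURL=echo for a dry run
JAR=${JAR:-cookies.txt}       # hypothetical cookie-jar file name
BASE=https://stream.watsonplatform.net/speech-to-text/api/v1

# Hypothetical helper: extract one flat "key": "value" string field.
# $1 = field name, $2 = JSON text. Not a general JSON parser.
json_field() {
  printf '%s\n' "$2" |
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" |
    head -n 1
}

create_session() {
  # --cookie-jar writes any cookies the server sets into $JAR
  "$CURL" -X POST -u "user:password" --cookie-jar "$JAR" \
    -H "Content-Type: application/json" -d "" "$BASE/sessions"
}

send_chunk() {
  # $1 = session id, $2 = audio file; --cookie replays the stored cookies
  "$CURL" -X POST -u "user:password" --cookie "$JAR" \
    -H "Content-Type: audio/flac" -H "Transfer-Encoding: chunked" \
    --data-binary "@$2" "$BASE/sessions/$1/recognize"
}

observe_result() {
  # $1 = session id
  "$CURL" -X GET -u "user:password" --cookie "$JAR" \
    "$BASE/sessions/$1/observe_result"
}
```

A run would then look like `sid=$(json_field session_id "$(create_session)")`, followed by `send_chunk "$sid" temp.2.flac` for each chunk and finally `observe_result "$sid"`. Setting `CURL=echo` prints the commands without contacting the service.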
Any curl or Java pointers or examples would be greatly appreciated.