I would like to use Amazon's Alexa Voice Service (AVS), but when I send a Recognize Speech request, all I get back is "{message: null}". Authentication works, and the request seems to be well-formed, because I have received error messages from the API when authentication failed or the request was malformed.

I have a WAV audio file (16000 Hz, 1 channel) and use PHP's base64_encode() to encode the file's contents.

For the audio part, the AVS documentation only says: "Type: Binary Data. Represents the data for the audio."

Here's the request I'm sending:

Headers

POST /v1/avs/speechrecognizer/recognize HTTP/1.1
Host: access-alexa-na.amazon.com
Content-Type: multipart/form-data; boundary=86371ffc080fbb6fc614e8e36d0b8a4d
Authorization: Bearer Atza|IQEBL... (valid token)
Transfer-Encoding: chunked
Cache-Control: no-cache

Body

--86371ffc080fbb6fc614e8e36d0b8a4d
Content-Disposition: form-data; name="request"
Content-Type: application/json; charset=UTF-8

{
    "messageHeader": {
        "deviceContext": [
            {
                "name":"playbackState",
                "namespace":"AudioPlayer",
                "payload": {
                    "streamId": "xxxxxxxxxxxx",
                    "offsetInMilliseconds": "xxxxxxxxxxxx",
                    "playerActivity": "xxxxxxxxxxxx"
                }
            }
        ]
    },
    "messageBody": {
        "profile": "alexa-close-talk",
        "locale": "en-us",
        "format": "audio/L16; rate=16000; channels=1"
    }
}

--86371ffc080fbb6fc614e8e36d0b8a4d
Content-Disposition: form-data; name="audio"
Content-Type: audio/L16; rate=16000; channels=1

SUQzAgAAAAAQS1RUMgAAFwBhb...(truncated result of base64_encode(file.wav))
--86371ffc080fbb6fc614e8e36d0b8a4d--

Any idea what's wrong or missing?

John Rotenstein
daniel

2 Answers

The audio should be sent as Linear PCM, not Base64-encoded. Hope this helps.
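
As a rough illustration of getting Linear PCM out of a WAV file, here is a minimal Python sketch using the standard `wave` module. The function name `read_pcm_frames` is made up for this example, and the format checks simply mirror the `audio/L16; rate=16000; channels=1` value from the question. Nothing in this thread confirms exactly which bytes AVS accepts, so treat it as a starting point rather than a verified fix.

```python
import io
import wave


def read_pcm_frames(wav_source):
    """Return the raw PCM sample bytes of a WAV file, without the WAV header.

    wav_source may be a file path (str) or a binary file-like object.
    Raises ValueError if the audio is not mono, 16000 Hz, 16-bit.
    """
    with wave.open(wav_source, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16000 Hz sample rate")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        # readframes() returns the sample data only, as a bytes object.
        return wav.readframes(wav.getnframes())
```

The returned `bytes` object is what would go into the audio part of the request as-is, with no `base64_encode()` step.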

user2636368
    Excuse this rather basic question, but would I just copy the binary file contents into the POST body (like `5249 4646 b80a 0200 5741 5645 666d 7420...`)? – daniel Sep 21 '15 at 06:34
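
On the hex question: a hex dump is only a way of displaying the bytes, so the body would carry the bytes themselves rather than the hex text. The short Python sketch below decodes the two snippets quoted in this thread to show what they contain. The hex from the comment starts with a RIFF/WAVE header. The first four Base64 characters from the question decode to `ID3`, which is how an ID3v2 tag begins, so the file originally posted may not have been a plain WAV (the thread does not confirm this).

```python
import base64

# Hex text from the comment above, converted back to the bytes it represents.
header = bytes.fromhex("5249 4646 b80a 0200 5741 5645 666d 7420")
print(header[:4], header[8:12])  # container markers at the start of the file

# First four Base64 characters of the audio part shown in the question.
prefix = base64.b64decode("SUQz")
print(prefix)
```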

The POST body should contain the raw binary audio data. You can use a tool like sox to convert the audio to the format AVS expects: mono, 16 kHz sample rate, signed 16-bit PCM.
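
To make "raw binary data in the POST body" concrete, here is a hedged Python sketch that assembles a multipart body shaped like the one in the question, with the audio bytes written unmodified. The helper name `build_recognize_body` and its parameters are hypothetical. The part headers are copied from the question, and whether this layout resolves the `{message: null}` response is an assumption not verified here.

```python
import json


def build_recognize_body(metadata, pcm_audio, boundary):
    """Assemble a multipart/form-data body: a JSON part, then raw audio bytes.

    metadata  -- dict serialized into the "request" part
    pcm_audio -- bytes of signed 16-bit, 16 kHz, mono PCM
    boundary  -- boundary string also used in the Content-Type header
    """
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="request"',
        b"Content-Type: application/json; charset=UTF-8",
        b"",
        json.dumps(metadata).encode("utf-8"),
        delimiter,
        b'Content-Disposition: form-data; name="audio"',
        b"Content-Type: audio/L16; rate=16000; channels=1",
        b"",
        pcm_audio,  # raw bytes as-is: no Base64 and no hex text
        delimiter + b"--",
        b"",
    ]
    return crlf.join(parts)
```

The result is a `bytes` object, so it has to be sent through an HTTP client that accepts a binary request body instead of being concatenated into a text string.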

Miguel Mota