Check out Amazon's AudioPlayer Interface Reference. It gives a pretty comprehensive guide on how to make the audio interface work. Essentially, it boils down to adding another directive to the list of directives you're returning in your response JSON. In my experience, sending this directive automatically brings up the audio player screen on devices with a display.
A basic version of the audio directive looks like the following:
{
  "type": "AudioPlayer.Play",
  "playBehavior": "ENQUEUE",
  "audioItem": {
    "stream": {
      "token": "Audio Playback",
      "url": "https://www.audio.com/this/is/the/url/to/the/audio",
      "offsetInMilliseconds": 0
    }
  }
}
ENQUEUE adds the specified stream to the end of the current stream queue (the other playBehavior values are REPLACE_ALL, which stops any current stream and plays the new one immediately, and REPLACE_ENQUEUED, which replaces the queue without interrupting the current stream). The offsetInMilliseconds key sets how far into the stream (in milliseconds) playback should begin. Note that Alexa requires the stream url to be an HTTPS endpoint.
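If you want playback to resume where the user left off, one approach is to store the offset that Alexa reports back in its AudioPlayer lifecycle requests (AudioPlayer.PlaybackStopped, for instance, includes the token and the current offsetInMilliseconds) and pass it into the next Play directive. A minimal sketch of that idea, where persist_offset and load_offset are hypothetical helpers for whatever storage you use, not part of the Alexa API:

def handle_playback_stopped(request, storage):
    # PlaybackStopped requests report how far into the stream the user got.
    # persist_offset is a hypothetical helper, not an Alexa API.
    storage.persist_offset(request["token"], request["offsetInMilliseconds"])

def resume_offset_for(token, storage):
    # Fall back to the beginning of the stream if nothing was stored.
    return storage.load_offset(token) or 0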
When you nest the Play directive into the larger response JSON, it takes the following form:
{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {},
    "card": {},
    "reprompt": {},
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "ENQUEUE",
        "audioItem": {
          "stream": {
            "token": "Audio Playback",
            "url": "https://www.audio.com/this/is/the/url/to/the/audio",
            "offsetInMilliseconds": 0
          }
        }
      }
    ],
    "shouldEndSession": true
  }
}
There are a handful of other options to include in your audio directive. These can be found in the link I mentioned above.
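One worth knowing about: when playBehavior is ENQUEUE, the stream object also takes an expectedPreviousToken that names the stream the new one should follow (that field is only used with ENQUEUE). With placeholder tokens and URL, that looks like:

{
  "type": "AudioPlayer.Play",
  "playBehavior": "ENQUEUE",
  "audioItem": {
    "stream": {
      "token": "track-2",
      "expectedPreviousToken": "track-1",
      "url": "https://www.audio.com/this/is/the/url/to/the/audio",
      "offsetInMilliseconds": 0
    }
  }
}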
I find it most beneficial to write a function that takes the given values and builds the AudioPlayer directive JSON. For example, in Python, this might look like the following:
def build_audio_directive(play_behavior, token, url, offset):
    """Build an AudioPlayer.Play directive from the given values."""
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": play_behavior,
        "audioItem": {
            "stream": {
                "token": token,
                "url": url,
                "offsetInMilliseconds": offset
            }
        }
    }
There are multiple ways to build up the response, but I find this one the easiest to visualize.
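For completeness, here's how the directive builder might slot into a builder for the whole response envelope. This is just a sketch under my own naming (build_audio_response isn't anything Alexa requires), reusing build_audio_directive from above:

def build_audio_response(play_behavior, token, url, offset=0):
    # Wrap the directive in the top-level response JSON shown earlier.
    return {
        "version": "1.0",
        "sessionAttributes": {},
        "response": {
            "directives": [
                build_audio_directive(play_behavior, token, url, offset)
            ],
            "shouldEndSession": True
        }
    }

# Example call:
response = build_audio_response("ENQUEUE", "Audio Playback",
                                "https://www.audio.com/this/is/the/url/to/the/audio")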