
I am building an application for the Amazon Echo in Python. When I speak a bad utterance that the Echo does not recognize, my skill quits and returns to the home screen. I want to prevent this and instead have the Echo repeat its last prompt.

As a first attempt, I call a function that says something when the session ends or bad input is detected.

def on_session_ended(session_ended_request, session):
    """
    Called when the user ends the session.
    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    return get_session_end_response()

However, I just get an error from the Echo; this function, on_session_ended, is never entered.

So how do I conduct error catching and handling on the Amazon Echo?

UPDATE 1: I reduced the number of utterances, and the number of intents with custom slots, to one. Now a user should only speak A, B, C, or D. If they speak anything outside of this, the intent is still triggered but with no slot value, so I can do some error checking based on whether the slot value is present. However, this does not seem like the best approach. When I try to add an intent with no slots and a corresponding utterance, anything that doesn't match either of my intents defaults to this new intent. How can I resolve these issues?
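One way to handle the missing-slot case described above is to treat an absent slot value as the error signal and re-prompt rather than end the session. A minimal sketch (handle_answer_intent and its prompt text are hypothetical; a real handler would return a full speechlet response rather than a string):

```python
def extract_answer(intent):
    """Return the Answer slot's value, or None when recognition failed to fill it.

    When an utterance doesn't match a LETTER value, AnswerIntent can still
    fire, but its slot arrives without a "value" key -- that absence is the
    signal that bad input was spoken.
    """
    slot = intent.get("slots", {}).get("Answer", {})
    return slot.get("value")


def handle_answer_intent(intent):
    # Hypothetical handler: re-prompt instead of letting the session end.
    answer = extract_answer(intent)
    if answer is None:
        return "Sorry, I didn't catch that. Please answer with alpha, bravo, charlie, or delta."
    return "You answered " + answer.lower()
```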

UPDATE 2: Here are some relevant sections of my code.

Intent handlers:

import json
import time
import urllib.request


def lambda_handler(event, context):
    print("Python START -------------------------------")
    print("event.session.application.applicationId=" +
          event['session']['application']['applicationId'])

    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])


def on_session_started(session_started_request, session):
    print("on_session_started requestId=" + session_started_request['requestId']
          + ", sessionId=" + session['sessionId'])


def on_launch(launch_request, session):
    """ Called when the user launches the skill without specifying what they want """
    print("on_launch requestId=" + launch_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # Dispatch to your skill's launch
    return create_new_user()


def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """

    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    intent = intent_request['intent']
    intent_name = intent['name']
    attributes = session["attributes"] if 'attributes' in session else None
    intent_slots = intent['slots'] if 'slots' in intent else None

    # Dispatch to skill's intent handlers

    # TODO : Authenticate users
    #   TODO : Start session in a different spot depending on where user left off

    if intent_name == "StartQuizIntent":
        return create_new_user()

    elif intent_name == "AnswerIntent":
        return get_answer_response(intent_slots, attributes)

    elif intent_name == "TestAudioIntent":
        return get_audio_response()

    elif intent_name == "AMAZON.HelpIntent":
        return get_help_response()

    elif intent_name == "AMAZON.CancelIntent":
        return get_session_end_response()

    elif intent_name == "AMAZON.StopIntent":
        return get_session_end_response()

    else:
        return get_session_end_response()


def on_session_ended(session_ended_request, session):
    """
    Called when the user ends the session.
    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    return get_session_end_response()

Then we have the functions that actually get called and the response builders. I have edited some of the code for privacy. I haven't built up all the display response text fields and have some uids hard coded so I don't have to worry about authentication yet.

# --------------- Functions that control the skill's behavior ------------------

####### GLOBAL SETTINGS ########
utility_background_image = "https://i.imgur.com/XXXX.png"


def get_welcome_response():
    """ Returns the welcome message if a user invokes the skill without specifying an intent """
    session_attributes = {}
    card_title = ""
    speech_output = ("Hello and welcome ... quiz .... blah blah ...")
    reprompt_text = "Ask me to start and we will begin the test!"
    should_end_session = False

    # visual responses
    primary_text = ''  # TODO
    secondary_text = ''  # TODO

    return build_response(session_attributes,
                          build_speechlet_response(card_title, speech_output, reprompt_text,
                                                   should_end_session,
                                                   build_display_response(utility_background_image,
                                                                          card_title, primary_text,
                                                                          secondary_text)))


def get_session_end_response():
    """ Returns the ending message if a user errs or exits the skill """
    session_attributes = {}
    card_title = ""
    speech_output = "Thank you for your time!"
    reprompt_text = ''
    should_end_session = True

    # visual responses
    primary_text = ''  # TODO
    secondary_text = ''  # TODO

    return build_response(session_attributes,
                          build_speechlet_response(card_title, speech_output, reprompt_text,
                                                   should_end_session,
                                                   build_display_response(utility_background_image,
                                                                          card_title, primary_text,
                                                                          secondary_text)))


def get_audio_response():
    """ Tests the audio capabilities of the echo """
    session_attributes = {}
    card_title = ""  # TODO : keep no 'welcome'?
    speech_output = ""
    reprompt_text = ""
    should_end_session = False

    # visual responses
    primary_text = ''  # TODO
    secondary_text = ''  # TODO

    return build_response(session_attributes,
                          build_speechlet_response(card_title, speech_output, reprompt_text,
                                                   should_end_session, build_audio_response()))


def create_new_user():
    """ Creates a new user that the server will recognize and whose action will be stored in db """
    url = "http://XXXXXX:XXXX/create_user"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode('utf8'))
    uuid = data["uuid"]
    return ask_question(uuid)


def query_server(uuid):
    """ Requests to get num_questions number of questions from the server """
    url = "http://XXXXXXXX:XXXX/get_question_json?uuid=%s" % (uuid)  # TODO : change needs to be uuid
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode('utf8'))

    if data["status"]:
        question = data["data"]["question"]
        quid = data["data"]["quid"]
        next_quid = data["data"]["next_quid"]  # TODO : will we need any of this?
        topic = data["data"]["topic"]
        question_type = data["data"]["type"]  # avoid shadowing the builtin 'type'
        media_type = data["data"]["media_type"]  # either 'IMAGE', 'AUDIO', or 'VIDEO'
        answers = data["data"]["answer"]  # list of answers stored in order they should be spoken
        images = data["data"]["image"]  # list of images that correspond to order of answers list
        audio = data["data"]["audio"]
        video = data["data"]["video"]

        question_data = {"status": True, "data":{"question": question, "quid": quid, "answers": answers,
                         "media_type": media_type, "images": images, "audio": audio, "video": video}}
        if next_quid == "None":  # string comparison; 'is' would test identity, not equality
            return None
        return question_data
    else:
        return {"status": False}


def ask_question(uuid):
    """ Returns a quiz question to the user since they specified a QuizIntent """
    question_data = query_server(uuid)

    if question_data is None:
        return get_session_end_response()

    card_title = "Ask Question"
    speech_output = ""
    session_attributes = {}
    should_end_session = False
    reprompt_text = ""

    # visual responses
    display_title = ""
    primary_text = ""
    secondary_text = ""

    question = ""  # default so the display builder below never sees an undefined name
    images = []
    answers = []

    if question_data["status"]:
        session_attributes = {
            "quid": question_data["data"]["quid"],
            "uuid": "df876c9d-cd41-4b9f-a3b9-3ccd1b441f24",
            "question_start_time": time.time()
        }

        question = question_data["data"]["question"]
        answers = question_data["data"]["answers"]  # answers are shuffled when pulled from server
        images = question_data["data"]["images"]
        # TODO : consider different media types

        speech_output += question
        reprompt_text += ("Please choose an answer using the official NATO alphabet. For example," +
                          " A is alpha, B is bravo, and C is charlie.")

    else:
        speech_output += "Oops! This is embarrassing. There seems to be a problem with the server."
        reprompt_text += "I don't exactly know where to go from here. I suggest restarting this skill."

    return build_response(session_attributes, build_speechlet_response(card_title, speech_output,
            reprompt_text, should_end_session,
             build_display_response_list_template2(title=question, image_urls=images, answers=answers)))


def send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given):
    """ Sends the users responses back to the server to be stored in the database """
    url = ("http://XXXXXXXX:XXXX/send_answers?uuid=%s&quid=%s&time=%s&answer_given=%s" %
          (uuid, quid, time_used_for_question, answer_given))
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode('utf8'))
    return data["status"]


def get_answer_response(slots, attributes):
    """ Returns a correct/incorrect message to the user depending on their AnswerIntent """

    # get time, quid, and uuid from attributes
    question_start_time = attributes["question_start_time"]
    quid = attributes["quid"]
    uuid = attributes["uuid"]

    # get answer from slots
    try:
        answer_given = slots["Answer"]["value"].lower()
    except KeyError:
        return get_session_end_response()

    # calculate a rough estimate of the time it took to answer question
    time_used_for_question = str(int(time.time() - question_start_time))

    # record response data by sending it to the server
    send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given)

    return ask_question(uuid)


def get_help_response():
    """ Returns a help message to the user since they called AMAZON.HelpIntent """
    session_attributes = {}
    card_title = ""
    speech_output = "" # TODO
    reprompt_text = "" # TODO
    should_end_session = False

    return build_response(session_attributes,
            build_speechlet_response(card_title, speech_output, reprompt_text, should_end_session,
             build_display_response(utility_background_image, card_title)))


# --------------- Helpers that build all of the responses ----------------------


def build_hint_response(hint):
    """
    Builds the hint response for a display.

    For example, Try "Alexa, play number 1" where "play number 1" is the hint.
    """
    return {
        "type": "Hint",
        "hint": {
            "type": "RichText",
            "text": hint
        }
    }


def build_display_response(url='', title='', primary_text='', secondary_text='', tertiary_text=''):
    """
    Builds the display template for the echo show to display.

    Echo show screen is 1024px x 600px

    For additional image size requirements, see the display interface reference.
    """
    return [{
        "type": "Display.RenderTemplate",
        "template": {
            "type": "BodyTemplate1",
            "token": "question",
            "title": title,
            "backgroundImage": {
                "contentDescription": "Question",
                "sources": [
                    {
                        "url": url
                    }
                ]
            },
            "textContent": {
                "primaryText": {
                    "type": "RichText",
                    "text": primary_text
                },
                "secondaryText": {
                    "type": "RichText",
                    "text": secondary_text
                },
                "tertiaryText": {
                    "type": "RichText",
                    "text": tertiary_text
                }
            }
        }
    }]


def build_list_item(url='', primary_text='', secondary_text='', tertiary_text=''):
    return {
        "token": "question_item",
        "image": {
            "sources": [
                {
                    "url": url
                }
            ],
            "contentDescription": "Question Image"
        },
        "textContent": {
            "primaryText": {
                "type": "RichText",
                "text": primary_text
            },
            "secondaryText": {
                "text": secondary_text,
                "type": "PlainText"
            },
            "tertiaryText": {
                "text": tertiary_text,
                "type": "PlainText"
            }
        }
    }


def build_display_response_list_template2(title='', image_urls=None, answers=None):
    list_items = []
    for image, answer in zip(image_urls or [], answers or []):
        list_items.append(build_list_item(url=image, primary_text=answer))

    return [{
        "type": "Display.RenderTemplate",
        "template": {
            "type": "ListTemplate2",
            "token": "question",
            "title": title,
            "backgroundImage": {
                "contentDescription": "Question Background",
                "sources": [
                    {
                        "url": "https://i.imgur.com/HkaPLrK.png"
                    }
                ]
            },
            "listItems": list_items
        }
    }]


def build_audio_response(url=''):  # TODO add a display response here as well
    """ Builds audio response. I.e. plays back an audio file with zero offset """
    return [{
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "token": "audio_clip",
                "url": url,
                "offsetInMilliseconds": 0
            }
        }
    }]


def build_speechlet_response(title, output, reprompt_text, should_end_session, directive=None):
    """ Builds speechlet response and puts display response inside """
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': title,
            'content': output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session,
        'directives': directive
    }


def build_response(session_attributes, speechlet_response):
    """ Builds the complete response to send back to Alexa """
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }
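One robustness note on the server calls above: answer_given comes from user speech and may contain spaces, so query strings built with % formatting can produce invalid URLs. A sketch of the same request using urllib.parse.urlencode (the host and port remain placeholders, as in the original):

```python
import json
import urllib.parse
import urllib.request


def send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given):
    """Sends the user's answer to the server with URL-encoded query parameters."""
    base_url = "http://XXXXXXXX:XXXX/send_answers"  # placeholder host/port, as above
    query = urllib.parse.urlencode({
        "uuid": uuid,
        "quid": quid,
        "time": time_used_for_question,
        "answer_given": answer_given,  # spoken text may contain spaces, e.g. "bravo two"
    })
    response = urllib.request.urlopen(base_url + "?" + query)
    data = json.loads(response.read().decode("utf8"))
    return data["status"]
```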

UPDATE 3: I updated the intents so there is now one custom intent that takes a custom slot and another custom intent that takes no slots. These custom intents also have their own sample utterances. Both the intents and their utterances are listed below. When I start the skill, it works fine. Then when I say/type "zoo zoo zoo" to test bad input, I get an error. Both the request for "zoo zoo zoo" and the response are listed below. I am looking for a good way to catch this bad-input error and revert the skill to its previous state.

Intents:

...
{
      "intent": "TestAudioIntent"
},
{
      "slots": [
        {
          "name": "Answer",
          "type": "LETTER"
        }
      ],
      "intent": "AnswerIntent"
},
...

Sample Utterances:

AnswerIntent {Answer}
AnswerIntent I think it is {Answer}
TestAudioIntent test the audio

Example JSON request:

{
  "session": {
    "new": false,
    "sessionId": "SessionId.574f0b74-be17-4f79-bbd6-ce926a1bf856",
    "application": {
      "applicationId": "XXXXXXXX"
    },
    "attributes": {
      "quid": "7fa9fcbf-35db-4bbd-ac73-37977bcef563",
      "question_start_time": 1515691612.7381804,
      "uuid": "df876c9d-cd41-4b9f-a3b9-3ccd1b441f24"
    },
    "user": {
      "userId": "XXXXXXXX"
    }
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.23765cb0-f327-4f52-a9a3-b9f92a375a5f",
    "intent": {
      "name": "TestAudioIntent",
      "slots": {}
    },
    "locale": "en-US",
    "timestamp": "2018-01-11T17:26:57Z"
  },
  "context": {
    "AudioPlayer": {
      "playerActivity": "IDLE"
    },
    "System": {
      "application": {
        "applicationId": "XXXXXXXX"
      },
      "user": {
        "userId": "XXXXXXXX"
      },
      "device": {
        "supportedInterfaces": {
          "Display": {
            "templateVersion": "1",
            "markupVersion": "1"
          }
        }
      }
    }
  },
  "version": "1.0"
}

And I get the following testing error as a response:

The remote endpoint could not be called, or the response it returned was invalid.
  • You've only shown the function definition. You will need to show the code that calls the function in order to troubleshoot. – LegendaryDude Jan 11 '18 at 16:29
  • @LegendaryDude I have added my code above in update 2. Sorry about that, and thank you for taking the time to consider my question. – peachykeen Jan 11 '18 at 16:45
  • You may also want to include an example request in json format, so that we can verify that you are referring to the correct key(s) in your lambda_handler function. – LegendaryDude Jan 11 '18 at 17:05
  • @LegendaryDude I have added these as well in update 3 above. – peachykeen Jan 11 '18 at 17:34
  • Based on your sample JSON, I'm afraid I don't understand the problem enough. Is that the JSON you send to Lex, or what you're expecting to receive from Lex? As far as I know it doesn't match either of Lex's `Input Event` or `Response` JSON structures. Check this out for details: https://docs.aws.amazon.com/lex/latest/dg/lambda-input-response-format.html – LegendaryDude Jan 11 '18 at 21:04
  • That is the JSON that is used as input to Alexa, which I took from Amazon's mini testing console. Essentially, I see myself as having two related problems. When I make an intent that has no slots, all random utterances seem to be handled by that intent, regardless of whether they resemble the intent's pre-defined sample utterance or not. This means that when I give Alexa random input, it produces an error because it treats the input as though it should be handled by the no-slot intent. This is what I think is causing the error currently ... – peachykeen Jan 11 '18 at 21:58

1 Answer


What I ended up doing is something similar to Amazon's dialog management. If a user says something that doesn't fill a slot, I re-prompt them with the same question. My goal is to record a user's statements/answers each time they speak, so I didn't use the built-in dialog management. Additionally, I used Amazon's slot synonyms for all my slots to make my model more robust.

I still don't know whether this is the best way, but it is a starting point and seems to work OK.
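In code, the re-prompt approach looks roughly like this (building the raw response dict directly for brevity; "last_question" is a hypothetical session attribute that the question handler would save alongside quid and uuid):

```python
def reprompt_with_question(attributes):
    """Re-ask the current question instead of letting the session end."""
    question = attributes.get("last_question",
                              "Please answer with alpha, bravo, charlie, or delta.")
    return {
        "version": "1.0",
        "sessionAttributes": attributes,  # keep quid/uuid/timing intact for the retry
        "response": {
            "outputSpeech": {"type": "PlainText", "text": question},
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": question}},
            "shouldEndSession": False,
        },
    }


def handle_answer(slots, attributes):
    """If speech filled no slot, re-prompt; otherwise proceed with the answer."""
    answer = slots.get("Answer", {}).get("value")
    if answer is None:
        return reprompt_with_question(attributes)
    return answer.lower()  # real skill: record the answer, then ask the next question
```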
