How to Build an Alexa Custom Skill Using Python

Developing for a voice platform is, predictably, very different from developing for a mobile or web interface. To learn more about the space, I decided to build a simple voice application for the biggest voice platform today – Alexa. More specifically, a custom Alexa skill using Python.

This project came to fruition after an Alexa Skills hackathon where my skill placed third. The skill is called My Ballot, and it tells users election dates, candidates, and polling stations in their area. The skill has passed certification and is waiting to be published in the Alexa Skills store. The dev version includes taking donations for a candidate using In-Skill Purchasing, but it is excluded from the live version because of policy issues. You can find the source code for the live version on my GitHub page.

Overview

There are two major components in any skill:

  • The voice interaction model. It’s mostly defined in a single JSON file and defines the entire scope of interactions a user can have with a skill. This file is the architectural plan for the skill you are building: it specifies what kind of input/arguments your skill can accept (slots) under a given scenario (called an intent), and it includes sample utterances that tell Alexa to trigger a specific intent you have defined. Amazon’s documentation on the interaction model and on intents and slots gives a good overview.
  • The Lambda function. I decided to host My Ballot’s backend on AWS Lambda. Not only is Lambda recommended in the developer documentation itself, it also provides template functions specifically designed for Alexa skills.
AWS Lambda has template functions for Alexa Skills

Other backend elements: My Ballot uses the Google Civic Information API for upcoming election dates and other polling details for a given location. The REST API is fairly easy to use: once you obtain an API key, you can pass it along with an address and get all the details back as JSON.
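
For illustration, here’s a minimal sketch of the kind of lookup My Ballot performs, assuming the requests library and the API’s voterinfo endpoint. The get_voter_info helper name and the key value are my placeholders, not from the actual skill:

[code language="python"]

import requests

GOOGLE_API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from the Google Cloud console

def get_voter_info(address):
    """Return election and polling details for an address as a parsed JSON dict."""
    resp = requests.get(
        "https://www.googleapis.com/civicinfo/v2/voterinfo",
        params={"key": GOOGLE_API_KEY, "address": address},
    )
    resp.raise_for_status()
    # the response includes the election, pollingLocations, contests, etc.
    return resp.json()

[/code]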

Building the Skill

To build a skill, you need to use the Alexa Skills Kit SDK. It’s available in five languages: JavaScript, Python, Java, C#, and TypeScript. These SDKs are available on the Alexa GitHub page. I used Python, though most of the documentation centers on JavaScript. If you’re a complete beginner, I recommend starting from a template skill available on the GitHub page to understand the various backend elements that make up a custom skill. For example, I picked the City Guide template, which came with instructions explaining where all the components fit. A word of caution though – these instructions are not updated regularly, and I personally found many steps either out of sequence or skipped entirely.

Instructions in the Skill template

1. Set-up

Before any actual coding, a few items need to be completed. First, set up the ASK CLI and create a skill, either via the command line or in the ASK Developer console. Second, create a Lambda function in the AWS console using one of the Alexa Skills template/blueprint functions. Then use the function’s ARN in the Endpoint section within the ASK Developer console:

Using a Lambda function as the Skill endpoint

Finally, once you are able to access your skill via the command line (use ask clone in the CLI to download a skill’s core folders to your local machine), you are ready to code. Make sure your folder structure follows the structure in the official skill templates: you should have a models folder for your interaction model and a lambda folder under which the ASK SDK core files and your lambda function (named lambda_function.py) live.
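
For reference, the layout looks roughly like this (subfolder names vary slightly between templates, so treat this as a sketch rather than the exact City Guide structure):

[code]

my-ballot/
├── models/
│   └── en-US.json         # voice interaction model, one file per locale
└── lambda/
    └── py/
        ├── lambda_function.py
        └── ...            # ASK SDK core files and other dependencies

[/code]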

2. Creating the Voice Interaction Model

The first step in developing a skill is creating the voice interaction model following the required schema. The model is stored in a JSON file inside the models folder, named for the intended locale and language. For example, if you expect your users to be in the UK and speak English, the file should be named en-GB.json. This file can also be viewed and edited in the ASK Developer console under “JSON Editor”. By default, the model already contains essential intents like YesIntent, HelpIntent, FallbackIntent, etc. You then expand it with the elements (intents, types, slots, etc.) your skill needs. For example, My Ballot has an intent that collects the user’s address. The intent definition includes the slot it can fill (type: AMAZON.PostalAddress) and sample utterances that trigger it:

{
    "name": "AddressIntent",
    "slots": [
        {
            "name": "address",
            "type": "AMAZON.PostalAddress"
        }
    ],
    "samples": [
        "address",
        "here's my address",
        "{address}"
    ]
}

While ideally you want to get the Voice Interaction Model right before coding the skill, you can expect to come back and add or tweak elements throughout the Skill building process.

3. Build the Lambda function

The lambda_function.py file, which will eventually be deployed to your Lambda function, is where you define and implement your intent handlers. There are three essential components to implement:

  • The LaunchHandler class: This is the code that runs when a user opens the skill. It’s where the welcome message goes, and, as in my case, you can also use it to ask for essential information the skill needs (e.g. an address).

[code language="python"]

import logging

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type, is_intent_name

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


class LaunchHandler(AbstractRequestHandler):
    """Handler for skill launch."""
    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        logger.info("In LaunchHandler")
        # the API access token is available here if you later need Alexa device APIs
        api_access_token = handler_input.request_envelope.context.system.api_access_token
        speech = ("Welcome to my ballot. Please specify a U.S. address "
                  "to learn about elections near you.")
        handler_input.response_builder.speak(speech).ask("Please specify an address")
        return handler_input.response_builder.response

[/code]

  • An intent handler class for every intent defined in the voice interaction model: Each handler class needs a can_handle method that checks whether it can handle a given user request. Below is the AddressIntent handler class, which takes the user’s address (stored in a slot named “address”, as defined in the interaction model) and saves it as a session attribute so that it can be accessed by other intents. Session attributes are covered in more detail in the ASK SDK documentation.

[code language="python"]

class AddressIntentHandler(AbstractRequestHandler):
    """Handler for AddressIntent. This handles the case when the user gives their address."""
    def can_handle(self, handler_input):
        return is_intent_name("AddressIntent")(handler_input)

    def handle(self, handler_input):
        logger.info("In AddressIntentHandler")
        # get the address slot value that the user provided
        address = handler_input.request_envelope.request.intent.slots["address"].value
        # save it to the session attributes so other handlers can access it
        handler_input.attributes_manager.session_attributes = {"address": address}
        speech = ("Thank you. What would you like to hear? You can say "
                  "upcoming elections, polling stations, or candidates.")
        handler_input.response_builder.speak(speech).ask(speech)
        return handler_input.response_builder.response

[/code]
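
To see why the session attribute matters, here’s a hypothetical sketch of a downstream handler reading the saved address. The ElectionIntent name and the speech text are illustrations of the pattern, not code from the published skill:

[code language="python"]

class ElectionIntentHandler(AbstractRequestHandler):
    """Hypothetical handler that reads the address saved by AddressIntentHandler."""
    def can_handle(self, handler_input):
        return is_intent_name("ElectionIntent")(handler_input)

    def handle(self, handler_input):
        # read the address stored earlier in the session
        session_attr = handler_input.attributes_manager.session_attributes
        address = session_attr.get("address")
        if not address:
            speech = "I don't have your address yet. Please tell me your address first."
        else:
            # look up election details for this address, e.g. via the Civic Information API
            speech = "Here are the upcoming elections near {}.".format(address)
        handler_input.response_builder.speak(speech).ask(speech)
        return handler_input.response_builder.response

[/code]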

  • Registering each intent handler: Every intent handler class must be registered with the skill builder at the end of the file. The order in which the handler classes are registered is the order in which they are checked against an incoming user request.

[code language="python"]

sb.add_request_handler(LaunchHandler())
sb.add_request_handler(AddressIntentHandler())

[/code]
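
For context, sb here is the SDK’s SkillBuilder object, created near the top of the file; the finished skill is exported as the Lambda entry point at the very end. A minimal sketch of that surrounding wiring, assuming the standard ask_sdk_core setup:

[code language="python"]

from ask_sdk_core.skill_builder import SkillBuilder

# near the top of lambda_function.py
sb = SkillBuilder()

# ... handler classes and the sb.add_request_handler(...) calls go here ...

# at the very end: expose the skill as the Lambda entry point
# (set the function's handler setting to lambda_function.lambda_handler)
lambda_handler = sb.lambda_handler()

[/code]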

4. Deploying and Testing

There are several variants of the deploy command in the CLI, depending on the changes you’ve made: ask deploy redeploys everything, while a target option (e.g. ask deploy --target lambda in ASK CLI v1) pushes only the Lambda code or only the model. This is helpful because a full deploy usually takes several minutes to complete.

Testing an Alexa skill happens in two places: the ASK developer console and the AWS console.

  • The ASK Developer console: The “Test” section of the ASK console provides a simulator where developers can exercise their skill with typed or spoken utterances. It shows the skill’s input, output, and device logs, so it tells you whether the skill is actually working end to end. Using the input/output and device logs, you can also check that the code behaves as expected (for example, whether the intent you expect to be invoked is invoked, or whether an attribute is being passed as a session attribute).
The Test section in the ASK Console
  • The AWS Lambda console: On the Lambda function page in the AWS console, you can create test events and inspect the function’s logs. I found this helpful when I wanted to test a specific intent or pin down exactly where in the code a bug lived. When configuring a test event, you can reuse the JSON input log captured while testing in the ASK Developer console; a trimmed example follows below. More information on test events is available in the AWS Lambda documentation.
Configuring a test event in AWS console
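
As an illustration, a heavily trimmed request envelope for a test event might look like the following. Every ID and the address value here are placeholders; in practice you would paste the real JSON from the ASK console’s input log:

[code]

{
  "version": "1.0",
  "session": {
    "new": true,
    "sessionId": "amzn1.echo-api.session.EXAMPLE",
    "application": { "applicationId": "amzn1.ask.skill.EXAMPLE" }
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "amzn1.echo-api.request.EXAMPLE",
    "locale": "en-US",
    "intent": {
      "name": "AddressIntent",
      "slots": {
        "address": { "name": "address", "value": "350 Fifth Avenue New York" }
      }
    }
  }
}

[/code]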

5. Certification and Publishing

Your skill needs to pass a certification process before it can be published. The Distribution tab in the ASK Developer console lets developers specify the skill’s display icons, description, and distribution locales. The Certification tab lets developers run basic functional tests and submit the skill for certification and publication. The certification guidelines include a checklist that every skill must adhere to. Once you have submitted the skill for certification, the Alexa Skills team usually responds with a decision within 15 days.
