Recently I published my first Alexa Skill, Promis Proctor, and I wanted to share a few quick lessons I learned from the development and certification process. I wrote the skill in Node.js using the ASK CLI and the Alexa SDK.
Developing an Alexa Skill turned out to be an interesting challenge largely due to the Voice User Interface (VUI). The basics of developing a skill are fairly straightforward (the experience is somewhat similar to developing a chatbot):
- Design a VUI
- Implement a lambda function (or server) that processes parsed VUI messages
The Voice User Interface is defined within the Alexa Skill Console: when a user invokes your skill, Amazon uses the interaction model you define to parse the user’s utterance and map it to your skill’s intents. Interaction with your skill occurs primarily through custom and Amazon-defined intents, each of which represents some user action/request. In the VUI you’ll define a number of sample utterances that a user can speak to invoke that specific intent. To accept arguments from a user you define slots within your sample utterances - slot types may be officially supported Alexa Slots, for things such as names and places, or Custom Slot types that you define.
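To make this concrete, here’s a minimal interaction model sketch of the kind you’d see in the console’s JSON Editor. The intent, slot, and value names are made up for illustration; Promis Proctor’s actual model differs:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "quiz helper",
      "intents": [
        {
          "name": "GuessIntent",
          "slots": [
            { "name": "answer", "type": "ANSWER_TYPE" }
          ],
          "samples": [
            "my answer is {answer}",
            "I guess {answer}"
          ]
        },
        { "name": "AMAZON.HelpIntent", "samples": [] }
      ],
      "types": [
        {
          "name": "ANSWER_TYPE",
          "values": [
            { "name": { "value": "never", "synonyms": ["not at all"] } },
            { "name": { "value": "always", "synonyms": ["all the time"] } }
          ]
        }
      ]
    }
  }
}
```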
The Lambda Function (or server) is a handler that is invoked with a formatted Alexa VUI JSON message containing the user’s parsed utterance. The handler is responsible for processing each event/request and responding with a JSON-formatted response that contains the appropriate Speech Synthesis Markup Language (SSML) text to be spoken to the user. If you use the Alexa SDK most of this is abstracted for you.
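For reference, a skill response looks roughly like this (hand-written here rather than generated by the SDK, and trimmed to the interesting fields):

```json
{
  "version": "1.0",
  "sessionAttributes": { "STATE": "_PLAY" },
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Welcome back. <break time=\"300ms\"/> Ready for the next question?</speak>"
    },
    "shouldEndSession": false
  }
}
```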
Here’s what I learned while implementing my skill:
- Consider using the Alexa Skills Kit (ASK) CLI for your project if using Node.js. Initially I developed the skill by editing the VUI in the Alexa Console Interaction Builder and by manually uploading the Lambda Function code. Using the ASK CLI made the process of editing and deploying the Alexa Skill much smoother. One pain point I found was that the ASK CLI didn’t seem to have an option to pull in changes from the Interaction Builder tool, which made modifying the skill’s interaction model slightly awkward. To work around this I ended up developing the Interaction Model in the console and copying the Interaction Model from the JSON Editor to my local project. That said, it’s possible this has changed since I started the project (the CLI has had a series of updates over the last few months).
- Custom Slots are not enumerated types; instead they act as machine learning examples for the Alexa VUI to attempt to fill the slot. See the Useful Links section for more details on this.
- Alexa SDK State Handlers are a little bit of a mixed bag. While the use of State Handlers made it easier to organize my application’s code they did add a bit of overhead. Initially I started by using three states (Default, Start and Administration), but ended up combining the Start and Default states to reduce the number of forwarding intents.
- The Alexa SDK routes requests to StateHandler intents by concatenating the session state to the intent name. For example, if you have a _PLAY StateHandler with a GuessIntent defined in that handler, the Alexa SDK will actually remap the VUI intent to GuessIntent_PLAY. If you want to handle common intents like AMAZON.HelpIntent or AMAZON.CancelIntent you’ll need to redefine them in each of your StateHandlers.
- You can forward requests to other Intent Handlers using emit(INTENT_NAME) or emitWithState(INTENT_NAME). I found this useful for navigation or error flows.
- Use emitWithState to forward to Intents in the current StateHandler (or to another StateHandler if you update the state). If you use emit without a state it will forward to the default state handler.
- Implement the Unhandled handler to catch invalid user requests.
- The Alexa SDK accepts at most one response message. You can’t invoke the speak method multiple times to concatenate messages together.
- The NewSession handler intercepts any intent processing for its state. See the Alexa SDK documentation for more details.
- There didn’t seem to be an obvious way to add event/request middleware in the Alexa SDK. My guess is that you could implement this logic in your skill entry point (index.handler), but I haven’t experimented with this yet.
- The Alexa SDK uses the i18next library for internationalization. You can interpolate values into your strings using a handlebars style syntax and a context object.
- Clearing State in the Alexa SDK is currently a bit awkward. See this GitHub issue.
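Several of the routing notes above boil down to one concatenation rule. Here’s a toy sketch of that behavior as I understand it - this is an illustration only, not the SDK’s actual code:

```javascript
// Illustration of the alexa-sdk v1 routing rule described above: the SDK
// resolves a handler by appending the current session state to the intent name.
function resolveHandlerKey(intentName, state) {
  // state is the empty string for the default state, or e.g. '_PLAY' otherwise
  return intentName + state;
}

// emit('GuessIntent') looks up the handler in the default ('') state...
function emitKey(intentName) {
  return resolveHandlerKey(intentName, '');
}

// ...while emitWithState('GuessIntent') keeps the current state appended,
// which is why emitWithState reaches intents in the current StateHandler.
function emitWithStateKey(intentName, currentState) {
  return resolveHandlerKey(intentName, currentState);
}
```

This also makes it clearer why AMAZON.HelpIntent has to be redefined per StateHandler: in the _PLAY state the SDK is looking for a handler keyed AMAZON.HelpIntent_PLAY, not AMAZON.HelpIntent.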
The certification process for an Alexa Skill was surprisingly painless (and free!), especially when your skill is as simple as Promis Proctor. In general, the certification process aims to verify that your skill functions correctly and provides a reasonably robust user experience.
Here’s what I learned:
- Make sure your example utterances are supported by your skill and are formatted correctly.
- Test error paths in your application extensively. If you’re using the State Handler feature of the Alexa SDK, it’s especially easy to miss paths.
While your experience may vary, it took me about a week to get the skill certified, including a rejection for listing an unsupported example phrase and not properly supporting synonyms.
No personal project is ever entirely done. While Promis Proctor is more of a proof of concept than anything, there are still a few areas I’d like to explore further:
- Improve synonym support. At the time of publishing I didn’t fully understand how the Alexa VUI dealt with synonyms defined in the custom slot. I’d probably dig more deeply into the slot values and leverage the ER_SUCCESS_MATCH/ER_NO_MATCH flags instead of handling synonyms in my code.
- Flesh out testing support. Some of the sample Alexa Skills include examples of manually written testing harnesses. I’d like to further explore options for testing the VUI portion of the application without manually generating the VUI JSON responses.
- Explore management of Production/Development skills. After your skill passes certification you’ll end up with two skills under your Alexa Skill Console. I’d like to further explore ASK CLI/Alexa Console options for updating development and production skills.
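On the synonym point above: when a user speaks a synonym defined in a custom slot, the incoming slot object carries both the raw value and the resolved canonical value, along with a match status you can branch on. A trimmed, illustrative example (the slot name, values, and id are made up):

```json
{
  "name": "answer",
  "value": "not at all",
  "resolutions": {
    "resolutionsPerAuthority": [
      {
        "status": { "code": "ER_SUCCESS_MATCH" },
        "values": [
          { "value": { "name": "never", "id": "NEVER" } }
        ]
      }
    ]
  }
}
```

Checking status.code for ER_SUCCESS_MATCH (versus ER_SUCCESS_NO_MATCH) and reading the resolved name is what I’d lean on instead of normalizing synonyms in my own code.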
Here are a few links that may prove useful if you’re also starting out with an Alexa Skill:
Got any suggestions? Anything you’d like to add? Let me know below!