Build a Voice User Interface for your business with Alexa Skills Kit

Voice User Interface (VUI) makes it possible for humans to interact with a device or an app through voice commands. Through speech recognition, the machine turns spoken commands and questions into actions and answers. VUIs are natural, conversational, and user-centric. They make everyday tasks faster and more efficient by using voice instead of keyboards, mouse clicks, or touches. According to a Google article, 41% of people who own a voice-activated speaker say that using it feels like talking to another person. Big companies saw the potential and invested in voice recognition and natural language understanding technology. We can name a few results: Apple’s Siri, Microsoft’s Cortana, Google Assistant, and Amazon’s Alexa.

In this article, I’m going to share with you how you can build your own VUI for your business application using the Alexa Skills Kit.

What is Alexa?

Alexa is Amazon’s cloud-based voice service available on more than 100 million devices from Amazon and third-party device manufacturers. There are two sides to Alexa:

  • The first side is the Alexa Voice Service: all you need is a Wi-Fi chip and a microphone, and any device can become Alexa-enabled. Many third-party device manufacturers are Alexa-enabling their own devices, so it’s not just Echos that can connect to Alexa.
  • The other side of Alexa is the Alexa Skills Kit (ASK), a software development framework that enables you to create content, called skills. Skills are like apps for Alexa: capabilities that developers give Alexa so it can do different things.

To open a skill, you say the Alexa wake word followed by a launch word such as ‘open’, ‘start’, or ‘ask’. Let’s imagine that we want to open a skill called ‘Space Facts’. Here, ‘space facts’ is what is called the invocation name: the name you decide to give your skill, which users will need to say in order to open that specific skill, just like you would open an app on your phone called Space Facts, only now using voice, not clicks.

You can also access a specific piece of the skill’s functionality directly by giving a command in a one-shot utterance. For example: ‘Alexa, ask space facts for a fact’.

‘for a fact’ is the utterance: it is what users can say either when they open the skill, or when the skill welcomes them and asks what they want to do. These utterances map to intents. There are many ways of saying the same thing, but all those ways express the same intent, and that is the foundation of building an Alexa skill.

You don’t have to worry about specific sentences, only about intentions. As an Alexa skill developer, you need to decide which intents your skill supports, for example giving a fact, planning a trip, quitting, or starting a new game, and you need to give some example sentences that represent each intent. It is then up to Alexa to route the user to the right intent. In this case, we have a GetNewFactIntent, which covers all sorts of different phrases that can trigger your skill, such as ‘tell me a fact’, ‘give me a fact’, or ‘simple fact’.
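As a sketch, an intent like GetNewFactIntent is declared in the skill’s interaction model as a name plus sample utterances; the JSON below is illustrative, and the exact utterances are up to you:

```json
{
    "name": "GetNewFactIntent",
    "slots": [],
    "samples": [
        "tell me a fact",
        "give me a fact",
        "simple fact"
    ]
}
```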

Sometimes that is not enough: the user needs to provide extra information. For example, in the sentence “Alexa, ask space facts for a fact about Mars”, I am asking for a specific fact about Mars. That extra piece of information is called a slot.

A slot is the Alexa voice equivalent of a variable: a piece of the user’s utterance that you, as a developer, need in order to fulfill that specific intent. You can have more than one slot in your intent, and users can specify more than one slot in their utterances. In this case, we want a slot called ‘planet’, because we want to be able to give a fact about a specific planet. Slots can be built-in, meaning Amazon has pre-trained models for them (such as animals, names, and cities), or they can be custom types made by you.
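Continuing the sketch above, the ‘planet’ slot could be added to the intent like this; PLANET_TYPE is a hypothetical custom slot type that you would define with values such as Mars and Venus:

```json
{
    "name": "GetNewFactIntent",
    "slots": [
        { "name": "planet", "type": "PLANET_TYPE" }
    ],
    "samples": [
        "a fact about {planet}",
        "tell me a fact about {planet}"
    ]
}
```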

So, just to recap, we have:

  • Wake word: The wake word tells Alexa to start listening to your commands.
  • Launch word: A launch word is a transitional action word that signals to Alexa that a skill invocation will likely follow. Examples of launch words include “tell”, “ask”, “open”, “launch”, and “use”.
  • Invocation name: To begin to interact with a skill, a user says the skill’s invocation name. For example, to use the Daily Horoscope skill, the user could say, “Alexa, open my daily horoscope.”
  • Utterance: An utterance is a user’s spoken request. These spoken requests can invoke a skill, provide inputs for a skill, confirm an action for Alexa, and so on. Consider the many ways your users could form their requests.
  • Prompt: A string of text that you have Alexa speak to the user to ask for information. You include the prompt text in your response to a user’s request.
  • Intent: An intent represents an action that fulfills a user’s spoken request. Intents can optionally have arguments called slots.
  • Slot value: Slots are input values provided in a user’s spoken request. These values help Alexa figure out the user’s intent.

In the following example, the user opens a travel skill and gives input information: the travel date of Friday. This value is a slot value for a slot in a defined intent, which Alexa passes on to the Lambda function for skill code processing.

Slots can be defined with different types. The travel date slot uses Amazon’s built-in AMAZON.DATE type to convert words that indicate dates (such as "today" and "next Friday") into a date format, while both the from city and to city slots use the built-in AMAZON.US_CITY type.
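In the interaction model, the travel intent from this example could be sketched as follows (PlanMyTripIntent and the slot names here are illustrative, not taken from a specific sample):

```json
{
    "name": "PlanMyTripIntent",
    "slots": [
        { "name": "travelDate", "type": "AMAZON.DATE" },
        { "name": "fromCity", "type": "AMAZON.US_CITY" },
        { "name": "toCity", "type": "AMAZON.US_CITY" }
    ],
    "samples": [
        "I am going on a trip on {travelDate}",
        "I want to travel from {fromCity} to {toCity} on {travelDate}"
    ]
}
```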

There are two sides to building an Alexa skill. The first is the voice interaction model, where you define your skill name, your invocation name, your intents, your slots, your utterances, and basically everything that has to do with the voice interface. The second is the backend, where you implement and handle all of these intents.

This is the overall architecture of how Alexa works:

Build your first Alexa Skill

This is an MVP scenario to showcase the possibilities of a Voice User Interface (VUI) with the Alexa Skills Kit (ASK). Let’s build a skill using Alexa-Hosted Skills, so everything will be hosted and managed for us. You can host your own backend if you want.

The folder structure is laid out as follows:

We have a models folder, which holds all of our interaction models, intents, and utterances, represented in a JSON file for each language that we choose to support. Then we have a lambda folder, where our backend code is hosted. When you use the CLI (command line interface), all of these folders live locally along with a skill.json manifest, and you can version control and manage them in your directory. In our case, we will be using the browser, which separates the interaction model and the code into two locations.
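As a rough sketch, a skill project managed with the ASK CLI looks something like this (the file names shown are typical, and an Alexa-hosted skill may differ slightly):

```
space-fact/
├── skill.json            skill manifest
├── models/
│   ├── en-US.json        interaction model, one file per locale
│   └── de-DE.json
└── lambda/
    └── index.js          backend handler code
```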

Step 1. Design: ‘Space Fact’

Your job as a skill builder is to make the interaction between your skill and the user simple and natural. One way to do this is to mimic human conversational patterns. You want your skill to adapt its interactions based on the experience of the user; otherwise, the user can become frustrated. We will build the Space Fact skill, which gives space facts from a predefined list.

Step 2. Build

First, sign in to the developer.amazon.com console with your Amazon developer account. Create an account if you don’t have one.

Then click on ‘Create Skill’ and add the name of the skill, ‘space fact’ in our case.

Go with the default ‘Custom’ model, the ‘Alexa-hosted (Node.js)’ hosting option, and the ‘Start from scratch’ template:

You will end up with this interface, which contains different configurations for your custom skill.

Let’s first change the invocation name to ‘space fact’ and save the model:

Now let’s add a new intent called ‘spaceFacts’:

Here, I have added a new intent named spaceFacts along with some sample utterances such as ‘give me a space fact’ and ‘tell me a space fact’. No slots or custom slot types are defined for this Alexa skill; I wanted to keep it simple for this tutorial.

After creating the model for the skill, we can save and build it by clicking the ‘Save Model’ and ‘Build Model’ buttons at the top.

In the JSON editor, you will see the list of intents and their sample utterances. Next, open the Code tab; the Lambda function below is the backend that handles the skill’s requests:

/* *
 * This sample demonstrates handling intents from an Alexa skill using the Alexa Skills Kit SDK (v2).
 * Please visit https://alexa.design/cookbook for additional examples on implementing slots, dialog management,
 * session persistence, API calls, and more.
 * */
const Alexa = require('ask-sdk-core');

const data = [
    'Space is completely silent; there is no atmosphere in space, which means that sound has no medium or way to travel to be heard.',
    'The hottest planet in our solar system is 450° C.',
    'The Sun’s mass takes up 99.86% of the solar system.',
    'There are more trees on Earth than stars in the Milky Way.',
    'There is a planet made of diamonds.',
    'The space between galaxies is not completely empty but has an average of one atom per cubic meter.',
    'Spacecraft have visited all the known planets in our solar system.',
];

const GET_FACT_MESSAGE = 'Here is a fact: ';

const LaunchRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    handle(handlerInput) {
        const speakOutput = 'Welcome to space fact. You can ask me about a space fact.';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    }
};

const SpaceFactHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'spaceFacts';
    },
    handle(handlerInput) {
        // Pick a random fact from the data array.
        const factIndex = Math.floor(Math.random() * data.length);
        const speakOutput = GET_FACT_MESSAGE + data[factIndex];
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    }
};

const HelpIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.HelpIntent';
    },
    handle(handlerInput) {
        const speakOutput = 'You can say hello to me! How can I help?';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    }
};

const CancelAndStopIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && (Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.CancelIntent'
                || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.StopIntent');
    },
    handle(handlerInput) {
        const speakOutput = 'Goodbye!';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .getResponse();
    }
};

/* *
 * FallbackIntent triggers when a customer says something that doesn’t map to any intents in your skill.
 * It must also be defined in the language model (if the locale supports it).
 * This handler can be safely added but will be ignored in locales that do not support it yet.
 * */
const FallbackIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.FallbackIntent';
    },
    handle(handlerInput) {
        const speakOutput = 'Sorry, I don\'t know about that. Please try again.';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    }
};

/* *
 * SessionEndedRequest notifies that a session was ended. This handler will be triggered when a currently open
 * session is closed for one of the following reasons: 1) The user says "exit" or "quit". 2) The user does not
 * respond or says something that does not match an intent defined in your voice model. 3) An error occurs.
 * */
const SessionEndedRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'SessionEndedRequest';
    },
    handle(handlerInput) {
        console.log(`~~~~ Session ended: ${JSON.stringify(handlerInput.requestEnvelope)}`);
        // Any cleanup logic goes here.
        return handlerInput.responseBuilder.getResponse(); // notice we send an empty response
    }
};

/* *
 * The intent reflector is used for interaction model testing and debugging.
 * It will simply repeat the intent the user said. You can create custom handlers for your intents
 * by defining them above, then also adding them to the request handler chain below.
 * */
const IntentReflectorHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest';
    },
    handle(handlerInput) {
        const intentName = Alexa.getIntentName(handlerInput.requestEnvelope);
        const speakOutput = `You just triggered ${intentName}`;
        return handlerInput.responseBuilder
            .speak(speakOutput)
            //.reprompt('add a reprompt if you want to keep the session open for the user to respond')
            .getResponse();
    }
};

/**
 * Generic error handling to capture any syntax or routing errors. If you receive an error
 * stating the request handler chain is not found, you have not implemented a handler for
 * the intent being invoked or included it in the skill builder below.
 * */
const ErrorHandler = {
    canHandle() {
        return true;
    },
    handle(handlerInput, error) {
        const speakOutput = 'Sorry, I had trouble doing what you asked. Please try again.';
        console.log(`~~~~ Error handled: ${JSON.stringify(error)}`);
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    }
};

/**
 * This handler acts as the entry point for your skill, routing all request and response
 * payloads to the handlers above. Make sure any new handlers or interceptors you've
 * defined are included below. The order matters: they're processed top to bottom.
 * */
exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        SpaceFactHandler,
        HelpIntentHandler,
        CancelAndStopIntentHandler,
        FallbackIntentHandler,
        SessionEndedRequestHandler,
        IntentReflectorHandler)
    .addErrorHandlers(
        ErrorHandler)
    .withCustomUserAgent('sample/hello-world/v1.2')
    .lambda();

A request handler is created for each intent the skill supports. Inside each handler, canHandle() and handle() functions are defined.

The canHandle() function is where you define what requests the handler responds to. The handle() function returns a response to the user. If your skill receives a request, the canHandle() function within each handler determines whether or not that handler can service the request.

In this case, the user wants to launch the skill, which is a LaunchRequest. Therefore, the canHandle() function within the LaunchRequestHandler lets the SDK know it can fulfill the request; in code terms, canHandle() returns true to confirm it can do the work.

With the help of the LaunchRequestHandler, Alexa greets the user, saying welcome to space fact, and tells them that they can ask her for random space facts.

After that, the SpaceFactHandler is defined to handle every request from the user for a random space fact. This handler enables Alexa to tell the user a random space fact and, via the reprompt, keeps the session open so they can ask for more. The list of facts is in the array called data.

As we can see from the code above, the facts are defined in an array named data. With the help of the Math.random() function, a fact is picked at random from the array and spoken to the user by Alexa.
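To see that selection logic in isolation, here is a minimal standalone sketch of the same idea; pickRandomFact is a hypothetical helper name, not part of the skill code:

```javascript
// Pick one fact at random from an array, just as SpaceFactHandler does.
const facts = [
    'There is a planet made of diamonds.',
    'There are more trees on Earth than stars in the Milky Way.',
];

function pickRandomFact(arr) {
    // Math.random() returns a float in [0, 1); multiplying by the array
    // length and flooring gives a valid index between 0 and arr.length - 1.
    const index = Math.floor(Math.random() * arr.length);
    return arr[index];
}

console.log('Here is a fact: ' + pickRandomFact(facts));
```

Because Math.random() is uniform, each fact is equally likely to be chosen on every request.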

After you save your changes and build the model, you can test it via the Test interface:

As we can see from the output above, to invoke a skill the user can say ‘open’ followed by the invocation name; here, ‘space fact’ is the invocation name for this skill. Alexa then greets the user, saying welcome to space fact, and tells them that they can ask her for random space facts.

Users can ask Alexa for a random fact by saying ‘give me a space fact’, one of the sample utterances defined for the skill. In response, Alexa tells a random space fact.

In this article, I gave an introduction to Alexa skills and walked you step by step through creating a space facts skill. We defined the invocation name, intents, and sample utterances.

Enjoy 🖖.
