Creating an Alexa Audio Streaming Skill with api.audio

Matt Lehmann
10 min read · Sep 9, 2021

This article explains how to create a simple Alexa Skill from scratch using api.audio content. First, we will create a personalized newscast, and then we will build an Alexa Skill that plays the audio files generated with api.audio.

Introduction

In this blog post we will create a simple Alexa Skill from scratch using api.audio content. The focus of this post is to give a detailed overview of how to create Alexa Skills and how to connect them with an external API (api.audio) to produce and retrieve audio content based on the user's request. Finally, we will also explain how to test the skill using the Alexa Developer Console, the Alexa phone app, and your Echo devices. Special thanks to VozLab for collaborating with us on this. More about VozLab can be found at the end of this article.

First, we will review and install all the necessary tools. Then we will create audio content using api.audio, in this case a personalized newscast. Finally, we will connect the Alexa Skill with the api.audio API, deploy it, and test it with an Echo device and the Alexa app.

Preparing the tools

‍First of all, let’s check we have all the tools before we start:‍

1. api.audio account — Create an account in the api.audio Console.

2. Alexa Developer Account — Create an account in the Alexa Developer Console.‍

3. Amazon Web Services (AWS) account — Create an account in the AWS Console.

4. IAM user — Log in to your AWS account and search for the IAM service. Create an IAM user with the necessary permissions and at least programmatic access. If you don’t know how to do that, don’t panic and follow this simple guide. Make sure you copy your AWS Access Key ID and your AWS Secret Access Key. You will need them for the next step.

5. AWS CLI v2 — Download the AWS Command Line Interface. Once it’s installed, just run:

aws configure
# AWS Access Key ID []: your-iam-user-access-key-id
# AWS Secret Access Key []: your-iam-user-secret-access-key
# Default region name []: eu-west-1
# Default output format []: json
# now test if aws cli is configured correctly.
aws sts get-caller-identity
# {
# "UserId": "...",
# "Account": "...",
# "Arn": "arn:aws:iam::...:user/youriamusername"
# }

6. node.js & npm — Download it from here

7. Alexa Skills Kit (ASK) SDK v2 — Install your preferred ASK SDK; for this tutorial we will use the SDK v2 for node.js. At the time of writing, there are three SDKs available: Python, JavaScript (node.js), and PHP. To install the JavaScript SDK, just run:

npm install --save ask-sdk

8. Alexa Skills Kit (ASK) CLI — Install it following the official AWS guide. This is a useful command line tool to easily manage and debug your Alexa Skill:

npm install -g ask-cli

9. VS Code — Install it from here

10. VS Code ASK extension — Extension to use the ASK SDK in VS Code. Install instructions here.

11. apiaudio Alexa Streaming Skill GitHub repo — Download it from here or just run:

git clone git@github.com:aflorithmic/alexa.git

12. Amazon Alexa Phone app

Download the official Amazon Alexa app on your mobile phone, as this will be very useful for testing. Once it is installed, sign in using the Alexa developer account you created in step 2. Also change the language of the Alexa app to English (US).

13. (Optional) Amazon Alexa Echo Device

If you have one of these — great! Just make sure that the Alexa phone app (see step 12) is linked with your new user. Go to More/Settings/Device Settings, click on the + symbol, then Add Device/Amazon Echo, and select your Echo device. That’s it!

Tools ready? Let’s start.‍

Creating Audio content with api.audio

‍First, let’s create some Audio content using api.audio. In this case, we will generate a personalized newscast for each user. Have a listen to Sam’s personal newscast:

https://file.api.audio/mynews__username_sam.mp3 (This will trigger an mp3 download. No worries, it isn’t malware)‍‍

Sounds good, doesn’t it? Everything was automatically generated in seconds by api.audio.

1. Go to console.api.audio and copy your api-key.‍

2. Create a new example.js file and copy the following code. You can also copy the code directly from GitHub. Make sure you paste your api-key in the second line of code. Of course, feel free to modify the text!

const apiaudio = require("apiaudio").default;
apiaudio.configure({ apiKey: "your key here" });

// Script text with api.audio section and sound annotations.
// {{username}} is a personalization parameter filled from the audience object.
const text = `
<<soundSegment::intro>>
<<sectionName::intro>>
Hey {{username}}, London underground to get full mobile network by end of twenty twentyfour.
The first stations to be connected, will include Bank, and Oxford Circus. <break time="2s"/>
<<soundSegment::main>>
<<sectionName::main>>
London mayor, who was re elected last month said. I promised Londoners that if they re elected
me for a second term as a mayor, I would deliver four gee throughout the tube network.
Transport for London explained that work on some of the capitals busiest station, including
Oxford Circus and Tottenham Court Road would begin soon, with plans for them to be among the
first fully connected stations by the end of next year.
<<soundSegment::outro>>
<<sectionName::outro>>
<break time="1s"/> This news was delivered to you by Always Up To Date.
Check out our webpage for the latest news.
<<soundEffect::effect1>>
`;
const template = "headlines";
const audience = { "username": "Sam" };

async function create_audio(text, template, audience) {
  try {
    // 1. Create the script resource from the annotated text
    const script = await apiaudio.Script.create({ scriptText: text, scriptName: "mynews" });
    console.log("scriptId: ", script.scriptId);
    // 2. Generate the speech for the script, personalized for the audience
    const speechRequest = await apiaudio.Speech.create({ scriptId: script.scriptId, voice: "en-GB-RyanNeural", audience: [audience] });
    // 3. Master the speech with the sound template and make the file public
    const masteringRequest = await apiaudio.Mastering.create({ scriptId: script.scriptId, public: true, soundTemplate: template, audience: [audience] });
    // 4. Retrieve the public url of the mastered file
    const masteringResult = await apiaudio.Mastering.retrieve(script.scriptId, audience, true);
    console.log(masteringResult.url);
    return masteringResult.url;
  } catch (e) {
    console.error(e);
  }
}

create_audio(text, template, audience);

3. Run the script:‍

node example.js
# scriptId: 293y2d6e-d79a-49aa-b901-4232ca67ac7d
# https://ms-file-mastering-public-prod.s3.amazonaws.com/aflr-a8e36432/default/default/mynews/mynews__username~sam.mp3

4. DONE! Your news is already produced! Just copy the URL into the browser and have a listen 😉 You should get an .mp3 file back!

5. Copy the scriptId generated, you’ll need it for the next part of the tutorial.‍

Do you want to go a step further and grab the news copy from a News API to produce it on the fly based on user preferences? Stay tuned, we will do this in the next tutorial 😉‍

Creating your Alexa Skill with the ASK SDK for JavaScript

So we are done with the api.audio part. We already have our newscast, and now we want to create an Alexa Skill that can play personalized news for each user. We want something like this. In this video, we are producing the speech and mastering the audio on the fly. Quite impressive, right? 😎

‍‍Let’s create our Alexa skill real quick:‍

1. Clone the GitHub repo. We have prepared a simple Alexa Skill for you that already uses api.audio under the hood. If you already cloned it in a previous step, you can continue.

git clone git@github.com:aflorithmic/alexa.git

2. Install the dependencies first (run this command in the /alexa folder):‍

(cd lambda/ && npm install)

3. Initialize ASK to configure our Alexa Skill:

# say yes when asked to override, then say yes to everything
ask init

Now that our Alexa Skill is configured and all the code is already built for you, let’s review the folder structure and the important parts of the code. For a detailed explanation, I recommend the official documentation here:

  • lambda folder — Here is where the Alexa Skill logic is found. It’s called lambda because of lambda functions, which is the serverless service that Alexa uses under the hood to run the code. When you deploy, the code inside this folder will be sent to a lambda function, so it contains all the node modules, the package.json and package-lock.json files, as well as the javascript logic of the Alexa Skill (node.js). The function that connects apiaudio with the Alexa skill can be found in the util.js file.‍
  • skill-package folder — Here you find the skill.json file (also known as the Skill Manifest), which holds the general configuration of the skill, such as the supported locales (a.k.a. languages; in our case only en-US, but feel free to add more), the invocation name per language, and the category of the Alexa Skill. In this folder you’ll also find the interactionModels folder, which contains the configuration of the interaction model for each language. This is a vital part of the skill. In the interactionModels/custom/en-US.json file you can set the invocationName of the skill (in our case “api audio maker”). So, for example, to open your skill on your Alexa device you’ll say: “alexa, open api audio maker”. Feel free to play with this. Then you have the intents, which let you define different types of “voice actions/queries” for your skill. In this case, we only have one intent set up: PlaySoundIntent, which allows the Alexa Skill user to trigger the Lambda function based on the user’s query. You can also grab information from the user’s response; in this example, we are grabbing the name of the user. A trimmed sketch of what this file might look like follows this list.
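
For orientation, here is a trimmed, illustrative sketch of what interactionModels/custom/en-US.json could look like. The invocationName and the PlaySoundIntent come from the repo as described above; the slot name, built-in slot type, and sample utterances below are assumptions and may differ from the actual file in the repo:

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "api audio maker",
      "intents": [
        { "name": "AMAZON.StopIntent", "samples": [] },
        {
          "name": "PlaySoundIntent",
          "slots": [{ "name": "name", "type": "AMAZON.FirstName" }],
          "samples": ["open {name}", "play the news for {name}"]
        }
      ]
    }
  }
}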

In the lambda/index.js file, you’ll find the Alexa Skill logic. The important bits are:‍

  • LaunchRequestHandler (line 4) — Here you have the logic that handles the launch request from Alexa. This is the function triggered when you open the skill by saying: “Alexa, open api audio maker”.
  • PlaySoundIntentHandler (line 84) — Here you have the logic that handles the PlaySoundIntent we defined in the interactionModels/custom/en-US.json file. A simplified sketch of such a handler follows this list.
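
As a rough, simplified sketch (not the exact code from the repo — the real handler around line 84 of lambda/index.js may differ), a PlaySoundIntent handler built with the ASK SDK v2 for node.js could look like this. getAudioUrl is a placeholder name for the util.js helper described in the next section, and the "name" slot is illustrative:

const Alexa = require("ask-sdk-core");
const { getAudioUrl } = require("./util"); // hypothetical helper name, see util.js

const PlaySoundIntentHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === "IntentRequest"
      && Alexa.getIntentName(handlerInput.requestEnvelope) === "PlaySoundIntent";
  },
  async handle(handlerInput) {
    // read the "name" slot captured by the interaction model
    const name = Alexa.getSlotValue(handlerInput.requestEnvelope, "name");
    // create/retrieve the personalized audio via api.audio
    const audioUrl = await getAudioUrl(name);
    // stream the mp3 back to the user with an AudioPlayer directive
    return handlerInput.responseBuilder
      .speak(`Here is your personal newscast, ${name}.`)
      .addAudioPlayerPlayDirective("REPLACE_ALL", audioUrl, "mynews-token", 0)
      .getResponse();
  },
};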

In the lambda/util.js file you’ll find the code used by the index.js handlers to connect with api.audio and create/retrieve personalized audio. More on that in the following section.

Connecting your api.audio content with your Alexa Skill‍

Before deploying, let’s copy the scriptId and the api-key into our util.js code, otherwise it won’t work.

1. Go to lambda/util.js (lines 5–6) and paste the apiaudio api-key and the scriptId you created in the previous section.‍

// lambda/util.js lines 5-6
apiaudio.configure({ apiKey: "your apikey here" });
const SCRIPT_ID = "your scriptid here";

Please note this is an example and we have tried to keep things simple. For a production version, we HIGHLY recommend using environment variables and never hardcoding API keys in your code and/or GitHub repositories.
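
For example, a minimal sketch of reading the key from an environment variable in node.js (the variable name APIAUDIO_API_KEY is just an example, not something defined by the repo):

// read the api.audio key from the environment instead of hardcoding it
apiaudio.configure({ apiKey: process.env.APIAUDIO_API_KEY });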

That’s it. The code in util.js should be self-explanatory, but please check docs.api.audio, the Speech resource, and the Mastering resource. A simplified sketch of the kind of helper util.js contains is shown below.
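
As a rough sketch only (the actual util.js in the repo may use different names and parameters), the helper that turns a username into a public audio URL could look like this. It mirrors the Speech and Mastering calls from the example.js script earlier in this article:

const apiaudio = require("apiaudio").default;

apiaudio.configure({ apiKey: "your apikey here" });
const SCRIPT_ID = "your scriptid here";

// Hypothetical helper: create speech and mastering for the given username
// and return the public url of the mastered mp3 file.
async function getAudioUrl(username) {
  const audience = { username: username };
  await apiaudio.Speech.create({ scriptId: SCRIPT_ID, voice: "en-GB-RyanNeural", audience: [audience] });
  await apiaudio.Mastering.create({ scriptId: SCRIPT_ID, public: true, soundTemplate: "headlines", audience: [audience] });
  const mastering = await apiaudio.Mastering.retrieve(SCRIPT_ID, audience, true);
  return mastering.url;
}

module.exports = { getAudioUrl };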

Deploy the Alexa Skill

The code is ready; we are only missing one thing: deploying the code from our computer to AWS servers. Let’s do it:

1. Deploy your new Alexa Skill. Just do:‍

ask deploy

Testing your Alexa Skill‍

The Skill has been deployed, now it’s time to test it.‍

You have several options for testing:

1. Alexa Developer Console — This is the easiest. Just go to the link provided and click on your Alexa Skill. Then go to the Test tab and type “open api audio maker”. The only problem with the Developer Console is that AudioPlayer directives (used to play the audio coming from api.audio) don’t work, so you cannot listen to the audio coming from the API. But this is a great way to test that your skill works.

2. Using the ASK CLI in your terminal. Run the following command to run the skill locally:

ask run

Then open a new terminal window and run the following command:

ask dialog

In the dialog session you can type “open api audio maker” and your skill should respond. Again, AudioPlayer directives are not supported in the terminal.

3. Using VS Code’s Alexa Skills Kit extension. Similar to the first option, but it runs on your local machine inside VS Code. AudioPlayer directives are not supported here either.

4. Using the Alexa phone app. This is the first testing mechanism that allows us to play the audio and have an end-to-end testing experience. Just open the app, tap the Alexa icon so the app starts recording, and say: “open api audio maker”. After the welcome message, just say: “open <yourname>”. This will automatically render your name and produce your personalized news. If it does not work, please check the language settings, and remember the app language has to match the skill’s locale exactly (in this case, English (US)).

5. Using an Echo device. If your Echo device is linked to the Alexa app account and uses English (US) as its main language, you will be able to run your skill directly by talking to your Echo device: “Alexa, open api audio maker”.

Final result

‍‍Next steps‍

Are you happy with the results? Would you like to create your own skill using api.audio and certify it, so it can be published to the official Alexa Skills Store?

Please let us know and we will provide you with guidance and a more in-depth tutorial!


Do you want to know more?

About Aflorithmic:

Aflorithmic is a London/Barcelona-based technology company. Its api.audio platform enables fully automated, scalable audio production by using synthetic media, voice cloning, and audio mastering, to then deliver it on any device, such as websites, mobile apps, or smart speakers.

With this Audio-As-A-Service, anybody can create beautiful sounding audio, starting from a simple text to including music and complex audio engineering without any previous experience required.

The team consists of highly skilled specialists in machine learning, software development, voice synthesizing, AI research, audio engineering, and product development.

About VozLab:

VozLab is a technology company based in London, building the next generation of voice-first applications. The company’s self-service platform allows voice experiences to be created, edited, and modified on live campaigns, without any coding experience required.

The platform enables these voice experiences to evolve with a campaign, which facilitates the conversation between brands and their customers. ‘We understand the best way to get revenue and results through smart speakers,’ concludes Maria Noel Reyes, CEO of VozLab. ‘Through our self-serviced technologies, we have developed voice apps focused on longevity and flexibility that will challenge the traditional online mechanics and tactics.’


Matt Lehmann

COO@www.Aflorithmic.ai, building www.api.audio — a simple developer tool for adding audio to your applications