Pains and gains when developing a node.js Dialogflow application
Posted by Thomas Rutzer
Voice and conversational interfaces are everywhere these days, and for good reason: they can
greatly improve your digital service. That is why we got very excited here at diePartments when we were assigned to build one for the first
time: an Action for Google Home devices.
So this article is about the pains and gains during the first development phase (spoiler: the gains outdo
the pains) and some learnings. Mainly we will talk about these topics:
- Development workflow and tools
- AssistantJS (the node.js framework we decided to use)
Development workflow and tools
Let's start with the workflow. In this context, we will mainly talk about the workflow between
Dialogflow and your local node.js application. If you have not heard about Dialogflow yet, you will quickly come across this Google-powered
web app. In short, it is your central hub to manage apps for Google Assistant platforms and
integrations, meaning you can set up, configure and publish apps here. You can also develop them right
inside Dialogflow, which you probably would not want to do. Make sure you get familiar with the
core concepts and the vocabulary of Dialogflow first, since you will need them here.
Let's get back to our workflow topic.
The main pain point we want to turn into a gain, or at least ease, is the connection to your
fulfillment.
In the configuration of your Dialogflow app, you need to add a URL for your fulfillment, which serves as
a webhook. In a nutshell, this URL is called with filled entities every time Dialogflow extracts
an intent from what your user says, and it should point to your app. So your app must be publicly
reachable, which your localhost usually is not. And if you do not want to painfully wait
until your deployment succeeds and the newest version of your app is online on your staging server or
similar, you need to find a way to speed up this process. Lucky you, here comes help.
We were pointed to a web service called ngrok. It works as a
proxy between a randomly generated subdomain and your localhost and is easy to set up. Once this is done,
just call the ngrok executable like this: ngrok http 3000, where 3000 is the
port your local dev server listens on. ngrok will output a custom URL, which you can add as your
fulfillment. Et voilà, Dialogflow is connected to your local setup.
There might be similar services around, but ngrok does a decent job, and if you want to give it a
try, there is a free plan available. Although our workflow has improved now, there is still one little pain
point left, and unfortunately we have not found a cure for this one yet: every time you restart ngrok,
you will receive a fresh URL, and consequently you need to update your fulfillment URL as well.
Some other gotchas we came across while working with Dialogflow:
- If you work with an external fulfillment, make sure "Enable webhook call for this intent" is checked. By default, it is disabled.
- When working with a language other than English, and you have not set English as a fallback, it might happen that you cannot test your Dialogflow app inside the Actions on Google simulator. That is a strange bug we have noticed. Maybe you cannot even reproduce it, but at least you have heard of it now.
- If you use a language other than the default en, make sure this language is selected in your app. For some reason, after a new login into Dialogflow, it switches back to en and no intent matches anymore.
💡 Tip: whenever building node.js apps, nodemon comes in handy to watch your changes and restart your application.
To complete your workflow, the final step would be to test your application early on a real device (next to the built-in simulator). This will give you a better feeling about how people will use it, and where the pitfalls are.
By now, you should be ready to develop your fulfillment. To do so, we used a rather new node.js framework called AssistantJS by the German company webcomputing.
So in the next section we will start getting 'our hands dirty' and walk through the pains and gains using AssistantJS ...
Assistant.js
Since this project was our first VUI app, we did quite some research to learn about best practices and examples. This is how we found AssistantJS. We had never heard of this framework before, and maybe you have not either. Let's change this. These are the key reasons why we chose it over other frameworks:
- platform independent approach
- dependency management (with inversify.js)
- state management
- cross-platform session management
- a versatile template engine to build voice (or other) responses
- built-in i18n support
- cli assistance
- typescript-based
- practical testing setup
What a full-blown feature list! We will go into detail on the first five of them. Besides, we highly recommend reading the AssistantJS wiki as well. To avoid a lot of duplication, we briefly introduce major features of AssistantJS but then try to focus on our experiences while using it.
If you decide to use AssistantJS, we recommend using its CLI tools, or at least its
generator, which outputs configurations for different platforms. We also used its new-project
generator. You will have a starter running in minutes and can focus on your feature set.
Quick note: as of now, this framework works with Dialogflow API v1. This should not cause much trouble, since this API version will not be deprecated in the near future (see the FAQ). But if you really need to work with API v2, you can subscribe to this GitHub issue, where updates on this matter will hopefully be shared.
Platform independent approach
When developing an app with AssistantJS, you can serve it on multiple platforms like Amazon Alexa or
Google Home. You only need a configuration for each platform you want to support, and AssistantJS handles
the rest for you. It abstracts each platform well and unifies them, so you have general interfaces to
work with. Although we focused on building an Action for Google Home, we could provide it for Amazon
Alexa painlessly.
Yet what might cause some pain are different response types. AssistantJS provides a globally available
ResponseFactory, with which you can create different responseTypes, be it a
VoiceResponse or a ChatResponse. So far so good, but if you would like
different responseTypes for each platform, you have to deal with platform-specific
conditions in your codebase, something we actually try to avoid.
In summary, however, the similarities of the platforms are abstracted and unified quite well, which already helps avoid duplication and also seems like a good foundation for upcoming platforms in the field of conversational interfaces.
Dependency management
AssistantJS handles dependency management with the help of inversify.js.
We were already familiar with InversifyJS and think it is one of the most mature standalone DI
frameworks out there (preferably for TypeScript projects). So we were happy to find it being used here.
The only thing missing while working with InversifyJS here was the option to extend its internal
injectionNames object, a constant in AssistantJS holding the keys of all
services registered to its DI container. It is pretty useful for code completion, but there is no way
to extend it with your own service classes. Moreover, it is troublesome trying to stick to the framework's
convention when registering your services. Here is an example.
By convention, service registration is handled in the descriptor.ts of each module, and looks
like this:
export const descriptor: ComponentDescriptor = {
name: 'your-module',
bindings: {
root: (bindService) => {
bindService.bindGlobalService<MyServiceInterface>('my-service').to(MyService);
}
}
};
What AssistantJS now magically does is register your service under a key like this: your-module:my-service.
But since this involves some kind of magic, you cannot store this key in a variable like this:
const myInjectionNames = {
MyService: 'your-module:my-service'
}
Because then you cannot use this variable in your descriptor.ts, as AssistantJS would turn
it into your-module:your-module:my-service. Alternatively, you might try to store it like
this:
const myInjectionNames = {
MyService: 'my-service'
}
You could use it in your descriptor.ts, but unfortunately you cannot retrieve it in your
injections. Because if you tried to inject your service like this:
@inject(myInjectionNames.MyService) protected MyService: MyServiceInterface
it would not find anything, since the prefix your-module is missing. A vicious circle.
There are workarounds, but they are not convenient. It would be nice if AssistantJS provided a way
to register your own services in its injectionNames object.
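One possible workaround can be sketched in a few lines of plain TypeScript (entirely our own; MODULE_NAME, localNames and myInjectionNames are hypothetical names, not part of AssistantJS): keep the unprefixed keys in one place and derive the fully-qualified keys from them, using the unprefixed form in descriptor.ts and the prefixed form in @inject() decorators.

```typescript
// Hypothetical workaround sketch, not an AssistantJS API: keep the local
// keys in one place and derive the fully-qualified injection names from them.
const MODULE_NAME = "your-module"; // must match the `name` in descriptor.ts

// Unprefixed keys: use these when binding services in descriptor.ts,
// because AssistantJS prepends the module name there automatically.
const localNames = {
  MyService: "my-service",
} as const;

// Prefixed keys: use these in @inject() decorators, where the full
// "module:service" key is required.
const myInjectionNames = Object.fromEntries(
  Object.entries(localNames).map(([key, value]) => [key, `${MODULE_NAME}:${value}`])
) as { [K in keyof typeof localNames]: string };

console.log(myInjectionNames.MyService); // "your-module:my-service"
```

This at least keeps a single source of truth for each key, at the cost of importing two objects instead of one.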
State management
As said in the intro of this section, the state concept was one of the main reasons why we decided to give AssistantJS a try. In its wiki, the state concept is described like this:
"[...] you implement your states as classes and your intents as methods."
source: AssistantJS wiki
You might also think of a state as a scope, and each intent as a controller, only available inside its scope.
In theory, all of this makes perfect sense. But in practice we encountered some situations where these states might cause extra work and require an elaborate user-interaction concept. This does not devalue the concept; just make sure you keep something like the following in mind:
Let's say we have the following states and intents in a fictive app, a TV guide:
- mainState
  - startFavoritesIntent
  - startTvProgramIntent
- tvProgramState
  - broadcastIntent
  - detailsIntent
  - ...
- favoriteChannelsState
  - getFavoritesIntent
  - addToFavoritesIntent
  - ...
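To make the "states as classes, intents as methods" idea concrete, here is a self-contained sketch in plain TypeScript. It is our own simplification, not AssistantJS' actual state machine API: state classes, intent methods, and a tiny dispatcher that only matches intents of the current state.

```typescript
// Sketch of the state concept: states are classes, intents are methods.
// This is an illustration, not AssistantJS' actual API.
class TvProgramState {
  broadcastIntent(channel: string, time: string): string {
    return `Here is what's on ${channel} at ${time}.`;
  }
  detailsIntent(): string {
    return "Here are the details of the current show.";
  }
}

class FavoriteChannelsState {
  addToFavoritesIntent(channel: string): string {
    return `${channel} was added to your favorites.`;
  }
}

type State = TvProgramState | FavoriteChannelsState;

// A tiny dispatcher: only intents implemented on the *current* state can match.
function dispatch(state: State, intent: string, ...args: string[]): string {
  const handler = (state as any)[intent];
  if (typeof handler !== "function") {
    // The utterance matched an intent outside the current state's scope.
    return "Sorry, I did not understand that in this context.";
  }
  return handler.apply(state, args);
}

console.log(dispatch(new TvProgramState(), "broadcastIntent", "channel1", "8:15pm"));
// -> Here is what's on channel1 at 8:15pm.
console.log(dispatch(new TvProgramState(), "addToFavoritesIntent", "channel1"));
// -> Sorry, I did not understand that in this context.
```

The second call already hints at the pitfall the following paragraphs walk through: an intent that lives in another state simply does not match.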
When the user is in mainState and asks something like "What's on television today at
8:15pm on [channel]?", the startTvProgramIntent will be invoked, the state machine will
make a transition to tvProgramState and call broadcastIntent. This will
respond with an expectable answer and everything is fine!
Then the user might say something like "Okay, give me details on the show", and
detailsIntent, as part of tvProgramState, will match and respond. Still fine.
But if your user says "Please add [channel] to my favourites" out of the blue, it should
actually be handled by addToFavoritesIntent. Since your user is still in tvProgramState,
this intent will not be invoked. Now we have to go an extra mile, both conceptually and
in code.
Since VUIs, compared to GUIs, lack a main navigation that users can always come back to, you will need
to help them out here. For this specific example, you could implement a genericInfoIntent
as part of tvProgramState, which will be invoked in this case. The point is: give some
thought to your user's experience. He or she will need to repeat the request after the genericInfoIntent
answers (with something like "I did not understand you - did you mean..."). And on top of that, you
might even need to duplicate some utterances, since both mainState
and tvProgramState contain intents which might be invoked by the same utterances.
Another way is to always filter the entities of a request and make smart assumptions. For example, if
the entity favorite is set, make a transition to favoriteChannelsState and handle it
there. Something like this can be seen in the AssistantJS demo app. There, in the GameState,
a smart assumption is made: "As long as the user gave me a number, he probably meant guessNumberIntent()."
Some people will say: "Why not handle everything in mainState?" Sure, you can do that for simple apps. But if your app has multiple features, it is better to go the extra mile. You as a developer will also end up happier by splitting up your program code.
All in all, AssistantJS' state concept is excellent for keeping a more complex application clearly arranged. Just keep the examples above in mind; the gains will outdo the pains and everything will be awesome.
Cross platform session management
Another cool feature of AssistantJS is its session management. As said in the intro, our project
assignment was an Action for Google Home. But you never know when the moment comes that your client
wants to support another platform as well. This is where AssistantJS shines, and part of that is its
cross-platform session management.
We know, Dialogflow offers contexts, with which you could achieve the same. But then your logic would
be trapped inside this platform, and you know your client... 😉
When you set up AssistantJS properly, you already have a Redis database running. This is where AssistantJS stores session-related data. Just keep in mind that you can only store specific data types in a Redis database.
Storing session data is really straightforward. Thanks to DI management, you can inject a session factory into every class like this:
@inject(injectionNames.current.sessionFactory) private sessionFactory: () => servicesInterfaces.Session
The object this factory returns provides a set method, so storing goes like this:
await this.sessionFactory().set('key', JSON.stringify(data))
Please keep the async behaviour in mind when working with session management.
Retrieving data is just as simple:
await this.sessionFactory().get('key')
Cool!
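Conceptually, a session is just an async key-value store scoped to one conversation. Here is a self-contained sketch using a Map as a stand-in for the Redis-backed session; MockSession and its methods are our own names, not AssistantJS':

```typescript
// Our own Map-based stand-in for AssistantJS' Redis-backed session store.
class MockSession {
  private store = new Map<string, string>();

  // Redis only stores specific data types (strings among them), which is
  // why complex data goes through JSON.stringify first.
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }

  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }

  // Roughly what AssistantJS does for you when a session is ended.
  async clear(): Promise<void> {
    this.store.clear();
  }
}

async function demo(): Promise<string[]> {
  const session = new MockSession();
  await session.set("favorites", JSON.stringify(["channel1", "channel2"]));
  return JSON.parse((await session.get("favorites")) ?? "[]");
}

demo().then((favorites) => console.log(favorites)); // [ 'channel1', 'channel2' ]
```

Note how every access is awaited; that mirrors the async behaviour mentioned above.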
Another nice feature is that you can also store entities across requests in AssistantJS. Generally, all
request entities are stored in the so-called entityDictionary, a global service and therefore
injectable everywhere:
@inject(injectionNames.current.entityDictionary) private entityDictionary: unifierInterfaces.EntityDictionary,
This service provides methods to store and retrieve those entities:
await this.entityDictionary.storeToSession(this.sessionFactory());
and
await this.entityDictionary.readFromSession(this.sessionFactory());
This is really useful to build a good user flow and handle follow-up intents without losing context. And
the best thing: you do not have to clean up your session storage. AssistantJS does the dirty work for
you. Every time a session gets killed, maybe because your intent prompts a this.responseFactory.createVoiceResponse().endSessionWith(),
the storage will be emptied and you can start fresh.
Remember our fictive TV programme app we talked about in the states section? It has a feature which lets our users store favorites. We could use our Redis database for this as well. Like any other global service in AssistantJS, you can inject a Redis instance into your classes:
@inject(injectionNames.redisInstance) private redisClient: RedisClient
It is based on npm redis, which comes with good documentation.
A versatile template engine to build voice (or other) responses
The last feature we want to talk about here is the template engine. This actually goes hand in hand with
the i18n support. You can store your response texts in a file called
translations.json, one for each language you want to support. Each key of this JSON holds a
string, which serves as a response. But to make your responses more diverse, these strings
do not have to be static. You can...
Fill them with data
A response stored in translations.json might look like this:
{
"tvProgramState": {
"broadcastIntent": "The following programme will be broadcast on {{ channel }} at {{ time }}: {{ title }}"
}
}
See those {{ }}? These are placeholders for your variables. You can fill them with
AssistantJS' translation helper like this:
this.translateHelper(
"tvProgramState.broadcastIntent",
{
time: "20:15",
channel: "channel1",
title: "My favorite movie"
}
)
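What the helper does with those placeholders can be sketched in a few lines (our own simplified version; inside AssistantJS, the real interpolation is done by i18next):

```typescript
// Our own simplified sketch of {{ placeholder }} interpolation; inside
// AssistantJS this job is handled by i18next.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, name) => values[name] ?? "");
}

const broadcastTemplate =
  "The following programme will be broadcast on {{ channel }} at {{ time }}: {{ title }}";

console.log(
  fillTemplate(broadcastTemplate, {
    time: "20:15",
    channel: "channel1",
    title: "My favorite movie",
  })
);
// The following programme will be broadcast on channel1 at 20:15: My favorite movie
```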
And the result would be: The following programme will be broadcast on channel1 at 20:15: My
favorite movie. Nice 💪! But AssistantJS would be less cool if you could not do
Translation variations
Hold your beer, this is one of the hottest features in AssistantJS. In our translations.json,
we could add something like this:
{
...
"favoriteChannelsState": {
"getFavoritesIntent": "I { tell | list } your favorite channels"
}
...
}
See this { *** | *** }? This is AssistantJS' way of producing a greater variety of responses:
each time you call this key with your translateHelper, it can result in either "I tell your
favorite channels" or "I list your favorite channels". But on top of this, you
could also do this:
{
...
"favoriteChannelsState": {
"addToFavoritesIntent":[
"I added {{ channel }} to your favorites list",
"{{ channel }} is now on your favorites list"
]
}
...
}
Can you see the string[] as the value of addToFavoritesIntent? This might result
in either "I added channel1 to your favorites list" or "channel1 is now on your
favorites list". Human behaviour!
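Both variation mechanisms, alternation syntax and string arrays, can be sketched in a self-contained way (our own simplified reimplementation; the real one lives inside AssistantJS/i18next, so the function names here are ours):

```typescript
// Our own simplified reimplementation of response variation: expand every
// "{ a | b }" alternation, then pick one candidate at random.
function expandAlternations(template: string): string[] {
  const match = template.match(/\{([^{}|]+(?:\|[^{}|]+)+)\}/);
  if (!match) return [template];
  const options = match[1].split("|").map((option) => option.trim());
  // Recurse so that several alternation groups in one string all get expanded.
  return options.flatMap((option) =>
    expandAlternations(template.replace(match[0], option))
  );
}

function pickResponse(value: string | string[]): string {
  // A string[] value contributes one candidate per entry; each candidate
  // may itself contain alternations.
  const templates = Array.isArray(value) ? value : [value];
  const candidates = templates.flatMap(expandAlternations);
  return candidates[Math.floor(Math.random() * candidates.length)];
}

console.log(expandAlternations("I { tell | list } your favorite channels"));
// [ 'I tell your favorite channels', 'I list your favorite channels' ]
```

Note that the alternation regex requires a pipe inside the braces, so {{ placeholder }} markers are left alone for the interpolation step.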
This can improve your user experience a lot, especially if you output repetitions, maybe inside a loop. In summary, this magic works very well and is based on the well-known i18next.
Conclusion
All in all, we think AssistantJS is an excellent framework to help you build VUI applications.
It is aimed at experienced developers, who will surely enjoy it. Although it is currently only at a
0.4.x version, we think it is safe to use in a professional way. Be aware, though: if you develop an Action
for Google Home devices and need some of the features of Dialogflow API v2, you currently should not use
it. But since AssistantJS is abstracted very well from any platform, its maintainers will likely adapt to this
API version fast. We read somewhere that this framework was part of the master's thesis of one of
webcomputing's team members. If that is true, Mbappé might own the future of soccer, but this guy might
own the future of conversational interfaces 😉
⬇️ Do not miss the comments section below. You might find more really good tips down there.