Pains and gains when developing a node.js Dialogflow application
Posted by Thomas Rutzer
Voice and conversational interfaces are everywhere these days, and for good reason: they can
greatly improve your digital service. That is why we got very excited here at diePartments when we were assigned to build one for the first
time: an Action for Google Home devices.
So this article is about the pains and gains during the first development phase (spoiler: the gains outdo
the pains) and some learnings. Mainly we will talk about these topics:
- Development workflow and tools
- AssistantJS (the node.js framework we decided to use)
Development workflow and tools
Let's start with the workflow. In this context, we will mainly talk about the workflow between
Dialogflow and your local node.js application. If you have not heard about Dialogflow yet, you will quickly come across this Google-powered
web app. In short, it is your central hub to manage apps for Google Assistant platforms and
integrations, meaning you can set up, configure and publish apps here. You can also develop them right
inside Dialogflow, which you probably would not want to do. Make sure you get familiar with the
core concepts and the vocabulary of Dialogflow first, since you will need them here.
Let's get back to our workflow topic.
The main pain point we want to turn into a gain, or at least ease, is the connection to your
fulfillment.
In the configuration of your Dialogflow app, you need to add a URL for your fulfillment, which serves as
a webhook. In a nutshell, this URL is called with filled entities every time Dialogflow extracts
an intent from what your user says, and it should point to your app. So your app must be publicly
reachable, which your localhost usually is not. And if you do not want to painfully wait
until your deployment succeeds and the newest version of your app is online on your staging server or
similar, you need to find a way to speed up this process. Lucky you, here comes help.
We were pointed to a web service called ngrok. It works as a
proxy between a randomly generated subdomain and your localhost and is easy to set up. Once this is done,
just call the ngrok executable like this: ngrok http 3000, where 3000 is the
port your local dev server listens on. ngrok will output a custom URL, which you can add as your
fulfillment. Et voilà, Dialogflow is connected to your local setup.
There might be similar services around, but ngrok does a decent job, and if you want to give it a
try, there is a free plan available. Although our workflow has improved now, there is still one little pain
point left, and unfortunately we have not found a cure for this one yet: every time you restart ngrok,
you will receive a fresh URL, and consequently you need to update your fulfillment URL as well.
Some other gotchas we came across while working with Dialogflow:
- If you work with an external fulfillment, make sure "Enable webhook call for this intent" is checked. By default, it is disabled.
- When working with a language other than English, and you have not set English as a fallback, it might happen that you cannot test your Dialogflow app inside the Actions on Google simulator. That is a strange bug we have noticed. Maybe you cannot even reproduce it, but at least you have heard of it now.
- If you use a language other than the default en, make sure this language is selected in your app. For some reason, after a new login into Dialogflow, it switches back to en and no intent matches anymore.
💡 Tip: whenever building node.js apps, nodemon comes in handy to watch your changes and restart your application.
To complete your workflow, the final step would be to test your application early on a real device (next to the built-in simulator). This will give you a better feeling about how people will use it, and where the pitfalls are.
By now, you should be ready to develop your fulfillment. To do so, we used a rather new node.js framework called AssistantJS by the German company webcomputing.
So in the next section we will start getting 'our hands dirty' and walk through the pains and gains using AssistantJS ...
Assistant.js
Since this project was our first VUI app, we did quite some research to learn about best practices and examples. This is how we found AssistantJS. We had never heard of this framework before, and maybe you have not either. Let's change this. These are the key reasons why we chose it over other frameworks:
- platform independent approach
- dependency management (with inversify.js)
- state management
- cross-platform session management
- a versatile template engine to build voice (or other) responses
- built-in i18n support
- cli assistance
- typescript-based
- practical testing setup
What a full-blown feature list! We will go into detail on the first five of them. Besides, we highly recommend reading the AssistantJS wiki as well. To avoid a lot of duplication, we briefly introduce major features of AssistantJS but then try to focus on our experiences while using it.
If you decide to use AssistantJS, we recommend using its CLI tools, or at least its
generator, which outputs configurations for different platforms. We also used its new-project
generator. You will have a starter running in minutes and can focus on your feature set.
Quick note: as of now, this framework works with Dialogflow API v1. This should not cause much trouble, since this API version will not be deprecated in the near future (see the FAQ). But if you really need to work with API v2, you can subscribe to this GitHub issue, where updates on this matter will hopefully be shared.
Platform independent approach
When developing an app with AssistantJS, you can serve it on multiple platforms like Amazon Alexa or
Google Home. You only need a configuration for each platform you want to support, and AssistantJS handles
the rest for you. It abstracts each platform well and unifies them, so you have general interfaces to
work with. Although we focused on building an Action for Google Home, we could provide it for Amazon
Alexa painlessly.
Yet what might cause some pain are different response types. AssistantJS provides a globally available
ResponseFactory, with which you can create different responseTypes, be it a
VoiceResponse or a ChatResponse. So far so good, but if you would like
different responseTypes for each platform, you have to deal with platform-specific
conditions in your codebase, something we actually try to avoid.
In summary, however, the similarities of the platforms are abstracted and unified quite well, which already helps avoid duplication and also seems like a good foundation for upcoming platforms in the field of conversational interfaces.
Dependency management
AssistantJS handles dependency management with the help of inversify.js.
We were already familiar with InversifyJS and think it is one of the most mature standalone DI
frameworks out there (preferably for TypeScript projects). So we were happy to find it being used here.
The only thing missing while working with InversifyJS here was the option to extend its internal
injectionNames object, a constant in AssistantJS holding the keys of all
services registered to its DI container. It is pretty useful for code completion, but there is no way
to extend it with your own service classes. Moreover, it is troublesome trying to stick to the framework's
convention when registering your services. Here is an example.
By convention, service registration is handled in the descriptor.ts of each module, and looks
like this:
export const descriptor: ComponentDescriptor = {
name: 'your-module',
bindings: {
root: (bindService) => {
bindService.bindGlobalService<MyServiceInterface>('my-service').to(MyService);
}
}
};
What AssistantJS now magically does is register your service under a key like this: your-module:my-service.
But since this involves some kind of magic, you cannot store this key in a variable like this:
const myInjectionNames = {
MyService: 'your-module:my-service'
}
Because then you cannot use this variable in your descriptor.ts, as AssistantJS would turn
it into your-module:your-module:my-service. Alternatively, you might try to store it like
this:
const myInjectionNames = {
MyService: 'my-service'
}
You could use it in your descriptor.ts, but unfortunately you cannot retrieve it in your
injections. Because if you tried to inject your service like this:
@inject(myInjectionNames.MyService) protected MyService: MyServiceInterface
it would not find anything, since the prefix your-module is missing. A vicious circle.
There are workarounds, but they are not convenient. It would be nice if AssistantJS provided a way
to register your own services in its injectionNames object.
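One possible workaround can be sketched in a few lines of plain TypeScript (entirely our own; MODULE_NAME, localNames and myInjectionNames are hypothetical names, not part of AssistantJS): keep the unprefixed keys in one place and derive the fully-qualified keys from them, using the unprefixed form in descriptor.ts and the prefixed form in @inject() decorators.

```typescript
// Hypothetical workaround sketch, not an AssistantJS API: keep the local
// keys in one place and derive the fully-qualified injection names from them.
const MODULE_NAME = "your-module"; // must match the `name` in descriptor.ts

// Unprefixed keys: use these when binding services in descriptor.ts,
// because AssistantJS prepends the module name there automatically.
const localNames = {
  MyService: "my-service",
} as const;

// Prefixed keys: use these in @inject() decorators, where the full
// "module:service" key is required.
const myInjectionNames = Object.fromEntries(
  Object.entries(localNames).map(([key, value]) => [key, `${MODULE_NAME}:${value}`])
) as { [K in keyof typeof localNames]: string };

console.log(myInjectionNames.MyService); // "your-module:my-service"
```

This at least keeps a single source of truth for each key, at the cost of importing two objects instead of one.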
State management
As said in the intro of this section, the state concept was one of the main reasons why we decided to give AssistantJS a try. In its wiki, the state concept is described like this:
"[...] you implement your states as classes and your intents as methods."
source: AssistantJS wiki
You might also think of a state as a scope, and each intent as a controller, only available inside its scope.
In theory, all of this makes perfect sense. But in practice we encountered some situations where these states might cause extra work and require an elaborate user-interaction concept. This does not devalue the concept; just make sure you keep something like the following in mind:
Let's say we have the following states and intents in a fictive app, a TV guide:
- mainState
  - startFavoritesIntent
  - startTvProgramIntent
- tvProgramState
  - broadcastIntent
  - detailsIntent
  - ...
- favoriteChannelsState
  - getFavoritesIntent
  - addToFavoritesIntent
  - ...
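To make the "states as classes, intents as methods" idea concrete, here is a self-contained sketch in plain TypeScript. It is our own simplification, not AssistantJS' actual state machine API: state classes, intent methods, and a tiny dispatcher that only matches intents of the current state.

```typescript
// Sketch of the state concept: states are classes, intents are methods.
// This is an illustration, not AssistantJS' actual API.
class TvProgramState {
  broadcastIntent(channel: string, time: string): string {
    return `Here is what's on ${channel} at ${time}.`;
  }
  detailsIntent(): string {
    return "Here are the details of the current show.";
  }
}

class FavoriteChannelsState {
  addToFavoritesIntent(channel: string): string {
    return `${channel} was added to your favorites.`;
  }
}

type State = TvProgramState | FavoriteChannelsState;

// A tiny dispatcher: only intents implemented on the *current* state can match.
function dispatch(state: State, intent: string, ...args: string[]): string {
  const handler = (state as any)[intent];
  if (typeof handler !== "function") {
    // The utterance matched an intent outside the current state's scope.
    return "Sorry, I did not understand that in this context.";
  }
  return handler.apply(state, args);
}

console.log(dispatch(new TvProgramState(), "broadcastIntent", "channel1", "8:15pm"));
// -> Here is what's on channel1 at 8:15pm.
console.log(dispatch(new TvProgramState(), "addToFavoritesIntent", "channel1"));
// -> Sorry, I did not understand that in this context.
```

The second call already hints at the pitfall the following paragraphs walk through: an intent that lives in another state simply does not match.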
When the user is in mainState and asks something like "What's on television today at
8:15pm on [channel]?", the startTvProgramIntent will be invoked, the state machine will
make a transition to tvProgramState and call broadcastIntent. This will
respond with an expectable answer and everything is fine!
Then the user might say something like "Okay, give me details on the show", and
detailsIntent, as part of tvProgramState, will match and respond. Still fine.
But if your user says "Please add [channel] to my favourites" out of the blue, it should
actually be handled by addToFavoritesIntent. Since your user is still in tvProgramState,
this intent will not be invoked. Now we have to go an extra mile, both conceptually and
in code.
Since VUIs, compared to GUIs, lack a main navigation that users can always come back to, you will need
to help them out here. For this specific example, you could implement a genericInfoIntent
as part of tvProgramState, which will be invoked in this case. The point is: give some
thought to your user's experience. He or she will need to repeat the request after the genericInfoIntent
answers (with something like "I did not understand you - did you mean..."). And on top of that, you
might even need to duplicate some utterances, since both mainState
and tvProgramState contain intents which might be invoked by the same utterances.
Another way is to always filter the entities of a request and make smart assumptions. For example, if
the entity favorite is set, make a transition to favoriteChannelsState and handle it
there. Something like this can be seen in the AssistantJS demo app. There, in the GameState,
a smart assumption is made: "As long as the user gave me a number, he probably meant guessNumberIntent()."
Some people will say: "Why not handle everything in mainState?" Sure, you can do that for simple apps. But if your app has multiple features, it is better to go the extra mile. You as a developer will also end up happier by splitting up your program code.
All in all, AssistantJS' state concept is excellent for keeping a more complex application clearly arranged. Just keep the examples above in mind; the gains will outdo the pains and everything will be awesome.
Cross platform session management
Another cool feature of AssistantJS is its session management. As said in the intro, our project
assignment was an Action for Google Home. But you never know when the moment comes that your client
wants to support another platform as well. This is where AssistantJS shines, and part of that is its
cross-platform session management.
We know, Dialogflow offers contexts, with which you could achieve the same. But then your logic would
be trapped inside this platform, and you know your client... 😉
When you set up AssistantJS properly, you already have a Redis database running. This is where AssistantJS stores session-related data. Just keep in mind that you can only store specific data types in a Redis database.
Storing session data is really straightforward. Thanks to DI management, you can inject a session factory into every class like this:
@inject(injectionNames.current.sessionFactory) private sessionFactory: () => servicesInterfaces.Session
The object this factory returns provides a set method, so storing goes like this:
await this.sessionFactory().set('key', JSON.stringify(data))
Please keep the async behaviour in mind when working with session management.
Retrieving data is just as simple:
await this.sessionFactory().get('key')
Cool!
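Conceptually, a session is just an async key-value store scoped to one conversation. Here is a self-contained sketch using a Map as a stand-in for the Redis-backed session; MockSession and its methods are our own names, not AssistantJS':

```typescript
// Our own Map-based stand-in for AssistantJS' Redis-backed session store.
class MockSession {
  private store = new Map<string, string>();

  // Redis only stores specific data types (strings among them), which is
  // why complex data goes through JSON.stringify first.
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }

  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }

  // Roughly what AssistantJS does for you when a session is ended.
  async clear(): Promise<void> {
    this.store.clear();
  }
}

async function demo(): Promise<string[]> {
  const session = new MockSession();
  await session.set("favorites", JSON.stringify(["channel1", "channel2"]));
  return JSON.parse((await session.get("favorites")) ?? "[]");
}

demo().then((favorites) => console.log(favorites)); // [ 'channel1', 'channel2' ]
```

Note how every access is awaited; that mirrors the async behaviour mentioned above.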
Another nice feature is that you can also store entities across requests in AssistantJS. Generally, all
request entities are stored in the so-called entityDictionary, a global service and therefore
injectable everywhere:
@inject(injectionNames.current.entityDictionary) private entityDictionary: unifierInterfaces.EntityDictionary,
This service provides methods to store and retrieve those entities:
await this.entityDictionary.storeToSession(this.sessionFactory());
and
await this.entityDictionary.readFromSession(this.sessionFactory());
This is really useful to build a good user flow and handle follow-up intents without losing context. And
the best thing: you do not have to clean up your session storage. AssistantJS does the dirty work for
you. Every time a session gets killed, maybe because your intent prompts a this.responseFactory.createVoiceResponse().endSessionWith(),
the storage will be emptied and you can start fresh.
Remember our fictive TV programme app we talked about in the states section? It has a feature which lets our users store favorites. We could use our Redis database for this as well. Like any other global service in AssistantJS, you can inject a Redis instance into your classes:
@inject(injectionNames.redisInstance) private redisClient: RedisClient
It is based on npm redis, which comes with good documentation.
A versatile template engine to build voice (or other) responses
The last feature we want to talk about here is the template engine. This actually goes hand in hand with
the i18n support. You can store your response texts in a file called
translations.json, one for each language you want to support. Each key of this JSON holds a
string, which serves as a response. But to make your responses more diverse, these strings
do not have to be static. You can...
Fill them with data
A response stored in translations.json might look like this:
{
"tvProgramState": {
"broadcastIntent": "The following programme will be broadcast on {{ channel }} at {{ time }}: {{ title }}"
}
}
See those {{ }}? These are placeholders for your variables. You can fill them with
AssistantJS' translation helper like this:
this.translateHelper(
"tvProgramState.broadcastIntent",
{
time: "20:15",
channel: "channel1",
title: "My favorite movie"
}
)
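What the helper does with those placeholders can be sketched in a few lines (our own simplified version; inside AssistantJS, the real interpolation is done by i18next):

```typescript
// Our own simplified sketch of {{ placeholder }} interpolation; inside
// AssistantJS this job is handled by i18next.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, name) => values[name] ?? "");
}

const broadcastTemplate =
  "The following programme will be broadcast on {{ channel }} at {{ time }}: {{ title }}";

console.log(
  fillTemplate(broadcastTemplate, {
    time: "20:15",
    channel: "channel1",
    title: "My favorite movie",
  })
);
// The following programme will be broadcast on channel1 at 20:15: My favorite movie
```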
And the result would be: The following programme will be broadcast on channel1 at 20:15: My
favorite movie. Nice 💪! But AssistantJS would be less cool if you could not do
Translation variations
Hold your beer, this is one of the hottest features in AssistantJS. In our translations.json,
we could add something like this:
{
...
"favoriteChannelsState": {
"getFavoritesIntent": "I { tell | list } your favorite channels"
}
...
}
See this { *** | *** }? This is AssistantJS' way of producing a greater variety of responses:
each time you call this key with your translateHelper, it can result in either "I tell your
favorite channels" or "I list your favorite channels". But on top of this, you
could also do this:
{
...
"favoriteChannelsState": {
"addToFavoritesIntent":[
"I added {{ channel }} to your favorites list",
"{{ channel }} is now on your favorites list"
]
}
...
}
Can you see the string[] as the value of addToFavoritesIntent? This might result
in either "I added channel1 to your favorites list" or "channel1 is now on your
favorites list". Human behaviour!
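Both variation mechanisms, alternation syntax and string arrays, can be sketched in a self-contained way (our own simplified reimplementation; the real one lives inside AssistantJS/i18next, so the function names here are ours):

```typescript
// Our own simplified reimplementation of response variation: expand every
// "{ a | b }" alternation, then pick one candidate at random.
function expandAlternations(template: string): string[] {
  const match = template.match(/\{([^{}|]+(?:\|[^{}|]+)+)\}/);
  if (!match) return [template];
  const options = match[1].split("|").map((option) => option.trim());
  // Recurse so that several alternation groups in one string all get expanded.
  return options.flatMap((option) =>
    expandAlternations(template.replace(match[0], option))
  );
}

function pickResponse(value: string | string[]): string {
  // A string[] value contributes one candidate per entry; each candidate
  // may itself contain alternations.
  const templates = Array.isArray(value) ? value : [value];
  const candidates = templates.flatMap(expandAlternations);
  return candidates[Math.floor(Math.random() * candidates.length)];
}

console.log(expandAlternations("I { tell | list } your favorite channels"));
// [ 'I tell your favorite channels', 'I list your favorite channels' ]
```

Note that the alternation regex requires a pipe inside the braces, so {{ placeholder }} markers are left alone for the interpolation step.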
This can improve your user experience a lot, especially if you output repetitions, maybe inside a loop. In summary, this magic works very well and is based on the well-known i18next.
Conclusion
All in all, we think AssistantJS is an excellent framework to help you build VUI applications.
It is aimed at experienced developers, who will surely enjoy it. Although it is currently only at a
0.4.x version, we think it is safe to use in a professional way. Be aware, though: if you develop an Action
for Google Home devices and need some of the features of Dialogflow API v2, you currently should not use
it. But since AssistantJS is abstracted very well from any platform, its maintainers will likely adapt to this
API version fast. We read somewhere that this framework was part of the master's thesis of one of
webcomputing's team members. If that is true, Mbappé might own the future of soccer, but this guy might
own the future of conversational interfaces 😉
⬇️ Do not miss the comments section below. You might find more really good tips down there.