Voice AI Lego: the importance of modularity

By Ben McCulloch

Why a modular approach to voice AI technology selection gives you ultimate flexibility, according to Shawn Edmunds, CRO, LumenVox.

Let’s say you’re planning to create your first voice AI solution in-house, with your own team.

Do you build every component yourself, from the ground up? The ASR, the NLU, the dialogue management, the integrations, the TTS? Unless you’re Deutsche Telekom, with a clear reason and the resources to do it, that’s the best way to burn time and money.

All of this stuff has already been built for you. You just need to stitch it together. Chances are, you’ll go looking for partners to provide the capabilities you can’t or won’t make yourself.

But how do you choose? And what happens when you reach the limits of your chosen technology or you develop new requirements that render it not fit for purpose anymore? Perhaps you select Dialogflow as your NLU. Then what if you outgrow Dialogflow in 3 years – how hard will it be to swap your NLU? What if Dialogflow doesn’t perform so well for certain use cases or on certain channels?

Shawn Edmunds, CRO of LumenVox, says: “You need to pick a partner that scales with you, have the ability to select modules that meet your needs, and be able to swap them when your needs change.”

And it’s not just for companies dipping their toes into voice for the first time…

Shawn says:

“Make it simple, make it cost effective, future proof it, and most important – it needs to scale”

So how does that work? The modular approach

It’s all about modules. You build your voice solution in-house using the best modules that fit your needs; perhaps you need an ASR that can be trained for your use case, or you need to support multiple languages, or distribute into another channel, or any other specific need. You just select the best modules for the job and stitch them together.

Even if you already have voice AI capabilities, you might want to change one part of them, such as your voice biometric tool. Shawn says it’s all about being malleable: you need to be able to swap components when your needs change.

With a modular approach, you have full control over everything, but you don’t have to build everything. You can avoid vendor lock-in and change your architecture as your needs change. You still own and control the full experience.
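To make the idea of swappable modules concrete, here is a minimal sketch in Python. It is an illustration only, not how LumenVox or any other vendor structures its products. Every name in it (`VoicePipeline`, `EchoASR`, `KeywordNLU`, `SynonymNLU`, `BytesTTS`, `RESPONSES`) is hypothetical, the modules are toy stand-ins rather than real speech engines, and a plain lookup table takes the place of dialogue management. The point is that the pipeline depends only on small interfaces, so one module can be replaced without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


# Interfaces the pipeline depends on. Any vendor module could be wrapped
# to satisfy them (names and method signatures here are assumptions).
class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class NLU(Protocol):
    def parse(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Toy stand-in: treats the "audio" as UTF-8 text instead of recognizing speech.
class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


# First NLU: matches a single keyword.
class KeywordNLU:
    def parse(self, text: str) -> str:
        return "check_balance" if "balance" in text.lower() else "fallback"


# Replacement NLU: same interface, broader matching.
class SynonymNLU:
    PHRASES = ("balance", "how much money")

    def parse(self, text: str) -> str:
        lowered = text.lower()
        if any(phrase in lowered for phrase in self.PHRASES):
            return "check_balance"
        return "fallback"


# Toy stand-in: returns the reply as bytes instead of synthesized audio.
class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


# A lookup table stands in for dialogue management in this sketch.
RESPONSES = {
    "check_balance": "Your balance is on its way.",
    "fallback": "Sorry, I did not catch that.",
}


@dataclass
class VoicePipeline:
    asr: ASR
    nlu: NLU
    tts: TTS

    def handle(self, audio: bytes) -> bytes:
        text = self.asr.transcribe(audio)
        intent = self.nlu.parse(text)
        return self.tts.synthesize(RESPONSES[intent])


if __name__ == "__main__":
    pipeline = VoicePipeline(EchoASR(), KeywordNLU(), BytesTTS())
    print(pipeline.handle(b"How much money do I have?"))

    # Swap only the NLU; the ASR, TTS and pipeline code stay unchanged.
    pipeline.nlu = SynonymNLU()
    print(pipeline.handle(b"How much money do I have?"))
```

In a real stack, each class would wrap a partner's SDK or API behind the same small interface, which is what keeps a later swap cheap.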

And nobody may ever know your solution was built with third-party components. The end user doesn’t even care; it just works.

Where you will be if you marry the right voice partner

Having flexibility in your tech stack will be even more important in future, as technology becomes ever more intertwined with our lives. Shawn says that in the next two to five years:

“Everything’s tied, all your devices are connected, whether it’s your phone, or your car, or an Alexa”

To achieve this future, our tech stacks are going to need a lot of modules, and they’ll be interconnected in various ways.

The final word

You could say that going modular will solve many of your troubles.

You’ll be able to incorporate new technologies as they arise – and in this industry, there’s no telling what will appear tomorrow. New technologies can, and will, drastically alter tomorrow’s voice tech landscape. Companies need to build a tech stack that allows them to evolve with the times and with the changing needs and expectations of customers.

So you should probably find partners who can help you do that.

Listen to the full interview with Shawn Edmunds here or on Apple podcasts, Spotify, YouTube or wherever you get your podcasts.

This article was written by Benjamin McCulloch. Ben is a freelance conversation designer and an expert in audio production. He has a decade of experience crafting natural sounding dialogue: recording, editing and directing voice talent in the studio. Some of his work includes dialogue editing for Philips’ ‘Breathless Choir’ series of commercials, a Cannes Pharma Grand-Prix winner; leading teams in localizing voices for Fortune 100 clients like Microsoft, as well as sound design and music composition for video games and film.
