All you need to know about SSML and why it’ll only take you so far

Kane Simms

SSML is great for tuning text-to-speech (TTS) systems, but it has some limitations.

Are you over-polishing your chatbot?

Kane Simms

When it comes to designing chatbots and voicebots, don’t over-polish your dialogue.

Conversation design best practice with Salesforce’s Greg Bennett

Kane Simms

Learn conversation design best practice with Greg Bennett, Conversation Design Principal at Salesforce.

Amazon Alexa’s Paul Cutsinger tells us what’s new

Kane Simms

At Alexa Live 2020, Amazon dropped 31 new features on us, the most features dropped at one time ever. Alexa’s Head of Developer Strategy, Paul Cutsinger, joins us to break them down.

Your bot’s not an expert, it’s a toddler, and that’s OK

Kane Simms

Stanford University found that chatbots and voicebots that are positioned as toddlers fare better than those positioned as experts.

The difference between chatbot and voice search refinements

Kane Simms

What’s the difference between how people use chatbots and search bars vs voice user interfaces and what does that mean for how you design interactions for each?

One of the big differences between designing for a voice user interface and a chat user interface, and one of the most striking differences between how people use chat and text-based interfaces (including search boxes) compared to voice, is all to do with search refinements.

Say you use natural language search on a retailer website and you search for something like “I’m looking for men’s summertime clothes”, “I’m looking for something to wear this summer” or “I’m looking for something to wear on my holiday”.

If you don’t find anything off the back of that search, then your refinement will end up shortening your search phrase and making it more keyword-based: “men’s summer clothes”. You will refine it down to something shorter because we’ve been trained over decades in how to use search engines and how search engines work.

If I have an actual conversation, if I’m in a shop talking to a sales assistant and I say “I’m looking for some clothes” and they say “what do you mean?”, what I’m likely to do in that situation is refine my search, refine my phraseology.

But if I’m in person having a conversation, my refinement is likely to be a hell of a lot longer. So instead of just saying “men’s summer clothes”, I’m likely to say something like: “Well, I’m going on holiday in a couple of weeks’ time, you know, it’s supposed to be really hot weather. I’m looking for some shorts and t-shirts, that kind of stuff.”

So the utterance there is incredibly long because I’m adding a whole load more context to the discussion. I’m saying that we’re going on holiday. There’s some context. I’m saying it’s going to be hot weather. That implies that I’m looking for summertime clothing. I give examples by saying shorts and t-shirts, and I don’t need to say “men’s” because it’s implied by the subtext of the conversation, given the person who’s actually having the conversation.

And so not only is there additional information underneath the utterance, but there’s also a hell of a lot more information in the utterance.

We’ve been trained, over years and lifetimes of having conversations, that if someone doesn’t understand you, you elaborate so that you can add more context, more information, to help them understand.

In the voice context, say you’re using a shopping application or a shopping voice user interface and it asks you a question like “Do you want to know more about the red t-shirts or the blue t-shirts?”

With voice, you might say “Both”. The utterance starts out narrow and short, but if the system doesn’t understand you and says “I’m sorry, I didn’t understand that. Do you want red or blue?”, you elaborate, because you’ve been trained in conversation to add more information so that the other person can understand you.

And so instead of saying “both” again, you’ll say “I need both the red and the blue” or “I want to know more about both the red and the blue”, and your utterance becomes longer.

And so there are two real things to pay attention to when you’re designing voice user interfaces:

1) be clear about the way that you phrase the question and anticipate those kinds of nuanced responses
2) be prepared, when you do have to repair a conversation, that the utterances you get in response might be a little bit longer and contain a little bit more information.

Of course, it does work the other way around. Sometimes people will start with a long search phrase, realise the system doesn’t understand them, and refine it to something a little bit shorter. But that’s not always the case, and sometimes it is the inverse.
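To make the red and blue t-shirt example a little more concrete, here is a minimal sketch of a matcher that accepts both the short first answer (“Both”) and the longer, elaborated answers people tend to give after a repair prompt. The names `OPTIONS` and `resolve_choice` are made up for illustration and don’t come from any particular voice platform. It’s plain keyword spotting, so it would get negations like “not the red” wrong, and a real system would more likely lean on the platform’s own intent and slot handling.

```python
import re

# Hypothetical options for the red/blue t-shirt prompt (an assumption for this sketch).
OPTIONS = ("red", "blue")


def resolve_choice(utterance):
    """Return the set of options mentioned in the utterance, or an empty set."""
    # Lowercase the utterance and split it into simple word tokens.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    # Pick up any option named explicitly, however long the utterance is.
    chosen = {option for option in OPTIONS if option in words}
    # A bare "both" names no colour, so treat it as every option.
    if not chosen and "both" in words:
        chosen = set(OPTIONS)
    return chosen


if __name__ == "__main__":
    for text in ("Both", "I need both the red and the blue", "Just the red ones"):
        print(text, "->", sorted(resolve_choice(text)))
```

An empty set is the signal to repair the conversation, and the same function then handles whatever comes back, whether it’s shorter or longer than the first attempt.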

Conversational ear worms

Kane Simms

What is the conversational equivalent of an ear worm?

An ear worm is a song that you just cannot get out of your head. It doesn’t matter how hard you try, it just sticks in there.

If any of you have got kids then you’ll know exactly what it’s like to wake up at five o’clock in the morning, busting for the loo and you just cannot get that Peppa Pig song out of your head!

Musicians and music writers all over the world strive to create ear worms because if you can create an ear worm, then that’s job done!

My latest ear worm, and I don’t see any reason why you should be immune to this, is Thomas the Tank Engine.

So I was thinking about that and I was thinking what’s the conversational equivalent of an ear worm?

We’ve all had conversations that we remember, some of us have had conversations that might have even been life-changing.

Does the same logic tie into conversations that we have with our voice assistants?

I remember the first time I asked Google Assistant for a football score and it played the sound of a crowd cheering in the background. I still remember that today. It’s one of the best interactions I’ve had on Google Assistant.

And so we have the tools to create memorable experiences through a combination of conversation design and sound design, and it doesn’t matter whether you’re a boring old insurance company or a cutting-edge media outfit.

We all have access to the same tools and we all have the potential to create memorable and meaningful conversations.

So what’s the most memorable conversation you’ve had with your voice assistant, or the most memorable conversation you’ve had at all, and why?

Think conversation design is complex? You ain’t seen nothing yet

Kane Simms

If you think conversation design is complex, you ain’t seen nothing yet.

The future is multi modal

Kane Simms


A good digital assistant will take context into consideration when providing a user experience.

That context could be related to the device you’re using, the environment you’re in, or how much time and attention you have available at any given time.

So for example, if I’m in the kitchen washing up, I might have a bit of time, but you might not have my attention. That experience might need to be different from when I’m sitting in the front room watching TV, where I have both time and attention, or when I’m out for a run wearing headphones, where I have neither. In the headphone example, maybe your interactions need to be really short, sharp and transient. In the living room example, maybe you lean on visuals a little more. And in the kitchen, maybe you go audio first and use earcons and things like that to make more of an audible experience.
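As a rough sketch, the kitchen, front room and running examples can be written down as a tiny decision rule based on whether the user has time and attention. The `Context` class, the `pick_style` function and the style labels are all invented for illustration; nothing here reflects how Google Assistant or any other assistant actually works. The case of attention without time isn’t covered by the examples, so treating it as a short interaction is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of what the user can give us right now."""
    has_time: bool
    has_attention: bool


def pick_style(ctx):
    """Map time and attention to a rough interaction style."""
    if ctx.has_time and ctx.has_attention:
        # Front room, watching TV: lean on visuals more.
        return "visual-led"
    if ctx.has_time:
        # Kitchen, washing up: audio first, with earcons.
        return "audio-first"
    # Out for a run with headphones: short, sharp and transient.
    # Assumption: attention without time is treated the same way.
    return "short-audio"


if __name__ == "__main__":
    print(pick_style(Context(has_time=True, has_attention=False)))
```

Even a toy rule like this shows how quickly the number of cases grows once devices and environments are added as inputs.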

Now, those are just real high-level examples, and it’s difficult enough to create one conversation that’s intuitive, natural and easy to use.

Now think about doing that for all of these different devices, and not just for one third-party app that you create. Imagine you’re the designers behind Google Assistant, which exists on over a billion devices, in over 90 countries and 30 different languages.

How do you create conversations that adapt not only to the different devices that you create as Google, but also to any number of devices that could be created by third-party manufacturers putting Google Assistant in their own hardware?

That is a very complex, very big task but it has to be the task for someone, and that someone is Daniel Padgett, Head of Conversation Design at Google.

He and his team work on creating consistent conversations across modalities for Google Assistant. We had the opportunity to interview Daniel and chat about multimodal design for Google Assistant on the VUX World podcast this week.

We talked to Daniel about just how you go about creating genuine multimodal conversations that change depending on the device and context the user is in and where the future of multimodality is going from Google’s perspective.

Designing natural conversations with IBM’s Bob Moore

Kane Simms

Kane and Dustin speak to author and researcher, Bob Moore, about his book Conversational UX Design and how to use conversation analysis techniques to create more natural conversational experiences.