What’s the difference between how people use chatbots and search bars vs voice user interfaces and what does that mean for how you design interactions for each?
One of the big, striking differences between how people use chat and text-based interfaces, including search boxes, and how they use voice interfaces is all to do with search refinements.
Say you use natural language search on a retailer website and you search for something like “I’m looking for men’s summertime clothes”, “I’m looking for something to wear this summer” or “I’m looking for something to wear on my holiday”, or any kind of natural language search like that.
If you don’t find anything off the back of doing that search then your search refinement will end up shortening your search phrase and you’ll make it more keyword-based: “men’s summer clothes”. You will refine it down to something shorter because we’ve been trained over decades about how to use search engines and how search engines work.
If I have an actual conversation, if I’m in a shop talking to a sales assistant and I say “I’m looking for some clothes” and they say “what do you mean?”, what I’m likely to do in that situation is refine my search, refine my phraseology.
But if I’m in person having a conversation, it’s likely to be a hell of a lot longer. So instead of me just saying “men’s summer clothes”, I’m likely to say something like: “Well, I’m going on holiday in a couple of weeks’ time. You know, it’s supposed to be really hot weather. I’m looking for some shorts and t-shirts, that kind of stuff.”
So the utterance there is incredibly long because I’m adding a whole load more context to the discussion. I’m saying that I’m going on holiday. That’s some context. I’m saying it’s going to be hot weather, which implies that I’m looking for summertime clothing. I give examples by saying shorts and t-shirts, and I don’t need to say “men’s” because it’s implied by the subtext of the conversation, given the person who’s actually having it.
And so not only is there additional information underneath the utterance, but there’s also a hell of a lot more information in the utterance itself.
We’ve been trained over years, lifetimes, of having conversations that if someone doesn’t understand you, you elaborate: you add more context, more information, to help them understand.
In the voice context, say you’re using a shopping application or a shopping voice user interface and it asks you a question like “Do you want to know more about the red t-shirts or the blue t-shirts?”
You might say “Both”. The utterance starts out narrow and short, but if the system doesn’t understand you and says, “I’m sorry, I didn’t understand that. Do you want red or blue?”, you over-elaborate again because you’ve been trained in conversation to add more information so that the other person can understand you.
And so instead of saying “both” again, you’ll say “I need both the red and the blue” or “I want to know more about both the red and the blue”, and your utterance becomes longer.
So there are two real things to pay attention to when you’re designing voice user interfaces:
1) be clear about the way that you phrase the question, and anticipate those kinds of nuanced responses
2) be prepared, when you do have to repair a conversation, that the utterances you get in response might be a little bit longer and contain a little bit more information.
Of course, it does work the other way around. Sometimes people will start with a long search phrase, then realise the system’s not quite functioning properly. It doesn’t understand them. And therefore they’ll refine something to be a little bit shorter, but it’s not always the case and sometimes it is the inverse.
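To make that a bit more concrete, here is a minimal sketch in Python of how the “red or blue?” prompt might interpret an answer whether it comes back short (“Both”) or long (“I want to know more about both the red and the blue”). Everything in it is an assumption for illustration: the `interpret_choice` function, the option names and the word lists are hypothetical, it is not tied to any particular voice platform, and it does not try to handle negation such as “not the red one”.

```python
import re

# Hypothetical options for the prompt "Do you want red or blue?"
OPTIONS = ("red", "blue")

# Assumed, deliberately small word lists; a real system would need more.
ALL_WORDS = {"both", "all"}
NONE_WORDS = {"neither", "none"}


def interpret_choice(utterance, options=OPTIONS):
    """Return the options the user picked, [] for none, or None if unclear."""
    # Look at the words, not the whole phrase, so that a short answer
    # and a long, elaborated answer are treated the same way.
    words = set(re.findall(r"[a-z']+", utterance.lower()))

    if words & NONE_WORDS:
        return []

    # Any option named explicitly wins, however long the utterance is.
    mentioned = [option for option in options if option in words]
    if mentioned:
        return mentioned

    # "Both" on its own is the nuanced answer the question did not offer.
    if words & ALL_WORDS:
        return list(options)

    # Nothing recognised: the caller should repair the conversation.
    return None


if __name__ == "__main__":
    print(interpret_choice("Both"))
    print(interpret_choice("I want to know more about both the red and the blue"))
    print(interpret_choice("the blue one please"))
```

The design choice here is that the short first answer and the longer repaired answer land on the same result, so the person isn’t penalised for elaborating.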