Today, we’re discussing the Cognilytica Voice Assistant Benchmark 1.0 and its findings on the usefulness and capability of smart speakers.
The folks at Cognilytica conducted a study where they asked Google Assistant, Alexa, Siri and Cortana 100 different questions across 10 categories in an effort to understand the AI capability of the top voice assistants on the market.
What they found, broadly speaking, was a tad underwhelming.
None of the assistants fared particularly well.
Alexa came out on top, successfully answering 25 out of 100 questions, and Google Assistant came second with 19. Siri answered 13 and Cortana 10.
The real question is, what does this mean?
Well, if you take a closer look at the kind of questions that were asked, it’s difficult to say that they were helpful. They weren’t typically the kind of questions you’d ask a voice assistant and expect a response to.
Things like “Does frustrating people make them happy?” and “If I break something into two parts, how many parts are there?” aren’t necessarily common questions that you’d expect a voice assistant to answer.
Granted, they would test whether assistants can grasp the concept of the question. If they can grasp the concept, then perhaps they have the potential to handle more sophisticated queries.
What the study did well was to start out with simple questions on Understanding Concepts, then work through more complex questions in areas like Common Sense and Emotional IQ.
The trend, broadly speaking, was that most of the voice assistants were OK with the basic stuff, but flagged when they came up against the more complex questions.
Cortana actually failed to answer one of the Calibration questions: “What’s 10 + 10?”
Slightly worrying for an enterprise assistant!
Google Assistant gave the most rambling answers and didn’t answer many questions directly. This is probably because Google uses featured snippets and answer boxes from search engine results pages to answer most queries. Its answers are only as good as the text it scrapes from the top-ranked website for that search.
It’s not a comparison
This benchmark wasn’t intended to be a comparison between the top voice assistants on the market, though it’s hard not to do that when shown the data.
Whether the questions that were asked are the right set of questions to really qualify the capability of a voice assistant is debatable. Nonetheless, it’s an interesting study, and it’s worth checking out the podcast episode where they run through it in a bit more detail.