Yesterday I posted about how speech recognition systems and voice assistants have trouble with different accents, like Irish, Welsh, Scottish and Northern accents, and how hard it is to actually train those systems for accents, because a different accent is essentially like a different language.
Dive deep into one of the core technologies that underpins the voice assistants you know and love, with one of the world’s leading speech recognition experts, Catherine Breslin.
Presented by Sparks
Sparks is a new podcast player app that lets you learn and retain knowledge while you listen.
The Sparks team are looking for people just like you: podcast listeners who’re also innovative early adopters of new tech, to try the beta app and provide feedback.
Try it now at sparksapp.io/vux
What is speech recognition and how does it work?
Automatic speech recognition (also known as ASR) is a suite of technology that takes audio signals containing speech, analyses them and converts them into text, so that they can be read and understood by humans and machines. It’s the technology that enables voice assistants like Amazon Alexa to understand what a user says.
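To make the audio-to-text idea concrete, here’s a minimal, purely illustrative sketch of the classic ASR pipeline stages (feature extraction, acoustic model, language model). All of the function bodies are toy stand-ins invented for this example; real systems use trained statistical models, not these placeholder calculations.

```python
# Conceptual sketch of an ASR pipeline. Every function body is a toy
# stand-in for illustration; real systems use trained models.

def extract_features(audio_samples):
    # Real systems compute e.g. mel-spectrogram frames; here we just
    # chunk the raw samples into fixed-size "frames".
    frame_size = 4
    return [audio_samples[i:i + frame_size]
            for i in range(0, len(audio_samples), frame_size)]

def acoustic_model(frames):
    # A trained acoustic model maps frames to likely phonemes or
    # characters. Toy stand-in: map each frame's average to a letter.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [alphabet[int(sum(f) / len(f)) % 26] for f in frames]

def language_model(char_hypotheses):
    # A language model re-scores hypotheses so the output reads as
    # plausible text. Toy stand-in: simply join the characters.
    return "".join(char_hypotheses)

def recognise(audio_samples):
    # The full pipeline: audio in, text out.
    return language_model(acoustic_model(extract_features(audio_samples)))
```

The point isn’t the maths, which is nonsense here, but the shape: audio goes in one end, gets turned into features, then into sound-unit hypotheses, then into readable text.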
There’s obviously a whole lot more to it than that, though. So, in today’s episode, we’re speaking to one of the most knowledgeable and experienced speech recognition minds the world has to offer, Catherine Breslin, about just exactly what’s going on under the hood of automatic speech recognition technology and how it actually works.
Catherine Breslin studied speech recognition at Cambridge, before working on speech recognition systems at Toshiba and eventually on the Amazon Alexa speech recognition team where she met the Godfather of Alexa, Jeff Adams. Catherine then joined Jeff at Cobalt Speech where she currently creates bespoke speech recognition systems and voice assistants for organisations.
In this episode, you’ll learn how one of the fundamental voice technologies works, from beginning to end. This will give you a rounded understanding of automatic speech recognition technology so that, when you’re working on voice applications and conversational interfaces, you’ll at least know how it’s working and then be able to vet speech recognition systems appropriately.
I’m in Amsterdam today with JP and Jen, who are over there looking rather nonchalant, not joining me on the escalator. But in Amsterdam, in Holland, Google... sorry, Alexa doesn’t exist.
The Echo doesn’t exist.
It’s Google Assistant predominantly, and it got me thinking: what would you do if those big commercial voice assistant platforms just didn’t exist?
What would you do without Google Assistant, Siri or Alexa?
Are there other voice technologies, other platforms, that you would use instead, and for what use cases?
So JP had mentioned SoundHound for use in the car, for things like restaurants and navigation.
(I think this one is working nicely.)
I was talking about Snips, before it was acquired, being used in home electronic devices like coffee machines and things like that.
So what other examples do you have of technologies outside of the big commercial ones, and what are the use cases? What are they being used for? Let us know!
Voice assistants should change how they interact with you based on how you interact with them and the context you’re in.
🎧 If I’m walking with my headphones on, and I ask Siri “What was the Boro score at the weekend?” Then I just want the score. Nothing more. Did they win or lose?
🔊 If I’m at home and I ask the same question to my smart speaker then I have a little more time. You can give me some stats, tell me who scored etc.
📺 If I’m in the living room and I ask the same question to Alexa on my TV, then I might want to see some highlights, watch some punditry etc.
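The three scenarios above can be sketched as a single handler that varies its response by device context. This is a hypothetical illustration, not any platform’s actual API; the function name, the score structure and the context labels are all invented for the example.

```python
# Sketch of context-dependent response selection. The handler name,
# score structure and context labels are hypothetical, mirroring the
# headphones / smart speaker / TV examples above.

def answer_score_question(score, context):
    result = "won" if score["for"] > score["against"] else "lost"
    base = f"Boro {result} {score['for']}-{score['against']}."
    if context == "headphones":     # on the move: just the result
        return base
    if context == "smart_speaker":  # at home: add some detail
        return base + f" Scorers: {', '.join(score['scorers'])}."
    if context == "tv":             # living room: offer richer media
        return base + " Would you like to watch the highlights?"
    return base                     # unknown context: keep it brief
```

The design point is that the intent ("what was the score?") stays the same; only the depth of the response changes with the situation.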
This is VUX design 2.0.
So does anyone have any examples of this happening across the main voice assistants or have you included interactions like this in your experiences?
So, here we are. The end of 2019. And it’s time for some clickbait regarding 2020 voice-first industry predictions.
I told you to prepare for this a few weeks back.
While forecasting can be inspiring and encouraging, it could also be perceived as a holding message.
It’d be easy to hear about how advertising will come to smart speakers in 2020 and think “I’ll wait for that, then I’ll do something”.
Or read about 5G rolling out in the next 12 months and think “I’ll hang on until then”.
Or, maybe you’ll read how the voice mega-trend will cause a back-to-basics review of existing content, worry about how much work that’ll be and kick the can down the road.
Perhaps you’ll hear buzzwords like the importance of ‘compatibility and integration‘ and how ‘environment and context data‘ is key to enabling ‘transaction-oriented consumer intents’ in 2020, glaze over and think “this is just too complex”.
I just think they can often contain hopes and wishes or general, broad trends, rather than sober predictions of what might actually happen in the next 12 months.
And much of this stuff was predicted last year as well. Siri to open up third party development, anyone?
“There isn’t much in those predictions that wouldn’t have been suggested this time last year.”
David Low, Executive Product Manager, Voice and AI, BBC, incoming CEO, The List
While it’s nice to look forward, hope, predict or wish (critical, even), it’s also good to understand what you should realistically do in 2020 if we have another iterative year.
2020: another iterative year for voice
Perhaps a not-so-bold prediction (or perhaps the boldest), and one I subscribe to, is that 2020 will be an iterative year for voice assistants.
That might not be what you want to hear. But think about it. Think about some of the main challenges, like discoverability.
If it was easy to solve, it would have been solved already.
The big players could do something radical to address it, but what are the chances of that? It’s too risky.
In reality, they’ll iterate towards solving these problems over time.
That’s because to address something like discoverability, you need to address the current set-up of the platforms, challenge our app-centric mental model and really consider whether skills are the right solution.
Amazon have far too much invested in skills and too much smart speaker penetration to risk confusing the message or pivoting in a big way. At least not over the next 12 months.
The only platform I could ever see pivoting from the app-centric mental model is Google which, to be honest, already has. Actions aren’t just ‘apps’, anything Google Assistant can do is an action, including performing a web search. That’s how Google can claim to have over a million actions.
Further iteration isn’t really a bad thing
Maybe that’s not such a great prediction for a rapidly evolving space. But so what?
I don’t know why everyone’s obsessed with ‘rapidly evolving’ things anyway. The pace of technological advancement is obviously quick (and quickening), but user behaviour doesn’t change at anywhere near the same pace.
People will change their habits from screens to speaking, for the right kind of things, over time. People won’t break out of 15 years’ worth of mobile conditioning or 20+ years of screen-based, keyboard conditioning in the next 12 months.
The reality is, the platforms are already capable of more than people are using them for. It’s the advancement of user behaviour that we should concentrate on.
What the voice industry should do in the next 12 months
What should happen in the next 12 months is that all of us in the industry should be doing the best work we possibly can, within the areas we can affect and influence, to give users the best possible experiences that increase their confidence and trust in voice assistants and voice interfaces.
Whether that’s putting voice search into your app, building an Alexa skill, putting a voice bot on your website, wherever it makes sense for your users. The main thing should be to provide quality, reliable experiences that do the job they need to, well, and consistently, and give users confidence in the medium.
Confidence that’ll build over time.
Confidence that’ll turn into repeat usage, over time.
Confidence that will lead to bigger behavioural changes and unlock this door we’ve been banging on for the last few years.
And for brands, just start. Move the needle. Get off the starting blocks. Just. Move.
With smart speaker penetration over 20% in the UK and usage rising, reaching your target market with something they value is a real proposition. And that won’t happen on its own.
Maybe no one in your industry has done it yet. Maybe there aren’t any case studies for you to compare.
But, you don’t need to start big, you just need to start. Don’t get caught short like you did with mobile and social.
We already have the tools. We just need to actually use them
Merry Christmas and all the best for a VUXing epic New Year.
Voice assistants and voice user interfaces (VUIs) are used interchangeably under the banner of ‘voice’. But what’s the difference between the two?
What is a voice assistant?
Google defines voice assistance as:
“Engaging with intelligent technology using voice as the method of input (e.g. a digital assistant like Google Assistant).”
WhatIs/Tech Target defines a voice assistant as:
“a digital assistant that uses voice recognition, natural language processing and speech synthesis to provide aid to users through phones and voice recognition applications”
The key word there is ‘aid’. A voice assistant aids and assists users in whatever it is they need assistance with, using speech as the primary method of receiving requests and responding.
That could be assisting you in accessing your calendar schedule, assisting you in finding a podcast, helping you get the news and even navigating through the apps on your phone, as we witnessed with the Google Assistant announcements at Google I/O ’19.
A key component of a voice assistant is the ability for it to access lots of different pieces of information from lots of different sources and funnel it all through one place.
For example, with Alexa, you can access your calendar, send SMS messages, book train tickets, order pizza and check your bank balance. Lots of information from lots of sources, all pulled through one place and accessed via voice.
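That "many sources, one place" idea can be sketched as a single entry point dispatching recognised intents to separate backends. The handler names and responses below are illustrative stand-ins, not any real platform’s API.

```python
# Toy sketch of an assistant as a single entry point that routes
# recognised intents to different backend services. All handler names
# and responses are hypothetical.

def check_calendar():
    return "Your next meeting is at 10am."

def order_pizza():
    return "Pizza ordered."

def check_balance():
    return "Your balance is £42."

# One place, many sources: the assistant just maps intents to services.
INTENT_HANDLERS = {
    "calendar": check_calendar,
    "pizza": order_pizza,
    "balance": check_balance,
}

def handle(intent):
    # Route the intent, with a graceful fallback for unknown requests.
    handler = INTENT_HANDLERS.get(intent)
    return handler() if handler else "Sorry, I can't help with that yet."
```

Each backend stays independent; the assistant’s job is the funnel, turning one spoken request into a call to the right service.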
Google Assistant, Alexa, Siri, Houndify, Mycroft, these are all examples of voice assistants.
What is a voice user interface?
Amazon defines a voice user interface (VUI) as something that:
“… allows people to use voice input to control computers and devices.”
A VUI, then, is an access point, in the same way a screen is an access point to a graphical user interface (GUI).
A voice interface is a front-end. A way of interacting with software with your voice, instead of tapping or typing.
For example, Spotify has just launched voice enabled ads in the US:
At the end of the ad, listeners can say ‘play now’ to play the advertised playlist. Using your voice to interact with the software instead of tapping. It’s not an ‘assistant’, it’s an ‘interface’.
Voice assistants have a voice interface, but voice interfaces aren’t always an interface to a voice assistant.
Hopefully that’ll help clear things up next time you read or talk about voice.
This week, Dustin and I are joined by journalist and author, James Vlahos, to discuss the details of his book Talk to Me: How voice computing will transform the way we live, work and think.
We’re talking to ex-Googlers, Konstantin Samoylov and Adam Banks, about their findings from conducting research on voice assistants at Google and their approach to building world-leading UX labs.
This episode is a whirlwind of insights, practical advice and engaging anecdotes that cover the width and breadth of user research and user behaviour in the voice first and voice assistant space. It’s littered with examples of user behaviour found when researching voice at Google and peppered with guidance on how to create world-class user research spaces.
Some of the things we discuss include:
- Findings from countless voice assistant studies at Google
- Real user behaviour in the on-boarding process
- User trust of voice assistants
- What people expect from voice assistants
- User mental models when using voice assistants
- The difference between replicating your app and designing for voice
- The difference between a voice assistant and a voice interface
- The difference between user expectations and reality
- How voice assistant responses can shape people’s expectations of the assistant’s full functionality
- What makes a good UX lab
- How to design a user research space
- How voice will disrupt and challenge organisational structure
- Is there a place for advertising on voice assistants?
- Mistakes people make when seeking a voice presence (Hint: starting with ‘let’s create an Alexa Skill’ rather than ‘how will people interact with our brand via voice?’)
- The importance (or lack thereof) of speed in voice user interfaces
- How to fit voice user research into a design sprint
Plus, for those of you watching on YouTube, we have a tour of the UX Lab in a Box!
Konstantin Samoylov and Adam Banks are world-leading user researchers and research lab creators, and founders of user research consultancy firm, UX Study.
The duo left Google in 2016 after pioneering studies in virtual assistants and voice, as well as designing and creating over 50 user research labs across the globe, and managing the entirety of Google’s global user research spaces.
While working as researchers and lab builders at Google, and showing companies their research spaces, Konstantin and Adam were often asked whether they could recommend a company to build a similar lab. Upon realising that no such company existed, they set about creating it!
UX Study designs and builds research and design spaces for companies, provides research consultancy services and training, and hires out and sells its signature product, UX Lab in a Box.
UX Lab in a Box
The Lab in a Box (http://ux-study.com/products/lab-in-a-box/) is an audio and video recording, mixing and broadcasting unit designed specifically to help user researchers conduct reliable, consistent and speedy studies.
It converts any space into a user research lab in minutes and helps researchers focus on the most important aspect of their role – research!
It was born after the duo, in true researcher style, conducted user research on user researchers and found that 30% of a researcher’s time is spent fiddling with cables, setting up studies, editing video and generally faffing around doing things that aren’t research!
Konstantin Samoylov is an award-winning user researcher. He has nearly 20 years’ experience in the field and has conducted over 1000 user research studies.
He was part of the team that pioneered voice at Google and was the first researcher to focus on voice dialogues and actions. By the time he left, just 2 years ago, most of the studies into user behaviour on voice assistants at Google were conducted by him.
It’s likely that Adam Banks has more experience in creating user research spaces than anyone else on the planet. He designed, built and managed all of Google’s user research labs globally including the newly-opened ‘Userplex’ in San Francisco.
He’s created over 50 research and design spaces across the globe for Google, and also has vast experience in conducting user research himself.