voice assistants

Hacking speech systems

Yesterday I posted about how speech recognition systems and voice assistants have trouble with different accents, such as Irish, Welsh, Scottish and Northern accents, and how hard it is to train those systems on accents, because a different accent is essentially like a different language.

Why Alexa doesn’t understand your accent

A new study by Uswitch has found that voice assistants struggle with Irish accents… and Welsh, and Scottish.

What is automatic speech recognition and how does it work? With Catherine Breslin

Dive deep into one of the core technologies that underpins the voice assistants you know and love, with one of the world’s leading speech recognition experts, Catherine Breslin.


Presented by Sparks

Sparks is a new podcast player app that lets you learn and retain knowledge while you listen.

The Sparks team are looking for people just like you: podcast listeners who’re also innovative early adopters of new tech, to try the beta app and provide feedback.

Try it now at sparksapp.io/vux

What is speech recognition and how does it work?

Automatic speech recognition (also known as ASR) is a suite of technologies that takes audio signals containing speech, analyses them and converts them into text so that it can be read and understood by humans and machines. It’s the technology that makes voice assistants like Amazon Alexa able to understand what a user says.

There’s obviously a whole lot more to it than that, though. So, in today’s episode, we’re speaking to one of the most knowledgeable and experienced speech recognition minds the world has to offer, Catherine Breslin, about exactly what’s going on under the hood of automatic speech recognition technology and how it actually works.

Catherine Breslin studied speech recognition at Cambridge, before working on speech recognition systems at Toshiba and eventually on the Amazon Alexa speech recognition team where she met the Godfather of Alexa, Jeff Adams. Catherine then joined Jeff at Cobalt Speech where she currently creates bespoke speech recognition systems and voice assistants for organisations.

In this episode, you’ll learn how one of the fundamental voice technologies works, from beginning to end. This will give you a rounded understanding of automatic speech recognition technology so that, when you’re working on voice applications and conversational interfaces, you’ll at least know how it’s working and then be able to vet speech recognition systems appropriately.


Visit Cobalt Speech

Read the Cobalt Speech blog

Follow Catherine Breslin on Twitter

Listen to Jeff Adams, Founder of Cobalt discuss the Alexa origin story

Alternative voice assistants to Alexa and Google Assistant

I’m in Amsterdam today with JP and Jen, who are over there looking rather nonchalant and not joining me on the escalator. But in Amsterdam, in Holland, Google... sorry, Alexa doesn’t exist.

The Echo doesn’t exist.

It’s Google Assistant predominantly, and it got me thinking: what would you do if those big commercial voice assistant platforms just didn’t exist?

Would you do without Google Assistant, Siri and Alexa? Are there other voice technologies, other platforms, that you would use instead, and for what use cases?

So JP had mentioned SoundHound for use in the car, you know, looking at things like restaurants and navigation.

I think that this one is working nice.

I was talking about Snips, before it was acquired, being used in home electronic devices like coffee machines and things like that.

So what other examples do you have of technologies outside of the big commercial ones, and what are the use cases?

What are they being used for? Let us know.

VUX design 2.0

Voice assistants should change how they interact with you based on how you interact with them and the context you’re in.

For example:

🎧 If I’m walking with my headphones on, and I ask Siri “What was the Boro score at the weekend?” Then I just want the score. Nothing more. Did they win or lose?

🔊 If I’m at home and I ask the same question to my smart speaker then I have a little more time. You can give me some stats, tell me who scored etc.

📺 If I’m in the living room and I ask the same question to Alexa on my TV, then I might want to see some highlights, watch some punditry etc.

This is VUX design 2.0.
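To make the three scenarios above a bit more concrete, here’s a minimal sketch in Python of how a response could be shaped by context. Everything in it is hypothetical: the `Context` values, the `MatchResult` fields, the `build_response` function and the sample data are made up for illustration and aren’t taken from any real assistant platform. How you’d actually detect the context (headphones, speaker or TV) is assumed to be handled elsewhere.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Context(Enum):
    """Hypothetical contexts, mirroring the three examples above."""
    HEADPHONES = "headphones"
    SMART_SPEAKER = "smart_speaker"
    TV = "tv"


@dataclass
class MatchResult:
    """Illustrative data shape; a real sports feed will look different."""
    home: str
    away: str
    home_goals: int
    away_goals: int
    scorers: List[str]
    highlights_url: str


def build_response(result: MatchResult, context: Context) -> dict:
    """Return the same answer at a level of detail that suits the context."""
    score = f"{result.home} {result.home_goals}, {result.away} {result.away_goals}"

    if context is Context.HEADPHONES:
        # On the move: just the score, nothing more.
        return {"speech": score}

    if context is Context.SMART_SPEAKER:
        # At home: there's time for a little extra detail.
        scorers = ", ".join(result.scorers) or "nobody"
        return {"speech": f"{score}. Goals from {scorers}."}

    # On the TV: keep the speech short and offer something to watch.
    return {"speech": score, "video": result.highlights_url}


# Made-up sample data, purely for illustration.
result = MatchResult("Boro", "Opponents", 2, 1,
                     ["Player A", "Player B"],
                     "https://example.com/highlights")
print(build_response(result, Context.HEADPHONES)["speech"])
```

The point of the sketch is that the question and the underlying data stay the same; only the shape of the answer changes with the context.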

So does anyone have any examples of this happening across the main voice assistants or have you included interactions like this in your experiences?

Not so bold predictions for voice in 2020

So, here we are. The end of 2019. And it’s time for some clickbait regarding 2020 voice-first industry predictions.

I told you to prepare for this a few weeks back.

While forecasting can be inspiring and encouraging, it could also be perceived as a holding message.

It’d be easy to hear about how advertising will come to smart speakers in 2020 and think “I’ll wait for that, then I’ll do something”.

Or read about 5G rolling out in the next 12 months and think “I’ll hang on until then”.

Or, maybe you’ll read how the voice mega-trend will cause a back-to-basics review of existing content, worry about how much work that’ll be and kick the can down the street.

Perhaps you’ll hear buzzwords like the importance of ‘compatibility and integration‘ and how ‘environment and context data‘ is key to enabling ‘transaction-oriented consumer intents’ in 2020, glaze over and think “this is just too complex”.

I have no problem with these kinds of articles per se. I’ve enjoyed reading some of them. Especially this one from RAIN, this one from Vixen Labs and this one from Rabbit and Pork.

I just think they can often contain hopes and wishes or general, broad trends, rather than sober predictions of what might actually happen in the next 12 months.

And much of this stuff was predicted last year as well. Siri to open up third party development, anyone?

“There isn’t much in those predictions that wouldn’t have been suggested this time last year.”
David Low, Executive Product Manager, Voice and AI, BBC, incoming CEO, The List

While it’s nice, critical even, to look forward, hope, predict or wish, it’s also good to understand what you should realistically do in 2020 if we have another iterative year.

2020: another iterative year for voice

Perhaps a not so bold prediction (or a bolder prediction), which I subscribe to, is that 2020 will be an iterative year for voice assistants.

That might not be what you want to hear. But think about it. Think about some of the main challenges, like discoverability.

If it was easy to solve, it would have been solved already.

The big players could do something radical to address it, but what are the chances of that? It’s too risky.

In reality, they’ll iterate towards solving these problems over time.

That’s because to address something like discoverability, you need to address the current set-up of the platforms, challenge our app-centric mental model and really consider whether skills are the right solution.

Amazon have far too much invested in skills and too much smart speaker penetration to risk confusing the message or pivoting in a big way. At least not over the next 12 months.

The only platform I could ever see pivoting from the app-centric mental model is Google which, to be honest, already has. Actions aren’t just ‘apps’: anything Google Assistant can do is an action, including performing a web search. That’s how Google can claim to have over a million actions.

Further iteration isn’t really a bad thing

Maybe that’s not such a great prediction for a rapidly evolving space. But so what?

I don’t know why everyone’s obsessed with ‘rapidly evolving’ things anyway. The pace of technological advancement is obviously quick (and quickening), but user behaviour doesn’t change at anywhere near the same pace.

People will change their habits from screens to speaking, for the right kind of things, over time. People won’t break out of 15 years’ worth of mobile conditioning or 20+ years of screen-based, keyboard conditioning in the next 12 months.

The reality is, the platforms are already capable of more than people are using them for. It’s the advancement of user behaviour that we should concentrate on.

What the voice industry should do in the next 12 months  

What should happen in the next 12 months is that all of us in the industry should be doing the best work we possibly can, within the areas we can affect and influence, to give users the best possible experiences that increase their confidence and trust in voice assistants and voice interfaces.

Whether that’s putting voice search into your app, building an Alexa skill or putting a voice bot on your website, do it wherever it makes sense for your users. The main thing should be to provide quality, reliable experiences that do the job they need to, well and consistently, and give users confidence in the medium.

Confidence that’ll build over time.

Confidence that’ll turn into repeat usage, over time.

Confidence that will lead to bigger behavioural changes and unlock this door we’ve been banging on for the last few years.

And for brands, just start. Move the needle. Get off the starting blocks. Just. Move.

With smart speaker penetration being over 20% in the UK and usage rising, reaching your target market with something they value is a real proposition. And that won’t happen on its own.

Maybe no one in your industry has done it yet. Maybe there aren’t any case studies for you to compare.

But, you don’t need to start big, you just need to start. Don’t get caught short like you did with mobile and social.

We already have the tools. We just need to actually use them

Merry Christmas and all the best for a VUXing epic New Year.

What’s the difference between a voice assistant and a voice user interface (VUI)?

The terms ‘voice assistant’ and ‘voice user interface’ (VUI) are often used interchangeably under the banner of ‘voice’. But what’s the difference between the two?

What is a voice assistant?

Google defines voice assistance as:

“Engaging with intelligent technology using voice as the method of input (e.g. a digital assistant like Google Assistant).”

Google's definition of voice search vs voice assistance

Image used with permission from @andy_head, CEO, Adido

WhatIs/TechTarget defines a voice assistant as:

“a digital assistant that uses voice recognition, natural language processing and speech synthesis to provide aid to users through phones and voice recognition applications”

The key word there is ‘aid’. A voice assistant aids and assists users in whatever it is they need assistance with, using speech as the primary method of receiving requests and responding.

That could be assisting you in accessing your calendar schedule, assisting you in finding a podcast, helping you get the news and even navigating through the apps on your phone, as we witnessed with the Google Assistant announcements at Google I/O ’19.

A key component of a voice assistant is the ability for it to access lots of different pieces of information from lots of different sources and funnel it all through one place.

For example, with Alexa, you can access your calendar, send SMS messages, book train tickets, order pizza and check your bank balance. Lots of information from lots of sources, all pulled through one place and accessed via voice.

Google Assistant, Alexa, Siri, Houndify, Mycroft, these are all examples of voice assistants.

What is a voice user interface?

Amazon defines a voice user interface (VUI) as something that:

“… allows people to use voice input to control computers and devices.”

A VUI, then, is an access point, in the same way a screen is an access point to a graphical user interface (GUI).

A voice interface is a front-end. A way of interacting with software with your voice, instead of tapping or typing.

For example, Spotify has just launched voice-enabled ads in the US.

At the end of the ad, listeners can say ‘play now’ to play the advertised playlist. Using your voice to interact with the software instead of tapping. It’s not an ‘assistant’, it’s an ‘interface’.

Voice assistants have a voice interface, but voice interfaces aren’t always an interface to a voice assistant.
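If it helps to see the distinction in code, here’s a rough sketch in Python. It’s an illustration only, not how Spotify, Alexa or anyone else actually builds these things: the names `ad_voice_interface` and `VoiceAssistant` are made up, and the simple keyword matching is a stand-in for real speech recognition and natural language understanding.

```python
from typing import Callable, Dict, Optional


def ad_voice_interface(utterance: str,
                       play_playlist: Callable[[], str]) -> Optional[str]:
    """A voice interface: one spoken command standing in for one tap."""
    if utterance.strip().lower() == "play now":
        return play_playlist()
    # Anything else is simply not handled by this interface.
    return None


class VoiceAssistant:
    """A voice assistant: one voice entry point that funnels many sources."""

    def __init__(self) -> None:
        # Hypothetical registry mapping a keyword to a source of information.
        self._sources: Dict[str, Callable[[], str]] = {}

    def register(self, keyword: str, handler: Callable[[], str]) -> None:
        self._sources[keyword] = handler

    def ask(self, utterance: str) -> str:
        text = utterance.lower()
        # Naive keyword routing; a real assistant would use NLU here.
        for keyword, handler in self._sources.items():
            if keyword in text:
                return handler()
        return "Sorry, I can't help with that."


# The interface does one job for one piece of software.
print(ad_voice_interface("Play now", lambda: "Playing the advertised playlist"))

# The assistant pulls lots of sources through one place.
assistant = VoiceAssistant()
assistant.register("calendar", lambda: "You have two meetings today")
assistant.register("pizza", lambda: "Your pizza is ordered")
print(assistant.ask("What's on my calendar?"))
```

Notice that the assistant still needs a voice front-end to receive the utterance, while the ad interface works fine with no assistant behind it.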

Hopefully that’ll help clear things up next time you read or talk about voice.

Talk to me with James Vlahos

This week, Dustin and I are joined by journalist and author, James Vlahos, to discuss the details of his book Talk to Me: How voice computing will transform the way we live, work and think.

Voice first user research with Konstantin Samoylov and Adam Banks

We’re talking to ex-Googlers, Konstantin Samoylov and Adam Banks, about their findings from conducting research on voice assistants at Google and their approach to building world-leading UX labs.

This episode is a whirlwind of insights, practical advice and engaging anecdotes that cover the length and breadth of user research and user behaviour in the voice first and voice assistant space. It’s littered with examples of user behaviour found when researching voice at Google and peppered with guidance on how to create world-class user research spaces.

Some of the things we discuss include:

  • Findings from countless voice assistant studies at Google
  • Real user behaviour in the on-boarding process
  • User trust of voice assistants
  • What people expect from voice assistants
  • User mental models when using voice assistants
  • The difference between replicating your app and designing for voice
  • The difference between a voice assistant and a voice interface
  • The difference between user expectations and reality
  • How voice assistant responses can shape people’s expectations of the full functionality of the thing
  • What makes a good UX lab
  • How to design a user research space
  • How voice will disrupt and challenge organisational structure
  • Is there a place for advertising on voice assistants?
  • Mistakes people make when seeking a voice presence (Hint: starting with ‘let’s create an Alexa Skill’ rather than ‘how will people interact with our brand via voice?’)
  • The importance (or lack of) of speed in voice user interfaces
  • How to fit voice user research into a design sprint

Plus, for those of you watching on YouTube, we have a tour of the UX Lab in a Box!

Our Guests

Konstantin Samoylov and Adam Banks are world-leading user researchers and research lab creators, and founders of user research consultancy firm, UX Study.

The duo left Google in 2016 after pioneering studies in virtual assistants and voice, as well as designing and creating over 50 user research labs across the globe, and managing the entirety of Google’s global user research spaces.

While Konstantin and Adam were working as researchers and lab builders at Google, and showing companies their research spaces, plenty of those companies asked whether they could recommend a company to build them a similar lab. Upon realising that company didn’t exist, they set about creating it!

UX Study designs and builds research and design spaces for companies, provides research consultancy services and training, and hires out and sells its signature product, UX Lab in a Box.

UX Lab in a Box

The Lab in a Box (http://ux-study.com/products/lab-in-a-box/) is an audio and video recording, mixing and broadcasting unit designed specifically to help user researchers conduct reliable, consistent and speedy studies.

It converts any space into a user research lab in minutes and helps researchers focus on the most important aspect of their role – research!

It was born after the duo, in true researcher style, conducted user research on user researchers and found that 30% of a researcher’s time is spent fiddling with cables, setting up studies, editing video and generally faffing around doing things that aren’t research!

Konstantin Samoylov

Konstantin Samoylov is an award-winning user researcher. He has nearly 20 years’ experience in the field and has conducted over 1000 user research studies.

He was part of the team that pioneered voice at Google and was the first researcher to focus on voice dialogues and actions. By the time he left, just 2 years ago, most of the studies into user behaviour on voice assistants at Google were conducted by him.

Adam Banks

It’s likely that Adam Banks has more experience in creating user research spaces than anyone else on the planet. He designed, built and managed all of Google’s user research labs globally including the newly-opened ‘Userplex’ in San Francisco.

He’s created over 50 research and design spaces across the globe for Google, and also has vast experience in conducting user research himself.


Visit the UX Study website
Follow UX Study on Twitter
Check out the UX Lab in a Box
Follow Konstantin on Twitter
Follow Adam on Twitter