Hacking speech systems

VUX World

Yesterday I posted about how speech recognition systems, voice assistants, have trouble with different accents like Irish accents, Welsh accents, Scottish accents, Northern accents and how hard it is to actually train those systems based on accents because a different accent is essentially like a different language.

Now, yes, there are ways that you can hack the system essentially with training data.

So if I say “I fancy some cake”, but the speech recognition system thinks I said “I fancy some Kate”, regardless of how much I may or may not fancy Kate Winslet or any Kate at all, that’s not actually what I was saying. So in the training data, instead of just cake, you can put the word Kate as well as cake, and that will essentially be a way of getting around some of the speech recognition inaccuracies.

(As an aside, the transcription tool I used for this article understood cake as kick and Kate and kit)
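The hack described above amounts to a mapping from known misrecognitions back to the word the user actually said, applied before the NLU sees the transcript. A minimal sketch in Python, assuming a hypothetical correction map built from the cake/Kate examples (real platforms expose this as entity synonyms or slot values rather than a hand-rolled function):

```python
# Known ASR misrecognitions mapped back to the intended word.
# These entries are illustrative examples from the article, not
# a mapping from any real speech platform.
MISRECOGNITION_MAP = {
    "kate": "cake",  # "I fancy some cake" heard as "I fancy some Kate"
    "kick": "cake",  # the transcription tool's version
    "kit": "cake",
}

def normalise_transcript(transcript: str) -> str:
    """Replace known misrecognised tokens before NLU processing."""
    tokens = transcript.lower().split()
    return " ".join(MISRECOGNITION_MAP.get(t, t) for t in tokens)

print(normalise_transcript("I fancy some Kate"))
# → "i fancy some cake"
```

The obvious limitation, as the rest of the article argues, is that the map only contains the misrecognitions you have already found.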

The problem with that is that you’re never going to find all the issues. You’re never going to find enough people to test with to cater for every single type of accent in the locale you’re designing for.

You’re never going to find all of the ways that all of the different accents say all of the things your system needs to listen out for, and doing that at scale across thousands of intents is virtually impossible.

You’re only ever going to be able to do that with a really narrow use case and a small conversation.

Really, the problem needs to be solved at the platform level. At the Microsoft, Apple, IBM, Amazon, Google level.

Now, yes, you can create your own speech to text, or tune your own speech to text, and create or tune your own NLU, and lots of companies do that. But not everybody has the budget, and not everybody is operating at that scale. Some companies are just trying to get started and need to use the tools at their disposal.

We do need to do all of that kind of synonym mapping now; we do need to essentially hack the training data to make it work as best we can right now. But the platforms themselves need to get better at it.