Why NLU training needs brains

Ben McCulloch
June 13, 2022
in Article, Opinion

While preparing this blog I had an uncanny moment.

I was reading transcripts of Kane’s interview with Rasa’s Alan Nichol and simultaneously wondering how accurate some of the transcription was. Was the ASR used for the transcription accurate? It’s important because you have to be confident about what people are saying to get what they really mean.

It goes deeper too. I read the transcript first rather than listening to the entire interview because it lets me jump right through the interview and find the juicy bits. Had I been listening to the hour-long interview, pausing, replaying and reviewing the audio, it would have taken me many hours just to get the gist of the conversation. I only use the audio to confirm what I’ve read. I can be sure of the audio’s accuracy (it’s the source of truth), but working from it alone would have taken much longer.

The thoughts I had matched perfectly with some of the things Alan and Kane were discussing: how confident we are about NLU, how we scrutinise data, and when we should trust our own brains.

Will AI see a walnut or a brain?

Taking your best guess at what users say

All conversational designs are hypothetical. You’re guessing what people will want, you’re guessing how they’ll ask for it, and you’re guessing how well you’ll be able to understand what they will say.

That’s a precarious position to be in. If things go wrong it could be costly.

To mitigate that risk there are strategies available.

Start with data

Look long enough and you’ll find patterns

Alan recommends returning to your transcripts, as they are the source of truth – they tell you what users want and how they say it. Your ability to process the information contained within is unparalleled because you understand context.

Train your model with a data set that represents the problem you’re trying to solve. Get real data from real customers and work with that.

As Alan says, “the problem you need to solve is – how can you create a dataset that best represents how your users are actually talking?”

But it’s not just about letting the machines crunch the data and spit their results out at you. You should also familiarise yourself with that data. You won’t be able to read everything as quickly as the machine does, but you will be able to draw conclusions that a machine cannot.

For example, if a user says “I took the dog for a walk and bought new shoes” in a single sentence, it could be chunked into a single unit of data, as if ‘taking the dog for a walk’ and ‘buying new shoes’ are naturally connected. But they might just be talking about their day, and didn’t mean those two events were connected.

We humans will work that out fairly quickly, but to a machine it’s not so simple. Our brains are incredible at finding patterns, too.

The intensity of intents

See the image? That’s the transcript of Alan talking, and the reason I started to wonder about transcription accuracy. Transcripts speed up my work incredibly, but sometimes human language just isn’t so easy to capture. He wasn’t saying “intense” – he was saying “intents”.

Despite the massive challenge of transcribing what people say, it’s only one part of the problem. What did Alan mean? That’s where conversational AI gets gritty. That’s right at the centre of everything we do.

Dealing with ambiguity

An airline bot may ask a user “was the flight delayed?” to which the user replies “unfortunately.”

In that context ‘unfortunately’ means ‘yes’. So you could mistakenly add ‘unfortunately’ as a sample utterance to the ‘yes’ intent.

Then in the same experience, the bot could ask “would you like to book more flights?” to which the user replies “unfortunately, no.”

Does the addition of the word ‘no’ always mean negative, while ‘unfortunately’ said by itself always means positive? If you went ahead with that hypothesis you could be on very unstable ground as both yes and no intents could be triggered by the same word, when the only differentiator is a tiny and easily misheard ‘no’.

You might not hear “no” and that could make or break the experience
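
The airline example can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how Rasa or any other NLU platform works: the function name `interpret_reply`, the word lists and the `yes_is_bad_news` flag are all assumptions invented for this sketch. It treats ‘unfortunately’ as a signal of regret rather than as an answer, so the meaning depends on what the bot just asked.

```python
import re

# Hypothetical word lists for this sketch; a real project would
# build these from its own transcripts.
YES_WORDS = {"yes", "yeah", "yep"}
NO_WORDS = {"no", "nope"}
REGRET_WORDS = {"unfortunately", "sadly"}


def interpret_reply(reply: str, yes_is_bad_news: bool) -> str:
    """Return 'yes', 'no' or 'unclear' for a reply to a yes/no question.

    yes_is_bad_news: whether answering 'yes' to the bot's question would
    be bad news for the user (e.g. "was the flight delayed?").
    """
    words = set(re.findall(r"[a-z']+", reply.lower()))

    has_yes = bool(words & YES_WORDS)
    has_no = bool(words & NO_WORDS)

    # An explicit yes or no wins over a regret word.
    if has_yes and has_no:
        return "unclear"
    if has_yes:
        return "yes"
    if has_no:
        return "no"

    # A regret word by itself only confirms a bad-news question.
    if words & REGRET_WORDS and yes_is_bad_news:
        return "yes"

    # Anything else is too ambiguous to guess, so the bot should ask again.
    return "unclear"


# "Was the flight delayed?" -> "unfortunately."
print(interpret_reply("unfortunately.", yes_is_bad_news=True))       # yes
# "Would you like to book more flights?" -> "unfortunately, no."
print(interpret_reply("unfortunately, no.", yes_is_bad_news=False))  # no
# Same question, but the "no" was never heard.
print(interpret_reply("unfortunately", yes_is_bad_news=False))       # unclear
```

The last call is the risky case: if ‘unfortunately’ had simply been added to the ‘yes’ intent, a misheard ‘no’ would flip the answer. Returning ‘unclear’ and asking again is one cautious alternative.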

What we say and what we mean – it’s all about context

Knowing what users mean and what they want gives you superpowers. That is the most useful piece of data because, if you know their intention, you can provide the steps to take them to their goal.

Intents are discussed often within the industry, because many people rightly wonder if there is a better way to structure conversations with machines. Perhaps there is a better way, but right now intents are ubiquitous. Intents allow you to zero in on the things your user might want to do with your experience. Add too many intents and the experience becomes unstable; add too few and it might not be very helpful. Sadly, there’s no rule about the perfect number because it depends on what you’re trying to build.

But if you build your experience around real customer data, keep paying attention to how accurately you’re understanding your users, and keep checking how you map their words to what they want, then you’re on your way. Keep paying attention to the users and their problems, and they’ll tell you what they need.

As NASA’s Gene Kranz said, “work the problem”.

This article was written by Benjamin McCulloch. Ben is a freelance conversation designer and an expert in audio production. He has a decade of experience crafting natural-sounding dialogue: recording, editing and directing voice talent in the studio. Some of his work includes dialogue editing for Philips’ ‘Breathless Choir’ series of commercials, a Cannes Pharma Grand-Prix winner; leading teams in localizing voices for Fortune 100 clients like Microsoft; and sound design and music composition for video games and film.
