
How to navigate the 3 risks of LLMs in CAI

By Ben McCulloch

Most companies want to know how to find success with generative AI today. Few have managed to demonstrate it. Loop Car Insurance is one company that’s getting it right.

Last year, with the help of Quiq, it launched a generative, LLM-based bot that has reduced email support by 55%. So how did it do it?

Kat Garcia, Director of Member Services at LOOP Car Insurance, and Mike Myer, CEO of Quiq, joined Kane Simms in a recent webinar to discuss the behind-the-scenes factors in getting a generative AI chatbot to production in 2024.

Essentially, success comes from mitigating the 3 biggest risks in generative AI for the enterprise: giving the wrong advice, mishandling customer data, and damaging your brand image.

Risk #1 – Giving the wrong advice

LOOP provides car insurance. It needs to be absolutely sure it’s providing correct information at all times.

Being ‘correct’ is contextual, of course. LOOP operates in a specific sector of the insurance industry (car insurance), in a specific geographical area (Texas, USA), and it has its own products and services. This means the information it provides must be tailored specifically to its use cases.

So how do you achieve that?

You train the LLM on your documents and data. You ensure that your knowledge base is up to date, contains no contradictions (by following a ‘single source of truth’ approach), and includes only documents that you know to be relevant (that list of cake recipes you keep on your hard drive should be kept well clear of the training data).

As Mike says, when Quiq implemented the LLM for LOOP, it needed to make sure “the answer is using the information that was provided by LOOP and no other information.”

Quiq also implemented a novel approach to prompt chaining and guardrails that ensures the LOOP agent doesn’t go off course or hallucinate, and fails gracefully if it can’t answer a specific question.
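
The details of Quiq’s pipeline aren’t public, but the general pattern is straightforward to illustrate. Below is a minimal sketch in Python, assuming a hypothetical retrieve_relevant_docs helper over the curated knowledge base and the standard OpenAI chat completions API: one prompt drafts an answer from the approved documents only, a second prompt acts as a guardrail that checks the draft is supported by those documents, and anything that fails the check falls back to a safe hand-off message.

```python
# Minimal sketch of a grounded-answer chain with a simple guardrail check.
# `retrieve_relevant_docs` is a hypothetical retrieval helper over the curated
# knowledge base; Quiq's actual pipeline is proprietary and more sophisticated.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FALLBACK = "I'm not sure about that one. Let me connect you with a member of our team."


def _chat(system: str, user: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # keep answers deterministic and close to the source text
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content.strip()


def answer_question(question: str, retrieve_relevant_docs) -> str:
    docs = retrieve_relevant_docs(question)
    if not docs:
        return FALLBACK  # fail gracefully instead of letting the model guess

    context = "\n\n".join(docs)

    # Step 1: draft an answer using only the supplied documents.
    draft = _chat(
        "Answer using ONLY the company documents provided. "
        "If they do not contain the answer, say you cannot answer.",
        f"Documents:\n{context}\n\nQuestion: {question}",
    )

    # Step 2: a second prompt acts as a guardrail, checking the draft is
    # actually supported by the documents before it reaches the customer.
    verdict = _chat(
        "You are a strict fact checker. Reply YES if the answer is fully "
        "supported by the documents, otherwise reply NO.",
        f"Documents:\n{context}\n\nAnswer:\n{draft}",
    )
    return draft if verdict.upper().startswith("YES") else FALLBACK
```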

Risk #2 – Treat your customers’ data with care

You don’t just need to be careful about the materials that go into the bot when you design and train it. In every conversation, the customer could provide PII. There’s a very good chance your bot will directly ask the customer for their data, as a means of identifying them and providing a helpful service.

You need to treat that data with respect and care. Whose servers will it end up on? Are you sure that every third-party data handler involved with your bot has suitable processes for dealing with that data? Are you definitely using models that won’t use your data as training data? Are you scrubbing PII from your interaction logs?
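
That last question is the easiest to act on directly. Below is a minimal sketch in Python of scrubbing obvious PII from transcripts before they are logged; the regex patterns are illustrative only, and a production deployment would more likely rely on a dedicated PII-detection service.

```python
# A minimal sketch of scrubbing obvious PII from transcripts before they are
# logged or stored. Real deployments typically use a dedicated PII-detection
# service; the patterns below are illustrative only.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub_pii(text: str) -> str:
    """Replace matched PII with a labelled placeholder, e.g. [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(scrub_pii("Call me on 512-555-0143 or email kat@example.com"))
# -> "Call me on [PHONE] or email [EMAIL]"
```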

LLMs are new to most people, and there are a lot of startups offering them. Make sure you do your due diligence and only use models that are fit for enterprise consumption.

Quiq chose to use OpenAI’s GPT-3.5 model via Microsoft Azure for this deployment, and the platform’s SOC 2 compliance took care of any other concerns.
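
For reference, routing GPT-3.5 through Azure OpenAI rather than the public OpenAI endpoint looks roughly like the sketch below. The endpoint, deployment name and API version are placeholders, not details from the LOOP deployment.

```python
# A minimal sketch of calling a GPT-3.5 deployment through Azure OpenAI, so
# traffic stays inside the enterprise Azure tenant. The endpoint, deployment
# name and API version below are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    # The deployment name you created in Azure, not the raw model family name.
    model="gpt-35-turbo",
    messages=[{"role": "user", "content": "Is roadside assistance included in my policy?"}],
)
print(response.choices[0].message.content)
```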

Risk #3 – Is the bot doing damage to your brand?

Consider this: the bot might be the first contact a customer makes with your brand. It’s generating responses on the fly that will affect whether the relationship continues or not. It could be a make-or-break interaction that either wins or loses a customer. The stakes are that high!

As Mike says, “we want to make sure that the answer that gets provided is on the brand tone, and is not something that somebody is going to screenshot and post on social media somewhere.”

We’ve recently seen this happen a few times. Chevrolet was caught out, as was DPD. Even Air Canada was publicly shamed for its ‘AI’ mishap; however, that incident occurred before the widespread use of LLMs, so it was most likely caused by an old-school NLU-based bot that hadn’t been updated properly. All the Air Canada example shows is that people will jump to their own conclusions about how you made a mistake, but it’s the mistake they will remember!

When using generative AI to create a bot’s utterances, defining the personality is best practice. You want the responses to be consistent within the experience, contextually relevant to it (for example, a bot helping people in a hurry shouldn’t be verbose), and representative of the brand.

Although Quiq didn’t go as far as developing a fully-fledged ‘personality’ document, it gave the model instructions on how to respond and made sure the agent was conscientious and empathetic, with a more casual tone.
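
In practice, instructions like these typically live in the system prompt. The wording below is an illustrative sketch, not LOOP’s actual prompt, but it shows the shape such tone guidance usually takes.

```python
# A minimal sketch of constraining tone through the system prompt. The wording
# is illustrative; LOOP/Quiq's actual instructions were not published.
BRAND_TONE_INSTRUCTIONS = """
You are a support assistant for LOOP, a Texas car insurance company.
- Be conscientious: check the relevant policy documents before answering.
- Be empathetic: acknowledge the member's situation before giving information.
- Keep a casual, friendly tone; avoid legal jargon and long paragraphs.
- Never speculate about coverage; if unsure, offer to connect a human agent.
"""

messages = [
    {"role": "system", "content": BRAND_TONE_INSTRUCTIONS},
    {"role": "user", "content": "My windshield cracked, am I covered?"},
]
# `messages` would then be passed to the chat completion call shown earlier.
```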

The end result

Since going live in 2023, LOOP has seen a 3x improvement in resolution rate compared with its previous assistant. Its LLM-based bot answers 50% of customer questions, 24 hours a day, all year round (whereas its call centre is open 8 hours a day). LOOP also measured customer satisfaction and found that the bot was on par with its human agents. Finally, it has helped reduce email support by 55%, saving human agents time.

This is the perfect example of a use case that’s right for generative AI, as well as a technology choice and implementation approach that fully understands and mitigates the risks involved. A great model for anyone wanting to do enterprise generative AI properly.

Watch the full webinar and learn the real details behind this deployment.
