When evaluating AI initiatives, I see many organisations use NPS and CSAT as barometers of success. But there’s a problem with both of those metrics that makes them ineffective at measuring the success of interactions with AI services.
This topic has been top of mind lately. A while ago, I wrote about how containment is the wrong metric; now it’s time for NPS and CSAT to take their turn. We covered this in detail in the conversation I had with Pypestream CEO Richard Smullen on the VUX World podcast, and it’s a conversation I’ve been having with clients for almost two years. The eagle-eyed among you might be thinking: “Hold on a minute, Kane, you recently posted that the way you measure value today is exactly the way you’ll measure the value of AI tomorrow. I measure value with NPS and CSAT, so what gives?”
As you’ll see in this post, there’s nothing inherently wrong with NPS and CSAT; it depends on what you’re trying to measure. There are three levels at which you want to measure success within your business and with AI:
- Business-level metrics like cost, revenue and loyalty.
- Journey-level metrics like satisfaction and goal completion.
- Interaction-level metrics like experience, adoption and usefulness.
What’s wrong with NPS?
NPS (Net Promoter Score) is a long-term business-level metric that measures how likely a customer is to recommend your brand to a friend. For decades, this has been a keen indicator of happy customers and loyalty.
However, you can’t use NPS to judge the success of an interaction with AI. Because it’s a long-term metric, it takes into account everything the customer knows and feels about your brand. It’s not intended to measure the interaction the customer has just had, but rather, in general terms, how loyal they are likely to be. In many cases, one interaction isn’t enough to sway customer loyalty.
Let’s say you’re a bank. You have a chatbot. Your customer uses it and then you ask them an NPS question. That doesn’t tell you how effective your chatbot is. It tells you how loyal that customer is.
For example, take Mary. She’s banked with The Bank of Kane for the last 15 years. She trusts the Bank of Kane with her life savings and mortgage. Yet, she has a poor experience with the Bank of Kane’s chatbot (I know, not likely 😉 but play along). When Mary is asked the NPS question, she’s bringing all of her trust and baggage with her. It’s likely she’ll still give a high NPS because she trusts the bank with her life savings!
Now, you might rephrase the question and ask ‘based on your experience of this chatbot, how likely are you to recommend The Bank of Kane to a friend?’, which I’ve seen before. But that conflates two types of question: a short-term question about the experience itself and a long-term measure of loyalty. It doesn’t sufficiently differentiate between how you feel about the brand overall and how you feel about the experience you’ve just had. It’s like a travel agent trying to determine long-term loyalty by asking a question about the flight. Some freak turbulence might have made the flight a nightmare, but that won’t affect the overall holiday.
As the quote attributed to Einstein goes: “If I had an hour to solve a problem and my life depended on it, I would use the first 55 minutes determining the proper question to ask, for once I know the proper question, I could solve the problem in less than five minutes.”
When trying to quantify the effectiveness of an interaction, NPS is the wrong question.
What about CSAT?
Let’s take CSAT instead; surely that’s a better metric? Well, no, not really. Not if you’re trying to measure the interaction itself.
CSAT (Customer Satisfaction) is a short-term metric that measures how satisfied a customer is with a service in the moment. Generally speaking, it’s a good measure and will tell you whether there are any service issues you need to deal with. But it won’t always tell you about the effectiveness of the specific solution a customer has just interacted with.
For example, Jim is also a customer of The Bank of Kane, and Jim wants to take out a new credit card. So he has a chat with the Kane AI Agent, which tells him that his credit rating isn’t high enough and that there isn’t a credit card suitable for him. Then, he’s asked a CSAT question: how satisfied are you with this service?
What do you think he’ll say? Dissatisfied, of course. Why? Because he didn’t get the outcome he was hoping for.
Notice that Jim’s dissatisfaction has nothing to do with the experience of interacting with the AI agent. It’s got everything to do with the fact that he was told something he didn’t like and didn’t get the result he wanted, even though the Kane AI Agent did everything absolutely right and was able to tell Jim whether he was eligible in 30 seconds.
So what’s the solution?
The solution: effort
Customer Effort Score asks how effortful the interaction you’ve just had was. It’s got nothing to do with how you feel overall about the brand and so it cuts through the baggage and bias you have. It’s got nothing to do with how you feel about the outcome and result you’ve just received, so it doesn’t penalise a perfectly working service. It simply appraises the interaction you’ve just had and whether or not it was easy or difficult to use.
That’s the score that matters when trying to measure the effectiveness of the interaction with your AI solution. Why? Because why else do you want to use AI? Surely because it’s easier, more efficient and more effective than other channels, modalities or technologies? How do you measure that? Effort.
We have a formula for measuring effort which takes into account turn count, escalations, abandons, fallbacks, disambiguations and goal completion that I’d happily share more about for those interested.
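To make the idea concrete, here’s a minimal sketch of what a composite effort score could look like. This is not the formula mentioned above; the weights, baseline and scaling are invented assumptions purely for illustration, combining the signals listed (turn count, escalations, abandons, fallbacks, disambiguations and goal completion) into a single 0–100 score where higher means lower effort:

```python
# Hypothetical effort-score sketch (illustrative only, not the author's formula).
# Higher score = lower-effort interaction. All weights are invented assumptions.

def effort_score(turns, escalated, abandoned, fallbacks, disambiguations,
                 goal_completed, expected_turns=4):
    score = 100.0
    # Penalise conversations that run longer than a reasonable baseline.
    score -= max(0, turns - expected_turns) * 5
    # Each fallback or disambiguation prompt adds friction.
    score -= fallbacks * 10
    score -= disambiguations * 5
    # Escalating to a human or abandoning mid-flow are strong high-effort signals.
    if escalated:
        score -= 25
    if abandoned:
        score -= 40
    # Not completing the goal is the biggest single penalty.
    if not goal_completed:
        score -= 30
    return max(0.0, min(100.0, score))

# A clean 3-turn interaction that completed the goal scores 100.0;
# an 8-turn one with 2 fallbacks and an escalation scores 35.0.
print(effort_score(3, False, False, 0, 0, True))
print(effort_score(8, True, False, 2, 0, True))
```

The point isn’t these particular weights; it’s that every input is an objective property of the interaction itself, untouched by brand loyalty or outcome disappointment.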
Is it perfect? No. Is it good enough? Absolutely.
So, use NPS and CSAT to measure loyalty and satisfaction on the whole, but not to measure the effectiveness of the interaction itself. Use effort for that.