You know that moment when the live agent connects to your call. You hear their first words, and your mind is scrambling to detect their accent. You may like their accent, dislike it, or struggle to understand them.
Around 60% of enterprises across Western Europe and North America have some type of partnership with an outsourcer. That means that, when you call a contact centre, there’s a 6 in 10 chance that you’ll be speaking to someone who doesn’t speak English as a first language. That creates the potential for communication issues that affect the quality of the conversation, particularly when you end up speaking with someone whose strong accent you struggle to understand.
What if the live agent’s accent could be transformed, so they sound like someone you hear every day and therefore can understand more easily? Imagine being able to filter the voice and essentially remove the accent from it.
That tech is exactly what Tomato AI is building. Ofer Ronen, CEO & Co-Founder of Tomato AI, joined Kane Simms on the VUX World podcast to discuss the technology’s potential and the ethical considerations surrounding it.
Hey, you sound just like me!
Tomato AI has created voice filtering technology that can adjust a speaker’s accent. It takes their audio stream and makes their accent sound more neutral or as if they’re from a different region.
This is particularly useful in call centres to enhance customer interactions. The technology involves real-time processing, ensuring minimal latency, and operates in noisy environments with low-end setups, typical in many call centres.
As Ofer Ronen says, “we’re starting with companies that are doing outbound from various places, calling folks and often getting 50% or more hangups because of lack of trust. And so there’s a huge opportunity to move the needle for these businesses and transform them.”
Customers get through quicker
There are a few reasons why this technology can enhance experiences.
There should be fewer misunderstandings in calls. Many companies use outsourced call centres (BPOs), which are often offshore. No matter how much training those live agents get, and however good the intentions of customer and live agent to move the conversation forward, it does take extra effort to understand an accent that you’re not used to hearing. That creates friction, which means that calls take more time and aren’t always a great experience.
Also, call centres are noisy places, and having live agents work from home doesn’t solve that problem either. Extraneous noises in a call aren’t helpful. Generally speaking, they’re noises that have no relation to the phone call (such as other conversations or traffic noise from an open window). Those sounds also make us listen harder, and add effort. Tomato AI has AI-based denoising features that remove background noise, so we can focus on the agent’s voice.
Live agents’ ethnicity isn’t in the spotlight
However, not every benefit is for the users. Consider the live agents. Some may struggle to be understood, no matter how hard they try. While we can appreciate a strong accent in entertainment (such as Fran Drescher’s NY tones), when we just want to get something done, we may prefer to talk with someone we instantly understand.
The technology also means that a broader talent pool may be considered for the live agent role, and call centres are always struggling to obtain and retain talent. Ofer sees this issue with clients every day: “[clients] have a hard time finding enough talent in the market. Let’s say in the Philippines, or Pakistan, this makes the hiring pool much bigger. More people that are capable become available to hire because their voice now becomes acceptable.”
There’s also the considerable issue of the prejudice that live agents face. They’re usually just the messenger, but they’re also in the firing line, so to speak. Angry customers will vent at them. Occasionally that leads to customers focusing on aspects of the live agent’s character. In a phone call, the most likely target will be their voice and accent.
When Indian call centres proliferated, they became a stereotype that was regularly mocked, even in Transformers, so Indian call centre workers may appreciate this technology. Live agents’ accents would be masked, so customers wouldn’t know their identity. Agents may well appreciate being able to get on with their job without their accent becoming part of the discussion.
You could say that society needs to deal with its prejudice against ethnicities. While we wait for that to happen, though, we can transform agents’ voices to improve accessibility today.
My accent is my heritage
Tomato AI uses STS (Speech To Speech), rather than TTS (Text To Speech), which retains the speaker’s tone, sound and prosody. That means that the person speaking still sounds just like the person speaking. The only difference is that their accent has changed. While this is a perfect use case for utility-based speech to speech, it does throw up some ethical considerations.
One such consideration is whether this technology could dehumanise people by removing or diminishing part of their identity. Our accents represent us – our heritage, experiences, education, class and so on. This technology has the potential to make us sound generic. By giving everyone the same accent, could it make the world sound vanilla?
Another is how well prosody translates to different accents and locales. Does Indian prosody when making jokes sound the same as ‘funny’ American prosody? Prosody is a vital component of language for conveying meaning. While this tech may make it easier for customers to understand a live agent’s accent, could it impact the agent’s ability to express their message?
And what if you could enjoy a call with a live agent, purely because their accent is familiar, pleasing or just interesting? Would you warm to the person you’re talking to? In their role, the agent represents a brand, and is bringing you closer to that brand. Could this decrease the amount of diversity a brand is able to demonstrate?
Finally, with AI Acts looming as governments around the world work on them, one consideration they share is transparency. Should these agents therefore have to inform users that they are using technology such as this?
Good results so far
So far, Tomato AI’s technology has shown promising results in pilot studies. The company is engaged in several pilots with large offshore call centres. Early feedback indicates significant potential in improving call centre operations where trust is crucial.
Preliminary data suggests a potential 10-20% improvement in customer satisfaction (CSAT), call handle time, and sales performance.
Deciding how to use it
Previously, the ability to digitally transform an accent in real time would have seemed unbelievable. But it’s become possible.
The question we now have to ask ourselves is: when should we use this, and why?
Tomato AI’s voice filtering technology shows early signs that it can improve call centre performance and broaden the hiring pool. At the same time, it poses important ethical questions that need careful consideration.
Thanks to Ofer Ronen for sharing his insights in what was a really great and detailed conversation! You can watch the full video here.