Introducing ‘VTA’: Voice-to-Action

Kane Simms

As the world evolves, we develop new terminology, new job roles and new language to describe the new things that a fledgling industry brings.

It’s happened before

The internet gave us words like browser, website, hyperlink, HTML, e-commerce and email.

It created job roles like web developer and web designer.

Mobile gave us words like texting and apps. We no longer made websites for one screen size; we made them responsive. We started using gestures like tap and swipe.

It gave us roles like app developer, UX designer and UI designer.

Social gave us words like like, hashtag, share, stories and filters, and created roles like social media manager, content creator and influencer.

And it’s happening in voice

Here’s an account of some of the terminology we already use and a few new ones I’d like to introduce.

Current terminology used in the voice industry

So far, the voice industry has brought us:

VUI: Voice User Interface

A voice user interface is anything that you can input data into or retrieve data from using your voice.

This term was used to describe designing IVR systems for a long while, well before voice assistants. See Voice User Interface Design (2004) by Michael Cohen, James Giangola, and Jennifer Balogh. Cathy Pearl brought the term back to life for voice assistants with her book Designing Voice User Interfaces (2016).

VUX: Voice User Experience

I always thought I was one of the first to coin the term VUX, having registered the VUX.World domain name and Twitter handle in June 2017. I’d never heard of or seen the term VUX before then.

However, a search of the Twitter archives shows that #VUX was first used by Tom Gilley in April 2016, more than a year before!


V-commerce

This is the act of purchasing something with your voice.

Another contentious one, but I was sure I coined this term at the beginning of 2018. The first time I published it was in May 2018 in the episode with Simonie Wilson of PinDrop.  

However, I couldn’t have been more wrong.

A chap called Spiros Margaris tweeted the first mention of v-commerce relating to voice that I can find on Twitter. Great minds think alike.

Interestingly, the earliest mention of v-commerce on Twitter was in Jan 2009, when v-commerce stood for ‘video commerce’.

Then, in 2016, the New York Times apparently coined the term v-commerce for ‘virtual commerce’, focusing on AR. 

Let’s hope voice sticks around for longer than those two and claims the title once and for all.


C-commerce

C-commerce is a broad term for conversational commerce that could happen on any ‘conversational’ channel, such as WhatsApp, Messenger, iMessage, chatbots, voice assistants and voice user interfaces.


Generation-V

Generation-V is anyone born after the introduction of voice user interfaces. Voice-natives.

I’m pretty sure Katy McMahon of Houndify has another term for this, or possibly coined this one (I’m sure I read that somewhere, but can’t find the source).

This is another one that I thought I’d coined in this post on Generation-V from Aug 2018.

(I’m realising that it’s pretty easy to coin a phrase in voice: just strip out the ‘oice’ and stick the ‘v’ at the beginning or end of a word. Chances are, we all invented them all).

One of the first mentions of Generation-V on Twitter was by David Erickson in Nov 2013, but it’s not clear whether he’s referring to Generation Voice.

It’s also the name of a vape shop in the US and a term for Generation-Vegan.

Alex Chaucer was the first to use #GenerationV in the context of technology in June 2018. It stood for Generation VirtualReality. 


Voice-first

Voice-first is used to describe an interface or device that has voice as its primary input and/or output method. The term was apparently coined by Brian Roemmele.

This is probably one of the most notable and most used terms in the voice industry today.

Introducing: VTA (Voice-to-Action)

I’d like to introduce a new phrase into this list: the VTA or Voice-to-Action.

It’s similar to a call-to-action, but it takes place within a voice experience or app, and it lets users initiate another voice experience or carry out a task using their voice.

Examples of a VTA

Amazon Alexa uses VTAs in the following ways:

Similar skill recommendations

Once you’ve used a certain type of skill, it’ll recommend another skill of the same kind:

“If you like Song Quiz, you might like Beat the Intro, want to try it?”

You can enable the recommended skill and start engaging with it just by saying ‘yes’.

That’s a VTA.
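For the developers reading, here’s a rough Python sketch of that recommendation modelled as a VTA: a spoken prompt, the spoken replies that count as acceptance, and the action that follows. This is not how Alexa implements it. The VoiceToAction class, the launch_beat_the_intro function and the list of accepted replies are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass
class VoiceToAction:
    """A spoken prompt plus the spoken replies that trigger an action."""
    prompt: str                        # what the assistant says to the user
    accepted_replies: FrozenSet[str]   # replies that count as accepting
    action: Callable[[], str]          # what runs when the user accepts

    def handle(self, spoken_reply: str) -> Optional[str]:
        """Run the action if the reply accepts the prompt, else return None."""
        normalised = spoken_reply.strip().lower()
        if normalised in self.accepted_replies:
            return self.action()
        return None


def launch_beat_the_intro() -> str:
    # Hypothetical stand-in for enabling and opening the recommended skill.
    return "Launching Beat the Intro"


# Assumed set of accepting replies; a real assistant would use its own
# language understanding rather than exact string matching.
recommendation = VoiceToAction(
    prompt="If you like Song Quiz, you might like Beat the Intro, want to try it?",
    accepted_replies=frozenset({"yes", "yes please", "sure"}),
    action=launch_beat_the_intro,
)
```

Calling recommendation.handle("yes") runs the action, while any reply outside the accepted set returns None and nothing happens. The point of the sketch is only that both halves, the prompt and the reply, are spoken.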

Rate this skill prompts

When you quit a skill, Alexa might ask you:

“Do you want to leave a rating?”

If you say ‘yes’, you can then leave a rating with your voice.

That’s a VTA.

In-skill purchases

Any suggestion of an in-skill purchase:

“Would you like to subscribe to this podcast for £1 per month?”

You can agree, subscribe and pay, all with your voice.

That is a VTA. 

Interactive audio ads

Instreamatic allows advertisers to generate interactive voice ads whereby the user is prompted to engage, with something like:

“Want to know the biggest trend in voice for 2019?”

You can say ‘yes’ to enable the ad.

That’s a VTA.

Then, at the end of the ad, you can say ‘I’m interested’ or ‘tell me more’ and be taken to either a landing page or whatever the ultimate end point of the campaign is. Although you’re not continuing a voice experience, you’re interacting with the call to action with your voice.

That’s a VTA.

Other voice calls-to-action

I have to credit Stuart Crane with giving me the inspiration for VTA after commenting on my post on LinkedIn about voice-first discoverability and how we shouldn’t be expecting a free meal.

He mentioned Respond Fast, who’re trying to coin the phrase VACTA: Voice Activated Call To Action, which isn’t the same as VTA.

Respond Fast allows advertisers to use its skill to provide further information after the user encounters an ad elsewhere.

For example, you see an ad on TV and the call-to-action is ‘ask Respond Fast for more information’, so you open the Respond Fast skill. Then, it asks you for the promo code you heard in the ad. Once you tell it, it’ll serve you content related to the ad audibly in the skill.

That’s different to VTA as a Voice-to-Action requires the prompt and the response to be a voice interaction.

Plus, VTA is cooler 🙂

And another phrase: convo designer

A convo designer is a cooler, more concise term for conversational designer. I’ve found myself using it quite a bit and thought I’d share it.

Sounds cringey at first, but it’s grown on me.

There’s a broader issue that should be discussed, which I’ll save for another post, but it’s regarding the differences or similarities between a VUI designer, a VUX designer, a dialogue designer and a convo designer.

What do you think?

I’d love to know your thoughts on this. Let us know your comments or questions by uploading a snippet below and we’ll play and discuss them on the next podcast episode.

Oh, and the book I mentioned is Voice User Interface Design by James P. Giangola, Jennifer Balogh and Michael H. Cohen.
