
Voice with X-Ray vision?

By Ben McCulloch

Learn how Disruptel is disrupting TV content consumption with voice assistant capabilities that can see what you’re watching.

Hang up your pipe, Sherlock

Imagine you’re watching Elementary on your TV and you love the jacket Lucy Liu is wearing. You’d buy it if you could… but how would you find that jacket?

Your phone’s probably already in your hand, so you become Sherlock. You search ‘Lucy Liu jacket Elementary TV’. Where would that data even be? How would you refine that search? You’d add the season or episode number if you knew it – which you probably don’t.

It’s going to take a lot of time to search, and you’ll possibly never find it. Plus, your attention will be away from the TV.

So, what if you could just ask your TV “what jacket is Watson wearing?” and the answer appeared on screen? (Lucy Liu played Dr. Joan Watson in the show.)

You get to keep watching – your attention never leaves the TV – and you get what you need. Friction is reduced.

That’s what Disruptel aims to deliver.

X-Ray vision

So what’s the big idea?

Disruptel has trained machine learning models to see people and objects within TV content. This means you can ask specific questions about what you see. A voice assistant that can see!

It’s like their system has x-ray vision that can see the metadata attached to what’s on the screen, in real time. When you ask a question, Disruptel combines metadata, knowledge graphs, computer vision and more to find the answer: who the actor is, what the show is, what the jacket is, and so on. It can even recognise animals and skylines, so you could ask “what breed of dog is that?” or “what city is that?”

Giving context to voice tech

But hold on, what’s the bigger idea?

Voice assistants don’t do context well. If you ask a voice assistant “tell me who that is”, chances are the voice assistant will be clueless.

Disruptel has added eyes to the voice assistant. Suddenly it has a much better chance of understanding context. Asking “tell me who that is” can now receive the correct answer “that’s Lucy Liu”. Even more compelling: “who’s that guy on the left?”
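Disruptel hasn’t published how its system works out which person “that guy on the left” refers to, but the basic idea of grounding a spoken reference in what’s on screen can be illustrated. The following is a minimal sketch, assuming a hypothetical vision model that returns a labelled bounding box (and, where known, a name) for each object in the current frame. `Detection` and `resolve_reference` are illustrative names invented for this example, not Disruptel’s API, and the frame data is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """One object found in the current video frame (hypothetical model output)."""
    label: str             # object class, e.g. "person", "dog", "jacket"
    name: Optional[str]    # identity from metadata or recognition, if known
    x_min: float           # left edge of the bounding box: 0.0 = left of screen, 1.0 = right
    x_max: float           # right edge of the bounding box

    @property
    def center_x(self) -> float:
        # Horizontal centre of the box, used to compare "left" vs "right"
        return (self.x_min + self.x_max) / 2


def resolve_reference(detections: List[Detection], label: str,
                      position: Optional[str] = None) -> Optional[Detection]:
    """Pick the on-screen object that a phrase like "that guy on the left" refers to."""
    candidates = [d for d in detections if d.label == label]
    if not candidates:
        return None  # nothing of that kind is visible, so the assistant can't answer
    if position == "left":
        return min(candidates, key=lambda d: d.center_x)
    if position == "right":
        return max(candidates, key=lambda d: d.center_x)
    # No position given ("who is that?"): assume the widest, most prominent box is meant
    return max(candidates, key=lambda d: d.x_max - d.x_min)


# Illustrative frame from a scene with two people and a dog
frame = [
    Detection("person", "Lucy Liu", 0.05, 0.40),
    Detection("person", "Jonny Lee Miller", 0.55, 0.95),
    Detection("dog", None, 0.42, 0.52),
]

who = resolve_reference(frame, "person", "left")
print(f"That's {who.name}.")  # prints "That's Lucy Liu."
```

The point of the sketch is that once an assistant has a list of what is visible and where, a vague question like “who’s that?” becomes a lookup rather than a guess. A real system would also have to handle speech parsing, tracking objects across frames and ambiguous references, none of which is shown here.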

Alex Quinn, CEO of Disruptel, says, “we call this the world’s first voice assistant that can see.”

Do you think this is only useful for searching for Watson’s jacket on TV?

Alex says, “this is really helpful for a lot of things. But starting out, I think that TV and entertainment content is a great domain and stepping stone for us.”

Enter the metaworld (no, not that one)

The Disruptel system looks at video to recognise specific people, places, brands, and even dog breeds. Why limit that tech to TV?

Imagine being able to look at anything in the world and ask about it. THAT’s the potential of Disruptel’s technology.

As Alex put it on the VUX World podcast: “At the end of the day, our computer vision systems have been trained on people, objects, products… [whether it’s] recognizing on the screen or in the real world – really makes no difference”.

Imagine you’re walking down a busy city street. You see something that interests you, so you ask about it, and you listen to the response while you walk. That’s an incredible, frictionless future, and one that even Amazon and Google have tried to bring about without much success so far.

Finding the balance between trivia and sales

Disruptel will monetise through inbound advertising: matching what you ask about with a brand that can sell it to you. They’ll serve up specific ads based on what’s in the show and what the user asks.

How carefully the advertising model will be incorporated remains to be seen. When we watch TV, we’re used to adverts, but having one pop up right in the middle of your favourite show might be a bit too much. Disruptel aims to have the advertising work more like pay-per-click (PPC) search ads, where an ad is served only when the viewer asks about something.

There’s also the challenge of telling users exactly what they can ask about in a given scene. Raising awareness of what voice assistants can do has been a long-standing problem.

Thinking further ahead, to when the full promise of this technology is realised in the real world: can you imagine being directed to a pet shop just because you saw a cute pup in the park and asked what breed it was? Striking a balance between inbound advertising and interruption will be key.

Want to learn more about the future of voice assistants that can see? Check out the full episode with Alex Quinn on the VUX World podcast.


This article was written by Benjamin McCulloch. Ben is a freelance conversation designer and an expert in audio production. He has a decade of experience crafting natural sounding dialogue: recording, editing and directing voice talent in the studio. Some of his work includes dialogue editing for Philips’ ‘Breathless Choir’ series of commercials, a Cannes Pharma Grand-Prix winner; leading teams in localizing voices for Fortune 100 clients like Microsoft, as well as sound design and music composition for video games and film.
