The most innovative use of voice technology so far?

Kane Simms
April 15, 2020
in Article, Opinion


What Descript did with Lyrebird AI might be the most innovative use of voice technology I’ve ever seen.

Descript is a transcription service. A very good transcription service, but a transcription service nonetheless.

You provide it with a spoken word audio sample: a podcast recording, a video, a meeting, and it’ll transcribe it for you pretty accurately.

You’ve always been able to edit the transcript and see those edits affect the audio sample. If you remove a few words here and there, it’ll delete them from the audio and vice versa if you edit the audio.

But editing is always subtractive. Taking bits out is easy. What about if you wanted to add something in?

Lyrebird rose to fame by being able to accurately clone people’s voices and generate a text to speech synthetic voice from as little as 1 minute of sample audio.

Descript acquired Lyrebird in 2019 and has now rolled out Overdub, a feature that lets you type edits into your transcript. Using Lyrebird tech, it’ll generate a synthetic voice of the person speaking, then insert the words you’ve added into the original audio sample.

That means that not only can you create transcripts from audio, you can also generate audio from transcripts!

Transcript:

What did Descript do with Lyrebird? And is it one of the best applications of voice technology yet?

To answer that question, we need to take a long walk to the Whiteboard.

Here.

For those of you that don’t know, Descript acquired Lyrebird in 2019. And what Lyrebird did was enable you to create clones of your voice, but synthetic clones, so, like, you could create a text-to-speech synthetic voice, based on your actual voice!

They were behind some of the things that you might have seen like the clone of Barack Obama, which sounded like this…

“They launched today their website where you can create a digital copy of your voice. They only need you to record one minute of audio. This is just the beginning and they are working hard to improve the results”.

And the clone of Donald Trump, which sounded like this…

“South Korea is finding, as I have told them, that their talk of appeasement with North Korea will not work. They only understand one thing”

And so Descript acquired Lyrebird in 2019, but what exactly did they do with the technology? To answer that question, you’ve kind of got to look at what Descript does.

Now, fundamentally, Descript is a transcribing solution.

It takes audio, transcribes it into text and then provides you with both audio and the text. And it’s fantastic for doing things like captioning on videos or transcripts for podcasts or transcripts for meetings or anything like that.

It does get fairly sophisticated and what it does is, based on the transcript, it’ll recognize who’s speaking and it will then be able to tell you when different people are speaking, which is pretty cool.

It also lets you edit some of the copy and whatever you edit is reflected in the audio. So if you had a phrase that was ‘VUX World rocks’, which it does, and you took out the word ‘world’ from the transcript, it would then remove that from the audio and vice versa. If you remove it from the audio, it would remove from the transcript, which is pretty cool.
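To make the idea concrete, here’s a minimal sketch of how transcript-driven editing could work if each word in the transcript carries the position of its audio. This is an illustration only, not Descript’s actual implementation: the `Word` class, the `remove_word` function and the toy list of integer “samples” are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: int  # index of the first audio sample belonging to this word
    end: int    # index one past the last sample of this word


def remove_word(words, samples, index):
    """Delete words[index] from the transcript and cut its samples from the audio."""
    target = words[index]
    cut = target.end - target.start
    # Subtractive edit: keep everything before and after the word's span.
    new_samples = samples[:target.start] + samples[target.end:]
    new_words = []
    for i, w in enumerate(words):
        if i == index:
            continue
        if w.start >= target.end:
            # Later words move earlier by the length of the cut.
            w = Word(w.text, w.start - cut, w.end - cut)
        new_words.append(w)
    return new_words, new_samples


# Toy "audio": one integer per sample, labelled by the word it belongs to.
samples = [1, 1, 1, 2, 2, 3, 3, 3, 3]
words = [Word("VUX", 0, 3), Word("World", 3, 5), Word("rocks", 5, 9)]

edited_words, edited_samples = remove_word(words, samples, 1)
print([w.text for w in edited_words])  # ['VUX', 'rocks']
print(edited_samples)                  # [1, 1, 1, 3, 3, 3, 3]
```

Because the transcript and the audio share the same positions, the edit works in either direction: cut the word and you know which samples to drop, or cut the samples and you know which word disappeared.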

But audio editing like this is subtractive and most of the time it is subtractive because you have an audio file and that’s all you have. You can’t do anything with it. You can put stuff over the top of it and underneath it like if you have a guitar sound you can put a piano over the top and a drum beat underneath, but you can’t change the guitar sound.

You can take clips from one guitar from over here and put it, insert it into the middle of this file, but it’s not going to sound very natural. You can’t generate more sound from one sound file and that’s always been the limitation of Descript is that it only goes one way. It only allows you to remove stuff, but you can probably see where I’m heading with this.

What they did with Lyrebird is they’ve created a new function called Overdub, a new feature, and what that lets you do is add phrases and words into the transcript. So you could change the phrase from ‘VUX World rocks’ to ‘VUX World rocks, subscribe at VUX.World/subscribe’ and what it will do, using the Lyrebird technology, is it will clone the voice of the person speaking when you add the text in, it will then generate synthetic speech based on that voice and insert it into the audio.
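As a rough sketch of that additive step, the snippet below splices newly generated audio into an existing recording and keeps the transcript positions in step. It is not how Overdub is actually built: `fake_synthesize` is a hypothetical stand-in for a voice-cloning text-to-speech model (it just returns placeholder samples), and `Word` and `insert_phrase` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: int  # index of the first audio sample belonging to this word
    end: int    # index one past the last sample of this word


def fake_synthesize(phrase):
    """Hypothetical stand-in for a voice clone: two placeholder samples (9) per word."""
    return [[9, 9] for _ in phrase.split()]


def insert_phrase(words, samples, after_index, phrase):
    """Add `phrase` to the transcript after words[after_index] and splice in its audio."""
    at = words[after_index].end
    new_words = list(words[:after_index + 1])
    new_audio = []
    pos = at
    for text, chunk in zip(phrase.split(), fake_synthesize(phrase)):
        new_words.append(Word(text, pos, pos + len(chunk)))
        new_audio.extend(chunk)
        pos += len(chunk)
    # Words after the insertion point move later by the length of the new audio.
    shift = len(new_audio)
    for w in words[after_index + 1:]:
        new_words.append(Word(w.text, w.start + shift, w.end + shift))
    return new_words, samples[:at] + new_audio + samples[at:]


# Toy "audio": one integer per sample, labelled by the word it belongs to.
samples = [1, 1, 1, 2, 2, 3, 3, 3, 3]
words = [Word("VUX", 0, 3), Word("World", 3, 5), Word("rocks", 5, 9)]

added_words, added_samples = insert_phrase(words, samples, 2, "subscribe now")
print([w.text for w in added_words])  # ['VUX', 'World', 'rocks', 'subscribe', 'now']
print(added_samples)                  # [1, 1, 1, 2, 2, 3, 3, 3, 3, 9, 9, 9, 9]
```

The splice itself is simple list work. The hard part, and the part Lyrebird’s technology supplies, is generating samples that sound like the original speaker.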

And so the end result is you have an additive audio process as well as a subtractive audio and transcription editing process.

That means that, in the end, you have an audio file full of the things that you’ve removed and the things that you’ve added in, which isn’t going to blend together totally seamlessly.

If you listen to a voice clone versus the real voices, you’re not going to get away with it entirely, but it will certainly work for things like podcasts where it’s a long form audio, like videos, animated videos and even for articles.

You could dictate an article in Descript, edit the article in Descript, have some of the synthetic audio and some of your live audio mixed together. And so the output will be a written article as well as the audio file.

The potential of this is absolutely huge and I’m looking forward to seeing what happens to it because I do think that this is one of the most innovative uses of voice technology that I’ve seen.

Let me know if you would agree, if you’ve got any thoughts on this. I think it’s fantastic.

