The AI privacy barrier: will Sonos be the first to hurdle it?

Kane Simms
November 25, 2021
in Article, News, Opinion


A code leak suggests a Sonos voice assistant is coming. Will this be the first voice assistant with true privacy at its heart?

Image source: Variety.com

Evidence of a Sonos voice assistant

A Reddit user found some hidden code in the Sonos app that suggests a voice assistant, named SVC (likely Sonos Voice Control), is on its way.

One of the things that the Sonos voice assistant is likely to differentiate itself on is privacy and security, given its underlying technology (which we’ll come to). This could mean that Sonos becomes the first genuine contender to rival Google and Amazon in the smart speaker space.

The importance of privacy for voice assistant users

A recent study by Vixen Labs found that 52% of people in the US, UK and Germany are concerned about privacy and security when it comes to using voice assistants.

Even people who use their voice assistants every day are in some way concerned with privacy and security. It’s one of the hottest topics and biggest barriers to consumer adoption of voice user interfaces. That, and accuracy (“It never understands me”).

Building trust is just part of the natural technology adoption cycle

This is, in part at least, the nature of emerging and fledgling technology adoption. We all remember the days when we’d book a restaurant table online or buy something via an app, then call the company on the phone to make sure it’d all gone through! This apprehension and uncertainty around wider usage of voice assistants is the same.

Privacy and security are two of the main reasons some people still don’t bank online. My in-laws only started grocery shopping online because the pandemic made the perceived risk of shopping online less than physically entering a COVID-riddled shop.

So privacy and security concerns with changing behaviour aren’t new. So why is voice different?

Why privacy concerns exist with voice user interfaces

Well, firstly, it’s a totally different and new type of interface.

When you can’t see anything, it makes the whole interface difficult to grasp. Yes, it’s potentially easier to speak than it is to tap or swipe. And, yes, we’re used to speaking, so it should be easier to adopt, but that’s not strictly the case.

The historic confidence of screens

Just because you can talk to something, doesn’t mean that it’s easy to use. There are a whole bunch of things that a screen gives you that voice doesn’t.

You get a big green tick when something has been successful. You get a big red cross when something goes wrong. Even a page reloading can help users orient themselves in an experience.

Plus, people have been using screens for years and are now accustomed to the mental model and interface standards.

Buttons can be clicked. Boxes can be typed into. Some words can be clicked in some sentences. We know that, and so we understand it.

We also know what’ll happen when we perform certain actions. If we click a link, it’ll open another page. If we click a button, something will happen. We’ve performed those actions so often that we’re completely confident in the cause and effect of our interactions.

For new users, voice doesn’t always come naturally

With voice, all you have is sound. Sound is temporary and easy to miss. Was that a successful payment made or did it bounce? What did my assistant ask me again?

And if you do mishear or miss something, what do you do? What’s the verbal equivalent of reloading a page, scrolling back to the top or clicking the ‘back’ button?

Then there’s the lack of understanding about what’s actually possible to do in a given interaction. If I say ‘repeat’, will it actually repeat? Can I say anything to this thing or is it listening for specific ‘key words’ or ‘trigger phrases’? And what are those key words?

All this is to say that, for new users of voice interfaces, it doesn’t come naturally all the time. This leads to some uncertainty. And it’s hard to place your trust in something that’s so uncertain.

Uncertainty over an interface type decreases trust in that interface

So when you’re asked a survey question about privacy and security, you’re principally being asked about trust. Do you trust that your data is being held securely? Do you trust that your conversations are kept private?

If you’re uncertain about how a certain device or technology works, you inevitably lack confidence in using it. And if you’re full of uncertainty and lack confidence, then you’ll not trust it. If you don’t trust it, you’ll obviously have concerns about whether your data is kept secure or private by proxy.

So part of the privacy and security issue is a lack of confidence and understanding of the interface itself.

Compounding the uncertainty

And it’s not just the interface type that’s uncertain for new users. It’s also the lack of understanding about the technology and what’s happening behind the scenes. Left to the inexperienced imagination, you can conjure up all kinds of possibilities. Some people genuinely believe that Jeff Bezos is sitting on his yacht listening to Alexa recordings.

And because you don’t know what’s happening behind the scenes, it’s easy to react poorly when you learn that contractors in the Philippines have been listening to some of the things you said to your phone, your most personal device.

Within the conversational AI, speech technology and NLP industry, it came as no surprise that folks were listening to recordings. It’s the only way you can train an ML system to improve: you review the ASR and NLU performance and make improvements, so that next time someone asks ‘how old is Jeremy Kyle?’, the system can answer.

Building trust will take time

So it’s clear that building trust will take time as we collectively build new mental models around interacting with voice assistants and voice user interfaces, as the industry learns and eventually settles on interaction patterns, and as the public’s collective knowledge of the technology and how it works increases.

Short term privacy solutions

One thing that can be done in the short term to help people build a little more trust and confidence in using their voice as an interface is to make sure that speech data isn’t sent to the cloud for processing. That means that none of your spoken audio travels over the internet and risks being intercepted. It means that none of that audio is stored on a server somewhere for contractors to listen to. And it means that you can have confidence in the fact that everything you say to your device stays on your device.

Google and Apple have already made some moves to make this happen. Google announced at I/O ’20 that a large share of its speech recognition can run on-device on Pixel phones using its VoiceFilter-Lite system. Apple announced the same this summer, but restricted to devices using the A12 Bionic chip and running iOS 15.

This means that you can speak and dictate to your phone, or use the voice assistant, and none of that data is sent to the cloud for processing.

So it’s happening, but it’s all very new to Google and Apple, and there’s no sign of it from Amazon, the market-leading smart speaker provider.

Making on-device speech recognition and natural language processing work

One company that specialised in on-device speech recognition and natural language understanding was Snips, a Paris-based organisation that prided itself on its patented ‘edge computing’.

Its ‘edge’ computing models meant that you could, for example, ask your coffee machine to make you a coffee, and that request would be fulfilled without an internet connection.

Snips was acquired in November 2019 for $37.5m by none other than Sonos.

Sonos voice assistant and privacy by design

So we now know that Sonos is working on a voice assistant, we know that the code exists in its app, and we know that the underlying technology is Snips, which is privacy-first.

Therefore, if there’s one bet you could place, it’s that the Sonos voice assistant will run the vast majority of its processing on-device.

This would make it a viable alternative to Amazon’s Echo and Google’s Home smart speakers, and would place the Sonos smart speaker in real competition.

Why Sonos can seriously compete with Amazon, Google and Apple

Music has consistently been the top use case for smart speakers since the launch of the Amazon Echo 7 years ago. And we know that privacy is a top concern for users. A smart speaker that sounds as good as Sonos (a well-trusted and respected audio brand) and that also has privacy and security at its absolute centre, with on-device processing, is a no-brainer for some.

Smaller language models mean complete on-device capabilities

While Amazon and Google will get to entirely on-device processing eventually, it’ll take them a lot longer. They have extremely wide and broad speech and language models. Alexa needs to transcribe all kinds of requests and accurately classify them. Everything from timers to music to calendars to news to skills to recipes to questions, you name it. Alexa, Google and Apple have to use the cloud to process this in many cases because they need the computing power to do so.

Sonos only needs to be able to understand music requests. And although that’s not a small task – Google found that there are over 5,000 ways to say “set an alarm” – nevertheless, it’s a much easier task than to understand everything.

Sonos is in a strong position

That means that Sonos, with its music-specific language model, will only need to send specific song titles, artist names, playlist names and so on to the cloud in order to play the right song. And it’ll only need an internet connection for the music streaming element, and to receive language model updates for things like new song or album titles, artist names and playlists.
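To make that split concrete, here’s a rough sketch in Python of what a narrow, music-only understanding step might look like. To be clear, this is not Sonos’s or Snips’s code, and nothing in the leak tells us how SVC actually works. The patterns, intent names and function names below are all made up for illustration, and a real system would use a trained model rather than three regular expressions. The point is only that the request is interpreted locally, and the only thing that would need to leave the device is a small piece of text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for a music-only domain. A real on-device model
# would be statistical and cover far more phrasings than these three rules.
PATTERNS = [
    ("play_track_by_artist", re.compile(r"^play (?P<track>.+) by (?P<artist>.+)$")),
    ("play_playlist", re.compile(r"^play (?:my |the )?(?P<playlist>.+) playlist$")),
    ("play_artist", re.compile(r"^play (?:some )?(?P<artist>.+)$")),
]


@dataclass
class ParsedRequest:
    intent: str
    slots: dict


def parse_on_device(transcript: str) -> Optional[ParsedRequest]:
    """Interpret a transcript locally; return None if it is not a music request."""
    text = transcript.lower().strip().rstrip(".!?")
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return ParsedRequest(intent, match.groupdict())
    return None


def cloud_payload(parsed: ParsedRequest) -> dict:
    """Build the only data that would leave the device: an intent name and text slots."""
    return {"intent": parsed.intent, **parsed.slots}


if __name__ == "__main__":
    for phrase in ["Play Hey Jude by The Beatles", "Play my workout playlist", "What's the weather?"]:
        parsed = parse_on_device(phrase)
        print(phrase, "->", cloud_payload(parsed) if parsed else "not handled")
```

Notice that no audio appears in the payload, and a request outside the music domain simply isn’t handled. That narrowness is exactly what would make running everything on the device practical.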

Up until now, privacy-focused voice processing has been reserved for specific devices produced by the largest companies in the world. Now, there’s a true contender. And it could help increase confidence and build trust in voice user interfaces in general, which would be good for all of us.
