Alexa, can you keep a secret?
We’ve all become used to the convenience of smart speakers like Amazon’s Echo in our home and voice assistants like Siri on our phones, but should we be more worried about their privacy implications? What happens to our data and our conversations once we hand them over to Amazon, Google, or some other mega-corporation?
Voice recognition hits the mainstream
Early attempts at voice recognition have been around since the 1960s, but it wasn’t until the 1990s that it became common, with speech recognition becoming possible on PCs. The really big change in the market came when Apple introduced Siri on the iPhone 4S, which, surprisingly, is as long ago as 2011.
Today, there are voice-controlled “virtual assistants” such as Apple’s Siri and Google’s Assistant on many gadgets and devices. In fact, one in six Americans now owns a smart speaker. While Amazon leads the market with Alexa running on its Echo speakers and other devices, there is strong competition from Google, Apple, and others.
Of course, it’s not just smart speakers and mobile gadgets that are starting to adopt voice interfaces. There are also smart displays, which typically include the same voice-assistant interface and AI capabilities as smart speakers — for example, Amazon’s Echo Show. Voice control is also being added to smart TVs and cars and to devices as run of the mill as home thermostats, while Microsoft has made Cortana a central part of Windows-powered PCs.
The technology behind this boom in voice recognition is artificial intelligence (AI) and artificial neural networks (ANNs) running on high-performance cloud servers. There is also a need for some sophisticated signal processing in the local device, such as far-field audio detection, which can pick out a voice from background noise.
We know the technology works, but what happens to our voice once a gadget listens, processes, and responds? A recent survey by Microsoft found that 41% of voice-assistant users have privacy concerns.
Generally, most smart speakers take the audio of your question or command and upload it to their own servers for processing. This means that anything you ask or tell the speaker is stored, at least temporarily, by the company providing the service.
While the smart speaker is always listening, it uploads audio only after it hears its “wake word,” such as “Alexa.” But there are suggestions that, in practice, the speaker can mishear something else as the wake word and start recording and uploading when you did not intend it to.
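The wake-word behavior described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation: audio frames are stood in for by plain strings, and the function name `utterances_to_upload`, the `<silence>` end marker, and the two detector functions are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Optional


def utterances_to_upload(
    frames: Iterable[str],
    is_wake_word: Callable[[str], bool],
    end_marker: str = "<silence>",
) -> List[List[str]]:
    """Return only the utterances that follow a detected wake word."""
    uploads: List[List[str]] = []
    current: Optional[List[str]] = None  # None means idle: frames are heard but not kept
    for frame in frames:
        if current is None:
            if is_wake_word(frame):
                current = []             # wake word heard: start recording
        elif frame == end_marker:
            uploads.append(current)      # end of request: hand off for upload
            current = None
        else:
            current.append(frame)
    return uploads


def exact(frame: str) -> bool:
    """Hypothetical strict detector: fires only on the exact wake word."""
    return frame == "alexa"


def sloppy(frame: str) -> bool:
    """Hypothetical loose detector: also fires on similar-sounding words."""
    return frame.startswith("alex")


stream = ["private", "chat", "alexa", "play", "jazz", "<silence>",
          "alexis", "called", "today", "<silence>"]

strict_uploads = utterances_to_upload(stream, exact)
loose_uploads = utterances_to_upload(stream, sloppy)
```

With the strict detector, only the request after "alexa" is collected. With the loose detector, the unrelated remark that begins with "alexis" is also collected, which is the kind of accidental recording the concern above refers to.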
Another area of concern is how much access employees of the companies providing the service have to our voice recordings and data. Recent press reports claim that Amazon has teams of people listening to the recordings to improve quality. That seems logical, but it also introduces some theoretical risks. On the other hand, we’re all used to our web-browsing history and chat messages being stored by big companies, so is adding voice really any different?
There is a suggestion that these employees might be able to link voice data back to other personal details, including location. Knowing where you live is useful to give localized recommendations, like the best restaurant in your city, but this does start to raise warning flags regarding privacy issues.
One way to overcome these privacy issues is to combine cloud processing with more on-device AI processing. This can be achieved by using specialized processors that are capable of understanding commands and responding to them on the device itself.
For many uses, such as controlling smart home features, interactions could then be completed with no communication at all with the cloud. Where information does have to be exchanged (for example, when the user has asked a question or wants to control a web application such as Spotify or an internet radio station), perhaps the smart speaker can send only metadata to the cloud rather than the actual conversation itself.
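One way to picture this split is a small router that runs after the device has already interpreted the command. The sketch below is an assumption-laden illustration rather than a description of any shipping product: the `Intent` type, the `LOCAL_INTENTS` set, and the payload layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of commands the device can complete without the cloud.
LOCAL_INTENTS = {"lights_on", "lights_off", "set_thermostat"}


@dataclass
class Intent:
    """A command as understood on the device (hypothetical structure)."""
    name: str                                   # e.g. "lights_on" or "play_music"
    slots: dict = field(default_factory=dict)   # structured parameters, no audio


def cloud_payload(intent: Intent) -> Optional[dict]:
    """Return the metadata to send to the cloud, or None if handled locally."""
    if intent.name in LOCAL_INTENTS:
        return None  # nothing leaves the device
    # Only the interpreted request is shared, never the recording itself.
    return {"intent": intent.name, "slots": intent.slots}


local_result = cloud_payload(Intent("lights_on", {"room": "kitchen"}))
remote_result = cloud_payload(Intent("play_music", {"service": "Spotify", "genre": "jazz"}))
```

Here the lighting command produces no cloud traffic at all, while the music request sends only the service name and genre. What counts as acceptable metadata would still be a policy decision for the vendor.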
As voice activation and control are added to many more consumer products, neural-network processors that are efficient enough to be deployed in embedded applications will become more widespread. Because they can handle computation locally, without the need to send data to the cloud, they will certainly help to assuage many people’s privacy concerns.
While the cloud companies would prefer the raw conversations for their datasets, the individual is much more likely to trust voice services if the device in the home acts as a type of security gateway, limiting what information is shared with the cloud itself.
Transparency here is vital: Device manufacturers will need to be clear to consumers about what processing is handled locally and what data is sent to the cloud. Of course, there is a positive angle to this debate — it’s not difficult to imagine a device vendor using its local processing capabilities as a differentiator in its promotion and persuading consumers that a smart speaker with more “local only” functions is desirable.
What can OEMs do?
Another way to improve public confidence is to build in “privacy by design” features so the user can see that the device is designed to keep their information private. For example, Amazon’s Echo Show 5 smart display has a physical shutter that covers its camera. Most smart speakers also have a mute button that disables the microphone.
Whatever the hardware capabilities, the software must deliver a user experience that gives us confidence, regardless of our level of technical ability or experience. An essential step is to provide clearly explained ways for users to set their privacy preferences, along with the ability to delete all voice recordings and history whenever they want.
Another step toward ensuring privacy is to make sure the “wake word” detection (which is performed locally) is as accurate as possible so consumers feel confident that they can talk privately in the presence of a voice interface. More powerful processors and sophisticated algorithms are becoming common in voice-controlled gadgets, enabling smart speakers to reliably determine when their help is being requested.
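Accuracy here involves a trade-off that a short calculation makes concrete. A detector typically produces a confidence score and fires when the score passes a threshold. Raising the threshold reduces accidental activations but makes the device miss more genuine requests. The scores and labels below are invented purely for illustration, and `error_rates` is a hypothetical helper, not part of any library.

```python
from typing import Sequence, Tuple


def error_rates(
    scores: Sequence[float],
    is_wake: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (false-accept rate, false-reject rate) at a given threshold."""
    negatives = sum(1 for label in is_wake if not label)
    positives = sum(1 for label in is_wake if label)
    if negatives == 0 or positives == 0:
        raise ValueError("need at least one wake and one non-wake example")
    # False accept: the detector fires on speech that was not the wake word.
    false_accepts = sum(
        1 for s, label in zip(scores, is_wake) if s >= threshold and not label
    )
    # False reject: the detector stays silent on a genuine wake word.
    false_rejects = sum(
        1 for s, label in zip(scores, is_wake) if s < threshold and label
    )
    return false_accepts / negatives, false_rejects / positives


# Invented example data: detector scores and whether each clip was the wake word.
scores = [0.95, 0.80, 0.60, 0.55, 0.30, 0.10]
is_wake = [True, True, False, True, False, False]

low_threshold = error_rates(scores, is_wake, 0.5)
high_threshold = error_rates(scores, is_wake, 0.7)
```

On this toy data, the lower threshold accepts one of the three non-wake clips and misses no genuine ones, while the higher threshold accepts none by mistake but misses one of the three genuine wake words. Better models shift both rates down together, which is where the more powerful processors mentioned above come in.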
Voice interfaces are only going to get more common, and there is a great market opportunity for those vendors that get their product and its approach to privacy correct. Consumers want convenience, and the evidence from other technology products would suggest that once vendors have demonstrated that privacy is being addressed, they will be happy to talk to voice-controlled gadgets just as freely as they would type on their smartphone.
For the record, I just asked my Alexa-powered Echo speaker: “Can you keep a secret?” and got the reply: “Tell me anything you’re comfortable with me knowing.” It’s a little cryptic, but it’s a good starting point.