What makes voice command essential?

Article by: Steve Taranovich

Voice command in real-world environments depends on automatic speech recognition (ASR) and natural language processing (NLP), which in turn require audio analysis that can tell voice from noise.

Voice command has become a feature that electronics consumers look for, and many users now rely on it in everyday activities. Unfortunately, ambient noise has become a concern along with it.

Consumers want to speak commands to their automobiles, mobile devices and wearables, but ambient noise can cause those commands to be misrecognised. Automatic speech recognition (ASR) and natural language processing (NLP) in systems like Siri, Google Now, Alexa and Cortana work pretty well in a quiet home, but our real-world environment surrounds us with a great deal of noise.

I have been seeing some innovative development in microphones and speech recognition. Most system designs developed to mitigate ambient noise analyse the sound and try to enhance the speech and suppress the noise. That seems reasonable, but this “physics” approach of suppressing noise signals and boosting voice signals inherently introduces distortions to the voice signal that speech engines cannot process. Let’s take a look at where most people use their phones and wearables.

Figure 1: A Kopin image shows where people use smartphones and/or wearable devices and the corresponding noise levels in each area.
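
To make the enhance-and-suppress idea described earlier more concrete, here is a minimal Python sketch of spectral subtraction, a classic textbook noise-suppression technique. It is an illustrative stand-in only, not the method used by Kopin or by any of the assistants named in this article. The function names, the toy single-tone "voice" and "noise" signals, and the fixed `noise_magnitude` estimate are all assumptions made for the example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(frame, noise_magnitude):
    """Subtract an estimated noise magnitude from every frequency bin.

    noise_magnitude is a hypothetical estimate (one value for all bins,
    for simplicity). Magnitudes are floored at zero and the noisy phase
    is reused unchanged.
    """
    cleaned = []
    for value in dft(frame):
        magnitude = max(abs(value) - noise_magnitude, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return inverse_dft(cleaned)


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy "voice": a single tone. Toy "noise": a weaker tone at another frequency.
N = 64
voice = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.3 * math.sin(2 * math.pi * 13 * t / N) for t in range(N)]
noisy = [v + w for v, w in zip(voice, noise)]

enhanced = spectral_subtract(noisy, noise_magnitude=10.0)
print("error vs. clean voice before:", rms_error(noisy, voice))
print("error vs. clean voice after: ", rms_error(enhanced, voice))
```

In this toy case the noise tone is removed, but the voice tone comes out at a reduced amplitude because the same amount is subtracted from its frequency bins too. The output is quieter in noise yet no longer matches the original voice, which is the kind of side effect the article attributes to this family of approaches.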

Talking to a machine

Why do we want to use voice commands? @KCPB, a venture capital firm, tells us (Figure 2).

Figure 2: Statistics show that the primary reason consumers want to use voice is for hands-free and vision-free tasks at home, in the car or on the go.

Speech recognition is presently at about 95% accuracy, but experts such as Andrew Ng, chief scientist at Baidu, have said that going to 99% would be a game changer. For explosive consumer adoption of voice recognition, accuracy is the primary goal and latency is the secondary goal (who wants to wait 10 seconds for a response from their system?). In 1970, machine speech recognition covered only tens of words. Fast-forward to 2016, and 7 to 8 million words were recognisable with 90% accuracy in a low-noise environment, according to Google.
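
A rough calculation suggests why the step from 95% to 99% matters so much. The Python sketch below assumes the accuracy figures are per-word rates and that each word is recognised independently, neither of which is stated in the sources quoted above; `command_success_rate` and the 10-word command length are hypothetical choices for illustration.

```python
def command_success_rate(word_accuracy, words_in_command):
    """Chance that every word in a command is recognised correctly.

    Assumes each word is recognised independently with the same accuracy,
    a simplification made only for this illustration.
    """
    return word_accuracy ** words_in_command


for accuracy in (0.95, 0.99):
    rate = command_success_rate(accuracy, words_in_command=10)
    print(f"{accuracy:.0%} per word -> {rate:.1%} of 10-word commands fully correct")
```

Under these assumptions, about 59.9% of 10-word commands come through without a single error at 95% accuracy, compared with about 90.4% at 99%.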

Voice command’s functions and advantages are more than just perks; they are essential, especially when the user’s hands or eyes are occupied. It will take a lot of AI before electronics consumers using voice command find a good [machine] listener.
