Voice command ‘solution’ in a whisper

Article By : Steve Taranovich

The Whisper Voice IC, whose voice extraction filter avoids distorting the voice signal, detects voice up to 5m away from the microphone.

« Previously: What makes voice command essential?

Kopin’s Whisper Voice Interface IC uses AI to take a different approach to noise. It samples the acoustic environment 16,000 times per second, dynamically analyzes noise and voice activity, and uses its Voice Extraction Filter to “extract” voice without distortion once the parameters are tuned for the device and the application.

[Kopin voice recognition fig1 (cr)]
Figure 1: This Kopin image demonstrates voice recognition accuracy in a real-world noise environment. The chart compares the performance of smart glasses with the Whisper chip against the ASR and noise cancellation technologies found in two popular devices: a leading Bluetooth earphone and a leading smartphone.

As Figure 1 shows, the Whisper chip’s performance remains consistent as noise levels increase. The earphone’s performance begins degrading at 75dB (the noise level of a car interior or dishwasher), while the smartphone’s ASR performance starts to drop at approximately 85dB (the noise level of a restaurant).

Human-machine conversation

I really like this Whisper Voice IC, which is one of the best approaches to voice recognition applications I have seen in the industry so far. The Voice Extraction technique adds just about zero distortion to the voice signal. The adaptive voice detection architecture allows for “listening” that adjusts to the environmental noise level. The Whisper Voice Chip can be tuned for mid- and far-field applications (up to 5m from the microphone).
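To make the idea of “listening” that adjusts to the environmental noise level more concrete, here is a minimal Python sketch of a generic energy-based voice activity detector with a tracked noise floor. It is not Kopin’s algorithm, which the article does not describe. The 20ms frame size, the 6dB margin, the smoothing factor, and every function name are assumptions chosen for illustration; only the 16kHz sampling rate comes from the text.

```python
import math
import random

SAMPLE_RATE = 16_000   # samples per second, as stated in the article
FRAME_LEN = 320        # 20 ms per frame (assumed, not from the article)


def frame_energy_db(frame):
    """Mean power of one frame in dB relative to a full scale of 1.0."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def adaptive_vad(samples, margin_db=6.0, alpha=0.95):
    """Return one True/False speech flag per complete frame.

    The noise floor is an exponential average that is updated only on
    frames classified as non-speech, so the decision threshold follows
    the ambient noise level instead of being fixed.
    """
    flags = []
    noise_floor_db = None
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        level = frame_energy_db(samples[start:start + FRAME_LEN])
        if noise_floor_db is None:
            noise_floor_db = level  # assumption: the first frame is noise only
        is_speech = level > noise_floor_db + margin_db
        if not is_speech:
            noise_floor_db = alpha * noise_floor_db + (1 - alpha) * level
        flags.append(is_speech)
    return flags


def demo_signal(seed=0):
    """One second of low-level noise with a 440 Hz tone in the second half.

    The tone is only a stand-in for voice in this toy example.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 0.01) for _ in range(SAMPLE_RATE)]
    for n in range(SAMPLE_RATE // 2, SAMPLE_RATE):
        signal[n] += 0.2 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    return signal


if __name__ == "__main__":
    flags = adaptive_vad(demo_signal())
    print("".join("S" if f else "." for f in flags))
```

Because the threshold is defined relative to the tracked noise floor, the same margin works whether the background is quiet or loud. A production detector would look at far more than frame energy.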

Some of the audio processing capabilities of this solution are:

  • Microphone balancing
  • Beam forming
  • Noise cancellation
  • Voice activity detection
  • 16kHz sampling rate
  • 48kHz PWM
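As a rough illustration of what beam forming means in the list above, the following Python sketch implements a textbook delay-and-sum beamformer with integer sample delays. It is a generic technique, not a description of how the Whisper chip works internally. The function names, the delays, and the noise level are all hypothetical values picked for the example.

```python
import random


def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer delay and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   number of samples by which the wanted source arrives
              late at each microphone (hypothetical, known in advance).
    """
    out_len = len(channels[0]) - max(delays)
    out = []
    for i in range(out_len):
        acc = 0.0
        for channel, delay in zip(channels, delays):
            acc += channel[i + delay]  # undo the arrival delay
        out.append(acc / len(channels))
    return out


def simulate_mics(source, delays, noise_std, seed=0):
    """Delay the source per microphone and add independent noise."""
    rng = random.Random(seed)
    channels = []
    for delay in delays:
        delayed = [0.0] * delay + source[:len(source) - delay]
        channels.append([x + rng.gauss(0.0, noise_std) for x in delayed])
    return channels


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(1)
    source = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    delays = [0, 2, 4, 6]  # four microphones, matching the chip's input count
    mics = simulate_mics(source, delays, noise_std=0.3)
    beam = delay_and_sum(mics, delays)
    print("single mic error:", mean_squared_error(mics[0], source))
    print("beamformed error:", mean_squared_error(beam, source))
```

Once the channels are aligned, the wanted signal adds coherently while independent noise averages down, so the beamformed error should come out well below the single-microphone error. Real arrays also need fractional delays and a way to estimate the direction of the talker.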

The system separates speech from the noise without creating non-linear distortion in the voice signal. Non-linear distortion is the most common cause of poor voice quality from a device and of ASR failures, whether speech recognition runs in software on the host device or in the cloud. The IC needs less than 10mW in operation, which is important to the battery life of portables and wearables.
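To put the 10mW figure in perspective, here is a back-of-the-envelope Python calculation. The battery (100mAh at a nominal 3.7V) is a hypothetical example, not a figure from Kopin, and the result covers the voice IC alone while ignoring the host processor, radio, display, and conversion losses.

```python
def runtime_hours(capacity_mah, voltage_v, load_mw):
    """Ideal runtime in hours for a constant load on a given battery."""
    energy_mwh = capacity_mah * voltage_v  # mAh x V = mWh
    return energy_mwh / load_mw


if __name__ == "__main__":
    # Hypothetical small wearable battery: 100 mAh at a nominal 3.7 V.
    hours = runtime_hours(capacity_mah=100, voltage_v=3.7, load_mw=10)
    print(f"Voice IC alone at 10 mW: about {hours:.0f} hours")
```

Under these assumptions, the chip at its stated 10mW ceiling would take roughly 37 hours to drain such a battery on its own, so always-on listening would not dominate a wearable’s power budget.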

Since this architecture is an all-digital solution, it can replace the codec: no ADC or DAC is needed. Another nice advantage is that the chip provides efficient front-end audio processing, reducing the processing and power demand on the device’s host processor.

[Kopin whisper chip diagram fig2 (cr)]
Figure 2: The Whisper Chip is situated between the microphones and speech engine. It is easy to implement and works well with the leading operating systems, processors and speech recognition engines.

Voice assistant at-large

The digital microphone inputs can handle up to four microphones, and there are two digital speaker outputs. The chip’s compact size of only 4mm × 4mm keeps the board footprint small, which matters especially in portables and wearables.

Since voice seems to be the most natural way for humans to communicate with the ‘machine,’ I give my endorsement to this solution, which helps make voice recognition a practical reality. Its excellent performance, low power, and low cost, coupled with a good microphone design, will bring consumers to use more portables and wearables. My vote in the microphone arena is Vesper.

First published by EDN.
