Audio front-end and wake-word detection: Key considerations in reference designs for Alexa

Article by: Majeed Ahmad

Here is what developers need to vet while selecting hardware and software development kits for designs built around Alexa Voice Service.

Design engineers can employ hardware modules and software services to integrate Alexa Voice Service (AVS) into smart home, automotive, and wearable devices. That brings cloud-based Alexa experiences to products ranging from portable speakers to smart appliances to in-vehicle infotainment. Developers can integrate Alexa into their voice-based products and write a companion app to turn a product into a connected one.

Since Amazon brought the original Alexa device to market in 2014, Amazon Web Services (AWS) as well as several chipmakers have unveiled reference designs that help engineers integrate Amazon’s voice recognition technology and AVS interface by providing pre-built and pre-tested designs.

So, if your company doesn’t have a lot of engineers to work on hardware and software development, reference designs ease the development of simple and cost-effective natural-language understanding and voice interface for Alexa-based designs. Otherwise, the integration of high-quality audio processing makes the development of voice-enabled devices lengthy and complicated.

Figure 1 Reference designs for AVS-based voice applications are built to seamlessly integrate Amazon’s voice recognition technology into voice-controlled devices. Source: STMicroelectronics

Wake word detection

It all begins with a robust wake-word engine (WWE) that listens for the keyword “Alexa” before the system takes any action. A cloud-based wake-word verification step then validates the context and makes sure that the user really wants Alexa. Here, the voice capture part of the reference design enhances “Alexa” wake-word detection by performing audio capture in real-world conditions, so that users can reliably interrupt the device in noisy environments, even from moderate distances.
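The two-stage flow described above, with a local engine followed by cloud-side verification, can be sketched as a simple gate. This is only an illustration: the class name, the confidence threshold, and the `cloud_verify` callback are hypothetical stand-ins, not part of any Amazon or vendor API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WakeWordGate:
    """Two-stage wake-word gate: local engine first, cloud verification second."""
    local_threshold: float                 # hypothetical confidence threshold for the on-device engine
    cloud_verify: Callable[[bytes], bool]  # hypothetical stand-in for cloud-side verification

    def should_stream(self, local_score: float, audio: bytes) -> bool:
        # Stage 1: the on-device engine must be confident enough to wake up.
        if local_score < self.local_threshold:
            return False
        # Stage 2: the cloud re-checks the captured audio before the request proceeds.
        return self.cloud_verify(audio)


# Toy usage: the "cloud" here simply accepts any non-empty audio buffer.
gate = WakeWordGate(local_threshold=0.6, cloud_verify=lambda audio: len(audio) > 0)
accepted = gate.should_stream(local_score=0.9, audio=b"\x00\x01")
rejected = gate.should_stream(local_score=0.3, audio=b"\x00\x01")
```

Keeping the first stage on the device means the always-listening part can run without a network round trip, while the second stage filters out false triggers.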

Take the example of Cirrus Logic’s Voice Capture Development Kit for Amazon AVS applications; it provides acoustic tuning with proven hardware and software components. The kit improves “Alexa” wake-word detection in both quiet and noisy environments even with the user several meters away from the device. It does that by suppressing noise and other real-world interference for more accurate and reliable voice interactions.

Figure 2 The far-field AVS reference design is aimed at smart speakers and other voice-controlled smart home devices. Source: Cirrus Logic

As shown above, the kit includes a voice capture board with a two-microphone array, a Raspberry Pi 3 (RPi3), a speaker, and a microSD card preloaded with the required firmware for instant productivity. A control console simplifies operation of the various RPi3 applications and provides a user-friendly interface to perform acoustic tuning and diagnostic functions.

The voice capture board features Cirrus Logic’s CS47L24 smart codec, CS7250B digital MEMS microphones, and SoundClear algorithms for voice control, noise suppression, and echo cancellation. Here, the smart codec integrates hi-fi DACs, a stereo headphone amp, and a mono speaker amp to reduce board real estate and bill-of-materials (BOM) cost.

Next, the MEMS microphones, featuring an ultra-low noise floor and a wide dynamic range of 103 dB, ensure precise voice capture in challenging noise conditions. Finally, the SoundClear algorithms block noise that would otherwise interfere with the Alexa wake word.
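To put the 103 dB figure in perspective, the short calculation below converts it into an amplitude ratio and a rough equivalent bit depth. The roughly 6 dB-per-bit conversion is a common rule of thumb applied here as an assumption, not a specification of the microphone.

```python
import math

DYNAMIC_RANGE_DB = 103.0  # figure quoted in the text

# Amplitude ratio between the loudest and quietest usable signal:
# dB = 20 * log10(ratio)  ->  ratio = 10 ** (dB / 20)
amplitude_ratio = 10 ** (DYNAMIC_RANGE_DB / 20)

# Rough equivalent resolution, using the rule of thumb of about 6.02 dB per bit
equivalent_bits = DYNAMIC_RANGE_DB / (20 * math.log10(2))

print(f"amplitude ratio: about {amplitude_ratio:,.0f}:1")
print(f"equivalent resolution: about {equivalent_bits:.1f} bits")
```

The ratio works out to roughly 141,000:1, or about 17 bits, which is why a quiet voice can still be resolved while loud playback is present.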

That allows the kit to efficiently perform “Alexa” wake-word detection and audio capture in real-world conditions, even from moderate distances in noisy environments, enabling users to reliably interrupt loud music or Alexa response playback.
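Interrupting the device during playback depends on echo cancellation: the device knows what it is playing, so it can estimate how that signal arrives back at the microphone and subtract it. The sketch below uses normalized least mean squares (NLMS), a textbook adaptive filter, purely as an illustration. It is not a description of Cirrus Logic’s SoundClear algorithms, and the tap count, step size, and synthetic echo path are all assumptions.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal."""
    w = [0.0] * taps        # adaptive filter weights (estimate of the echo path)
    history = [0.0] * taps  # most recent playback samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_est    # residual: ideally only the user's voice remains
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size normalized by the recent playback energy
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out


def power(signal):
    return sum(s * s for s in signal) / len(signal)


# Synthetic check: playback is white noise, the "room" is a hypothetical 3-tap echo path.
random.seed(0)
playback = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.2]
mic = [
    sum(echo_path[k] * playback[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(playback))
]
residual = nlms_echo_cancel(mic, playback)

mic_power = power(mic[-500:])
residual_power = power(residual[-500:])
print(f"echo power before: {mic_power:.3e}, after: {residual_power:.3e}")
```

In this idealized case, with no noise and no near-end speech, the residual echo falls far below the original echo once the filter has adapted. Real rooms have much longer echo paths and need correspondingly longer filters.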

Audio front-end

The basic hardware in an AVS-based design comprises multiple microphones and an audio front end (AFE) that ensures “Alexa” wake-word detection in both quiet and noisy environments. That makes the audio front end a critical building block of any AVS reference design.

The audio front end picks up the user’s voice, amplifies it, reduces background noise, and sends it to the cloud. That is difficult to do well, so using a development kit is a practical way to create an audio front end.

Take the case of TalkTo, DSP Concepts’ audio front end with AVS-qualified, integrated voice processing; it has been launched for STMicroelectronics’ AWS IoT Core reference design based on the chipmaker’s STM32 MCUs. The TalkTo audio front end features noise reduction, echo cancellation, and signal processing based on advanced beamforming for far-field audio detection. It’s delivered through Audio Weaver, a free tool that helps developers fine-tune AVS designs.

Figure 3 A single-chip solution comprising audio front-end processing, local wake-word detection, communication interfaces, and memory content including RAM and flash reduces BOM costs and simplifies layout. Source: STMicroelectronics

ST’s 36×65-mm board combines a Wi-Fi module with an STM32H743 MCU that integrates audio front-end processing, local wake-word detection, communication interfaces, and memory in a single chip. The reference design hardware also includes an audio daughterboard as a separate module to further simplify development and prototyping.

The daughterboard comprises an FDA903D audio codec, user LEDs and buttons, and two MP23DB01HP MEMS microphones spaced at 36 mm for size-constrained designs. It also lets developers incorporate a privacy mode that switches off the microphones, with a red LED informing users that Alexa can’t hear voice commands.
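The 36 mm microphone spacing mentioned above determines how much timing information a two-microphone array can extract. The back-of-the-envelope calculation below assumes a speed of sound of 343 m/s and a 16 kHz sample rate; neither value comes from the reference design's documentation.

```python
MIC_SPACING_M = 0.036        # 36 mm, from the text
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at about 20 degrees C
SAMPLE_RATE_HZ = 16_000      # assumption: a common rate for speech capture

# The largest arrival-time difference occurs when sound travels along the mic axis.
max_delay_s = MIC_SPACING_M / SPEED_OF_SOUND_M_S
max_delay_samples = max_delay_s * SAMPLE_RATE_HZ

# Above this frequency the spacing exceeds half a wavelength,
# and the direction of arrival becomes ambiguous (spatial aliasing).
aliasing_freq_hz = SPEED_OF_SOUND_M_S / (2 * MIC_SPACING_M)

print(f"max inter-mic delay: {max_delay_s * 1e6:.0f} microseconds")
print(f"max inter-mic delay: {max_delay_samples:.2f} samples at {SAMPLE_RATE_HZ} Hz")
print(f"spatial aliasing starts near {aliasing_freq_hz:.0f} Hz")
```

Under these assumptions the maximum delay is about 105 microseconds, or under two samples at 16 kHz, which suggests why beamforming on such a compact array typically needs fractional-delay or frequency-domain processing rather than whole-sample shifts.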

Far-field voice recognition

Other chipmakers have also chipped in with reference designs that integrate Amazon’s far-field voice recognition technology. NXP, for instance, has unveiled a reference platform that claims to recognize a user’s request from across the room even when loud music is playing.

NXP’s reference platform for Amazon Alexa comprises a 7-microphone array design, audio processing algorithms, and beamforming technology. It integrates Amazon’s far-field voice recognition technology with NXP’s i.MX application processors while aiming to simplify the creation of voice-controlled devices.
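As an illustration of what beamforming on a multi-microphone array involves, the sketch below computes the steering delays for a delay-and-sum beamformer, the simplest textbook approach. The array geometry (one center microphone plus six on a 40 mm-radius circle) is an assumption made for the example, and nothing here describes NXP's or Amazon's actual algorithms.

```python
import math


def steering_delays(mic_positions, azimuth_deg, speed_of_sound=343.0):
    """Per-microphone delays (seconds) that align a far-field plane wave arriving from azimuth_deg."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    # Projection of each mic position onto the direction pointing toward the source.
    # A larger projection means the mic is closer to the source and hears the wavefront earlier.
    proj = [x * ux + y * uy for x, y in mic_positions]
    farthest = min(proj)
    # Delay the early microphones so that all channels line up with the latest one.
    return [(p - farthest) / speed_of_sound for p in proj]


# Hypothetical 7-mic layout: one in the center, six evenly spaced on a circle.
RADIUS_M = 0.04
mics = [(0.0, 0.0)] + [
    (RADIUS_M * math.cos(math.radians(60 * k)), RADIUS_M * math.sin(math.radians(60 * k)))
    for k in range(6)
]

delays = steering_delays(mics, azimuth_deg=0.0)
for index, delay in enumerate(delays):
    print(f"mic {index}: {delay * 1e6:6.1f} microseconds")
```

Once each channel is delayed by its steering delay, summing the channels reinforces sound from the chosen direction while sound from other directions adds less coherently.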

Voice-enabled designs like Alexa are transforming the way users interact with smart things, from toasters and cookers to thermostats and blinds. Here, reference boards and voice-capture kits provide the fastest route to market for various Alexa-enabled products while ensuring highly accurate wake-word triggering and command interpretation even in noisy environments.

We are at the very beginning of the voice-enabled device revolution, and the diversity of these applications means that pre-designed and pre-tested reference boards and kits will likely play an important role. They will be crucial in bringing voice-enabled products to market more quickly and in bypassing much of their design complexity.

This article was originally published on EDN.

Majeed Ahmad, Editor-in-Chief of EDN, has covered the electronics design industry for more than two decades.
