Useful Internet of Things (IoT) applications have been discovered in nearly every industry vertical and are effectively augmenting the utility of legacy systems. For example, residential, commercial, and industrial facilities are leveraging video doorbells for security purposes. These services have been available for decades, but were usually limited to high-end facilities that could afford the costly two-way audio and one-way video capabilities of a closed-circuit television network. Now, however, IoT technology enables this level of security without an extensive coaxial or Ethernet infrastructure. This article takes a closer look at some of the video, audio, and power design challenges associated with video doorbells and the technological advancements needed to solve them.
A seamless user experience
Legacy video doorbell systems involved the use of a ring button, microphone, and video camera. These systems were often hardwired to a power source, with video routed to a specific television set. An IoT-enabled video doorbell has a similar purpose, but accomplishes it quite differently. A motion sensor detects a visitor walking up to the doorway and enables video streaming over the cloud to a smartphone app. Communication with visitors occurs over a two-way IP audio stream and a one-way video stream that runs through the app. The basic functions of these doorbells can be integrated with complete security systems that can remotely enable/disable a keyless lock, set off an alarm, or provide automated feedback based on specific input.
Early releases of video doorbells were often plagued with video and audio issues such as false chimes and incoherent audio, but key features such as cloud backups, motion detection, video streaming, and two-way communications require seamless functionality to be useful. These demands, combined with the previous hardwired power constraints, create their own set of hardware challenges for modern video doorbell subsystems.
False motion events
The pyroelectric, or passive infrared (PIR), motion sensors commonly used in video doorbells are prone to errors. They can react falsely to glare from passing vehicles in the daytime, gusts of warm air, bugs, animals, and a wide array of other heat-based activity, triggering annoying false alert tones and notifications on the user’s phone. This drastically degrades the security utility of a video doorbell, as the user will eventually ignore alerts altogether or even take the doorbell offline. In addition, frequent false motion detection events can dramatically shorten battery life.
One relatively straightforward solution is to use two PIR sensors aimed to have slightly overlapping coverage to create a much larger area for motion detection (Figure 1). Because the double sensors only generate notifications for larger objects, smaller things such as bugs and pets won’t register. Using the PIR sensors in tandem with other light and temperature/humidity sensors avoids false triggering caused by rapid changes in temperature or light. This multimodal sensing approach lessens the chance of false alerts while also consuming minimal power, thus extending battery life.
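The gating logic behind this multimodal approach can be sketched in a few lines of Python. This is only an illustration: the `SensorSample` structure, the function name, and both thresholds are assumptions made for the example, not values from any particular doorbell design.

```python
from dataclasses import dataclass


@dataclass
class SensorSample:
    pir_a: bool    # True if PIR sensor A reports motion
    pir_b: bool    # True if PIR sensor B reports motion
    temp_c: float  # ambient temperature, degrees Celsius
    lux: float     # ambient light level


# Hypothetical thresholds; real values would come from field tuning.
MAX_TEMP_STEP_C = 1.5  # largest temperature change tolerated between samples
MAX_LUX_RATIO = 3.0    # largest light-level ratio tolerated between samples


def is_real_motion(prev: SensorSample, cur: SensorSample) -> bool:
    """Report motion only if both PIRs agree and ambient readings are stable."""
    # A small object in front of one sensor should not trigger an alert.
    if not (cur.pir_a and cur.pir_b):
        return False
    # A sudden temperature step suggests a gust of warm air, not a visitor.
    if abs(cur.temp_c - prev.temp_c) > MAX_TEMP_STEP_C:
        return False
    # A sudden brightness change suggests glare, for example from headlights.
    lo, hi = sorted((max(prev.lux, 1.0), max(cur.lux, 1.0)))
    if hi / lo > MAX_LUX_RATIO:
        return False
    return True
```

Because every check is a simple comparison, logic of this kind could plausibly run on a low-power MCU between sleep periods, waking the camera and radio only when it returns true.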
It’s also possible to use an embedded MCU and some firmware to implement algorithm-based motion detection for more accuracy. There are many ways to implement vision-based motion detection, but one of the most common is to compare the current frame with a reference image and track the differences pixel by pixel. This type of image processing has to be smart enough to treat motion from passing vehicles and trees in the wind as part of the background in order to avoid generating false positives, a capability that requires a considerable level of processing power.
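As a rough illustration of the frame-comparison idea, the Python sketch below counts the pixels that differ from a reference image and slowly blends each new frame into that reference. The running-average background update is one simple choice made for this example, and all function names and thresholds are assumptions. It is not sufficient on its own to reject swaying trees or passing traffic.

```python
from typing import List

Frame = List[List[float]]  # grayscale image: rows of pixel intensities (0..255)


def update_background(bg: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Blend the new frame into the reference image (running average)."""
    return [[(1 - alpha) * b + alpha * p for b, p in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]


def changed_fraction(bg: Frame, frame: Frame, pixel_thresh: float = 25.0) -> float:
    """Return the fraction of pixels that differ from the reference image."""
    total = 0
    changed = 0
    for brow, frow in zip(bg, frame):
        for b, p in zip(brow, frow):
            total += 1
            if abs(p - b) > pixel_thresh:
                changed += 1
    return changed / total if total else 0.0


def detect_motion(bg: Frame, frame: Frame, area_thresh: float = 0.02) -> bool:
    """Flag motion when enough of the image has changed."""
    return changed_fraction(bg, frame) > area_thresh
```

Even this naive version touches every pixel of every frame, which hints at why more selective background modeling demands considerable processing power.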
Some of those filtering tasks could be offloaded to cloud-based algorithms that can fine-tune image data specific to the customer. But this requires a relatively large infrastructure for support and good Wi-Fi connectivity, and still results in high power consumption. With this approach, battery-powered smart doorbells are therefore not an option, at least for now. While relying on an external power source decreases the doorbell’s location options, it also gives users the upside of never having to charge or replace batteries.
Image sensor and processor interfacing issues
The image processing in video doorbells requires an image sensor, a digital media processor and, in most cases, a few peripheral devices. There are several things to consider when selecting an image sensor, the most important of which are resolution, frame rate, pixel size, pixel architecture, and shutter time. Along with the many considerations for individual components, there are often interfacing issues between the image sensor and the digital media processor.
Unless careful attention is paid, you may find yourself with a pair of excellent devices that cannot communicate with each other because the formats of their input/output (I/O) interfaces do not match. Mistakes like this are easier to make than one would think, since there is a great deal of variance in I/O interfaces (I2C, parallel, general-purpose I/O). To avoid this unpleasant scenario, designers must ensure that the I/O interfaces supported by the image sensor are compatible with the digital media processor’s I/O.
Similar problems can arise when two devices have different operating voltages and logic signal levels. Fortunately, voltage translation devices readily address this mismatch with bidirectional voltage translations that can range from 0.6 to 5.5 V. Although they add a small cost to the product’s BOM, voltage translation devices repay the investment by giving designers a much wider range of options for image sensors and MCUs than if they could only use ones with identical voltage levels.
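The two checks described above, a shared interface and compatible logic levels, can be captured in a small part-selection helper. The Python sketch below is hypothetical: the `Device` fields and the returned strings are invented for illustration, and only the 0.6 V to 5.5 V translation range is taken from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Device:
    name: str
    interfaces: FrozenSet[str]  # e.g. {"I2C", "parallel"}
    io_voltage: float           # logic level of the I/O pins, in volts


# Translation range quoted in the article for voltage translation devices.
TRANSLATOR_MIN_V = 0.6
TRANSLATOR_MAX_V = 5.5


def check_pairing(sensor: Device, processor: Device) -> str:
    """Classify a sensor/processor pairing by interface and logic level."""
    if not (sensor.interfaces & processor.interfaces):
        return "incompatible: no shared interface"
    if abs(sensor.io_voltage - processor.io_voltage) < 0.05:
        return "direct connection"
    voltages = (sensor.io_voltage, processor.io_voltage)
    if all(TRANSLATOR_MIN_V <= v <= TRANSLATOR_MAX_V for v in voltages):
        return "level translator required"
    return "incompatible: voltage outside translator range"
```

For example, a 1.8 V sensor and a 3.3 V processor that share a parallel interface would come back as "level translator required".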
The full-duplex, hands-free communication required for modern video doorbells adds other complexities, requiring designs that can deal with the unstable feedback caused when users set the speaker or microphone gain too high. For example, a person receiving audio requires a relatively large gain on the speaker in order to adequately discern what is being said at the far end, but the microphone, being in close proximity, readily picks up that audio and often amplifies it back, causing an undesirable echo (Figure 2). In the past, half-duplex communications mitigated this echo by significantly reducing the gain of the microphone when receiving a signal through the speaker.
A system that actively adjusts microphone and speaker gains may correct this issue for full-duplex communications in environments with relatively low ambient noise levels. Unfortunately, this isn’t as effective in environments with unpredictable ambient noise sources, such as a passing bus or other traffic. There are several digital signal processing (DSP) techniques that can solve this problem, including acoustic echo cancellation (AEC), adaptive spectral noise reduction (ASNR), and automatic gain control (AGC). AEC creates adaptive filters that effectively cancel echo by first recognizing a transmitted signal and then eliminating that signal if it reappears within a certain time window. ASNR works in the frequency domain to remove ambient and unwanted noise components from an audio signal, thereby removing background and broadband noise. AGC is designed to improve lower-level speech signals for hands-free communications. Audio algorithms such as these deliver a superior audio experience by maintaining microphone and speaker gains without unwanted feedback and echoes or resorting to voice switching.
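To make the adaptive-filter idea behind AEC more concrete, the Python sketch below uses a normalized least-mean-squares (NLMS) filter. NLMS is a common textbook choice assumed here for illustration; the article does not say which adaptation algorithm a given doorbell uses, and the function name, tap count, and step size are likewise assumptions.

```python
from typing import List, Sequence


def nlms_echo_cancel(far_end: Sequence[float], mic: Sequence[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-6) -> List[float]:
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    far_end: samples sent to the loudspeaker (the known transmitted signal).
    mic:     samples captured by the microphone (near-end speech plus echo).
    Returns the echo-reduced microphone signal.
    """
    weights = [0.0] * taps  # adaptive estimate of the echo path
    history = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_est = sum(w * h for w, h in zip(weights, history))
        err = d - echo_est  # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step toward the echo path, normalized by input energy.
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        out.append(err)
    return out
```

The number of taps sets the time window within which a reappearing signal is treated as echo. A practical canceller would also need double-talk detection so that near-end speech does not disturb the adaptation, which this sketch omits.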
Maximizing the use of the speaker
While complex DSP algorithms help enable full-duplex audio communications, they often do not maximize the full capabilities of the system’s audio speaker. Since excessive heat in the speaker’s voice coil and exceeding its excursion limits can cause rapid damage and a blown cone, audio engineers usually impose hard limits on amplification levels that are far below the speaker’s actual capabilities. Software algorithms used in tandem with amplifiers can monitor the speaker’s temperature and excursion in real time. This feedback enables a more fine-tuned sound pressure level and greater audio clarity.
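One way to picture this feedback loop is a gain limiter that leaves the requested gain untouched while the speaker has headroom and backs off only as a measured limit approaches. The Python sketch below is a simplified assumption: the limits, the knee point, and the linear back-off are illustrative, and real smart-amplifier algorithms are considerably more elaborate.

```python
def limit_gain(requested_gain: float, coil_temp_c: float, excursion_mm: float,
               max_temp_c: float = 100.0, max_excursion_mm: float = 0.5,
               knee: float = 0.8) -> float:
    """Scale the gain down as temperature or excursion nears its limit.

    All limits are hypothetical; real values depend on the speaker.
    """
    def headroom(value: float, limit: float) -> float:
        ratio = value / limit
        if ratio <= knee:
            return 1.0  # well inside the safe region: no reduction
        if ratio >= 1.0:
            return 0.0  # at or past the limit: mute
        return (1.0 - ratio) / (1.0 - knee)  # linear back-off above the knee

    scale = min(headroom(coil_temp_c, max_temp_c),
                headroom(abs(excursion_mm), max_excursion_mm))
    return requested_gain * scale
```

Compared with a fixed conservative cap, a scheme like this lets the amplifier use the speaker's full range whenever the measurements show it is safe to do so.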
Voice command and speech recognition
Future generations of video doorbells will probably feature hands-free control based on voice activation and speech recognition technologies. Once again, these voice user interfaces add another layer of complexity, as they receive their commands through an array of microphones and DSP algorithms. These doorbells will most likely use beamforming algorithms to separate the desired audio signal from the background noise, despite the relatively large distance between the person speaking and the receiving microphones. Microphone boards are already available that implement beamforming algorithms, which amplify the speech signal arriving from the direction of the person speaking to obtain clear speech and audio in noisy environments.
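The simplest form of beamforming, delay-and-sum, illustrates the principle: each microphone channel is time-aligned for sound arriving from the chosen direction, then the channels are averaged so that the aligned speech adds up while off-axis sound partly cancels. The Python sketch below assumes the per-channel delays are already known as whole numbers of samples; the function name and that simplification are assumptions for this example.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each microphone channel by its arrival delay, then average.

    channels: equal-length sample lists, one per microphone.
    delays:   how many samples later the target sound reaches each microphone
              (assumed known; a real system would derive or estimate them).
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for i in range(n - delay):
            out[i] += samples[i + delay]  # advance the later-arriving channel
    return [v / len(channels) for v in out]
```

When the delays match the talker's direction, the output reproduces the speech at full amplitude. With the wrong delays, the same signal can largely cancel.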
In a truly practical video doorbell product, it is important that these advanced functions do not require an additional power supply and can function on the native microphone input signals. What we are looking for is a design strategy that results in a simpler, low-power, small-form-factor product.
Power budgeting challenges
Practical video doorbells can be powered in one of the following ways: use a rechargeable battery, allow it to draw power from the home’s existing low-voltage doorbell wiring, or equip it with a power-over-Ethernet (PoE) interface. There are pros and cons to each of these power options (Table 1). As stated earlier, the flexibility of placement afforded by a battery-operated unit enables a simpler installation, while hardwired doorbells have the benefit of minimal maintenance.
|Power option|Pros|Cons|
|---|---|---|
|Rechargeable battery|Flexible doorbell placement, ease of installation|Relies on over-the-air (OTA) connectivity|
|Connected to previous doorbell wiring|No need to change batteries|Relies on OTA connectivity, may need to upgrade transformer|
|Connected through PoE|Hardwired connection, PoE|Must make a hole in the wall for the housing and route Ethernet through the home to the exterior|
Power savings are a major concern for battery-powered video doorbells, and many of the aforementioned algorithms require power-intensive processing. Highly specific SoC designs, such as the Texas Instruments CC3120/CC3220, enable a higher level of parallel processing (wake-up/sleep triggers, network connection) with fewer off-chip transactions (on-chip RAM and/or flash), which results in lower overall power consumption. In addition, MCUs designed for battery operation have multiple power modes, including shutdown, hibernate, sleep, standby, and active, that a careful developer can use to further reduce energy consumption.
A major consideration for any product designed to use a home’s existing doorbell wiring is that there is no standard output voltage for these AC supplies, which were originally created to power the chime and can deliver anywhere from 8 VAC to 24 VAC. In order to minimize potential performance degradation in products powered this way, it’s important to pay careful attention to parameters such as output voltage accuracy, voltage ripple, system efficiency at full load, and thermal dissipation. This is especially true for the complementary metal-oxide semiconductor (CMOS) imaging sensors often used in video doorbells, which are particularly sensitive to noise sources such as fluctuations in the power supply, electromagnetic interference, and temperature variations.
To deliver its best performance, a video doorbell needs a power supply that fits within the product’s compact enclosure, accepts a wide range of low-voltage AC, and produces clean, well-regulated DC for its various subsystems (sensors, I/Os, audio, memory, UI, etc.). As Figure 3 shows, this typically involves several buck converters, preferably ones that employ a synchronous architecture that delivers high efficiency at heavy loads. In designs such as this, which require a wide range of voltages or a large number of discrete supplies, a single buck regulator can be used to feed several linear regulators (ideally low-dropout types).
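The trade-off in feeding linear regulators from a single buck converter can be estimated with simple arithmetic. The Python sketch below relies on two standard approximations that are assumptions of this example: a linear regulator's input current roughly equals its output current (quiescent current ignored), so its efficiency is about V_out / V_in, and the buck converter is modeled with a single fixed efficiency figure. The rail values used in the usage note are hypothetical.

```python
from typing import List, Tuple


def ldo_efficiency(v_in: float, v_out: float) -> float:
    """Approximate linear-regulator efficiency, ignoring quiescent current."""
    return v_out / v_in


def input_power_w(rails: List[Tuple[float, float]], v_mid: float,
                  buck_eff: float) -> float:
    """Estimate the power drawn by a buck stage that feeds several LDO rails.

    rails:    (output voltage in volts, load current in amps) for each LDO.
    v_mid:    intermediate voltage produced by the buck converter.
    buck_eff: assumed buck efficiency between 0 and 1.
    """
    total_current = sum(current for _, current in rails)  # LDO I_in ~= I_out
    return v_mid * total_current / buck_eff
```

With a 3.6 V intermediate rail, a 90 percent efficient buck stage, and loads of 100 mA at 3.3 V and 200 mA at 1.8 V, the estimate comes to about 1.2 W drawn for roughly 0.69 W delivered. Most of the loss occurs in the 1.8 V regulator, which is why keeping the intermediate voltage close to the rails it feeds matters.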
High system efficiency at both full and light loads matters not only for battery-powered applications but also for line-powered products operating in tightly packed enclosures that have little or no ventilation. For a video doorbell, features such as the user interface, wireless communications monitoring, and motion detection must be carefully implemented to maximize power efficiency. Similar attention must be paid to standby currents, such as the power supply’s quiescent and shutdown currents, since they have a significant impact on battery life. A low quiescent current can drastically lengthen the lifetime of the battery, as a video doorbell spends most of its time in sleep/hibernate modes. Moreover, a synchronous converter that can make a seamless transition from its pulse-width modulation mode to its power-save modes remains relatively efficient at both full and light loads.
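A back-of-the-envelope duty-cycle calculation shows why standby current matters so much. The Python sketch below uses a simple two-state model (active or asleep) and ignores battery self-discharge and converter losses. The function names and the figures in the usage note are illustrative assumptions, not measurements.

```python
SECONDS_PER_DAY = 86_400


def average_current_ma(active_ma: float, sleep_ua: float,
                       active_s_per_day: float) -> float:
    """Time-weighted average current for a device that is mostly asleep."""
    active_fraction = active_s_per_day / SECONDS_PER_DAY
    sleep_ma = sleep_ua / 1000.0
    return active_ma * active_fraction + sleep_ma * (1.0 - active_fraction)


def battery_life_days(capacity_mah: float, avg_ma: float) -> float:
    """Idealized battery life, ignoring self-discharge and conversion losses."""
    return capacity_mah / avg_ma / 24.0
```

For instance, a doorbell that draws 200 mA for 864 seconds a day (1 percent of the time) and 100 µA otherwise averages about 2.1 mA. Each extra minute of active time in this example costs as much charge as roughly 33 hours of sleep, which is why reducing false wake-ups helps alongside a low quiescent current. Standby current dominates the budget for units that wake only rarely.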
Video doorbells are one of several types of IoT products that have severe size constraints (and sometimes power constraints), and must balance the increasing sophistication of processor-hungry algorithms against limited power resources. These constraints result in some unique design challenges that technical advances now make it possible to overcome. Naturally, these challenges will continue to grow in complexity as artificial intelligence, in the form of speech, sound, and facial recognition, becomes a must-have feature for residential security systems.
Srinivasan Iyer is a system engineer for Texas Instruments’ building automation group, focusing on video surveillance, HVAC, elevators, and escalator trends.