By training machine learning algorithms, researchers from Swiss research centre CSEM have devised a low-power, real-time face detection and recognition camera system measuring only a few cubic centimetres.

The Vision-In-Package (VIP) system combines a low-power processor (ARM Cortex M4/M7 with 8MB RAM), a high-dynamic range imager, optics and a communication interface. It occupies only around 4cm³, weighs less than 20g including a battery cell, and runs a complete facial analysis pipeline in real time, fully embedded within the device.

The software is compact and stand-alone with no external dependencies. It comprises a minimal version of the uKOS operating system and a face analysis package running on it. Compared with existing systems that run on powerful hardware architectures, the VIP system requires several orders of magnitude less CPU time and memory, and the analysis pipeline runs at around 4-5 frames per second at QVGA resolution.

First, all the faces in an acquired frame are detected, which typically takes less than 100ms and requires only a few hundred KB of RAM. Then facial landmarks, such as the corners of the eyes and the nose, are located within each detected face region and the face undergoes a normalisation step. This consists of a rough geometric transformation that aligns the eyes horizontally and scales the face to a standard size, together with a photometric normalisation that removes non-linear intensity variations caused by shadows and non-uniform illumination.
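The geometric part of that normalisation can be illustrated with a similarity transform (rotation, uniform scale and translation) computed from the two eye positions. The following is only a minimal sketch under stated assumptions, not CSEM's implementation: the function name `eye_alignment_transform`, the target eye distance and the target left-eye position are hypothetical values chosen for illustration.

```python
import math


def eye_alignment_transform(left_eye, right_eye,
                            target_dist=40.0, target_left=(30.0, 40.0)):
    """Build a point mapping that levels the eyes and fixes their spacing.

    left_eye, right_eye: (x, y) pixel positions in the source frame.
    target_dist, target_left: hypothetical canonical eye spacing and
    left-eye position in the normalised face image (assumptions).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.atan2(dy, dx)              # tilt of the eye line
    scale = target_dist / math.hypot(dx, dy)

    # Rotate by -angle to level the eye line, then scale uniformly.
    cos_a = math.cos(-angle) * scale
    sin_a = math.sin(-angle) * scale

    def transform(point):
        x = point[0] - left_eye[0]
        y = point[1] - left_eye[1]
        return (cos_a * x - sin_a * y + target_left[0],
                sin_a * x + cos_a * y + target_left[1])

    return transform


# Usage: a tilted face whose eyes sit at (100, 120) and (160, 150).
to_canonical = eye_alignment_transform((100, 120), (160, 150))
aligned_left = to_canonical((100, 120))
aligned_right = to_canonical((160, 150))
```

After the mapping both eyes share the same vertical coordinate and sit the chosen distance apart, which is what lets later stages compare faces at fixed landmark positions.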

The actual face recognition then takes place, extracting descriptive features at landmark locations to uniquely identify people in a database of registered faces. New individuals can be registered in this database instantly at any time with just a single click and without requiring any re-training.
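One common way to allow registration without re-training is to store a feature vector per person and identify new faces by nearest-neighbour search. The sketch below assumes that approach purely for illustration; the article does not say how the VIP database is organised, and the `FaceGallery` class, the Euclidean distance and the threshold value are all hypothetical.

```python
import math


class FaceGallery:
    """Hypothetical gallery of registered faces (illustrative only)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # maximum distance accepted as a match
        self.entries = []           # list of (name, feature_vector)

    def register(self, name, features):
        # Enrolment only stores the descriptor; nothing is re-trained.
        self.entries.append((name, list(features)))

    def identify(self, features):
        # Return the closest registered name, or None if nobody is close.
        best_name, best_dist = None, math.inf
        for name, ref in self.entries:
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(features, ref)))
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= self.threshold else None


# Usage with toy three-dimensional descriptors.
gallery = FaceGallery(threshold=0.5)
gallery.register("alice", [0.0, 0.0, 1.0])
gallery.register("bob", [1.0, 0.0, 0.0])
match = gallery.identify([0.1, 0.0, 0.9])
stranger = gallery.identify([5.0, 5.0, 5.0])
```

Because enrolment is a single append, adding a person costs only the memory for one descriptor, which suits a device with a few megabytes of RAM.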

To achieve this, the researchers used efficient machine learning algorithms, including AdaBoost, ensembles of regression trees and local binary patterns (LBP), which they trained on millions of examples with ground truth annotations. The resulting classifiers typically take a few hundred kilobytes of space and are fast to run even on low-end mobile processors, according to the team.
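To give a sense of why LBP features are cheap enough for a microcontroller, the sketch below computes the basic 8-neighbour LBP code using only integer comparisons and bit operations. The article does not specify which LBP variant the team used, so the 3x3 neighbourhood, the bit ordering and the function names here are assumptions.

```python
def lbp_code(image, row, col):
    """Basic 3x3 LBP code for the pixel at (row, col).

    image is a list of rows of grey levels. Each of the eight neighbours
    contributes one bit: 1 if it is at least as bright as the centre.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Usage: bright corners and dark edges around a mid-grey centre.
patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

Since each code depends only on whether neighbours are brighter or darker than the centre, the descriptor is unaffected by uniform brightness changes, and a histogram of codes over a small region gives a compact texture signature.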

The standalone unit could find use in wearables, in marketing and advertising analytics for collecting viewership and demographics data, and in robotics for more personalised interactions. Other potential users include TV manufacturers, the automotive industry, for monitoring driver drowsiness and distraction or for automated settings adjustments, and operators of the now pervasive security cameras.

First published by EENews Europe.