The building blocks of artificial vision

Article By : Martin Danzer, congatec

Preconfigured embedded vision building blocks offer a faster path to custom AI-accelerated systems needed in a growing list of applications.

The eye is an evolutionary masterpiece. From the light-sensitive retina to the information-carrying optic nerve to the analyzing nervous system, natural vision is a highly complex data processing activity that relies on low-power neural networks. By intelligently abstracting what is seen, humans and animals can conclude in a fraction of a second what relevance the visible light captured by the eye has for their lives. This masterpiece of natural intelligence took millions of years to evolve. Developers of artificial intelligence (AI) systems need to achieve this feat faster.

Developers of AI-accelerated applications are therefore turning to compact, preconfigured embedded vision kits that combine proven AI hardware and software in an energy-efficient way. Interest in dedicated edge computing solutions is currently particularly high. For AI-accelerated systems, the edge is the critical point for making informed decisions in real time from image information. The detour via cloud-based analysis takes longer and depends on continuous network availability. At the edge, by contrast, the system is always at the scene of the action, which makes it possible to autonomously acquire and evaluate image data in fractions of a second.

Edge solutions must be robust and reliable

Robust and reliable hardware is an absolute must for such visual edge computing systems, because the data cannot be processed in a protected, air-conditioned environment, as is possible with cloud computing. Whether deployed outdoors or in the field, traveling onboard a vehicle or sitting on the manufacturing floor, visual edge computing systems must be resilient.

Demand for AI vision comes from visual applications such as ripe fruit detection in agriculture, automated product inspection in manufacturing, access control in building automation, or product recognition in retail shopping carts at the point of sale. Edge-based real-time analysis is superior to human inspection because it works 24 hours a day, seven days a week. The advantages are particularly significant for industrial operations in inhospitable environments, such as the monitoring of wind turbines or safety-related video surveillance of production processes. According to a McKinsey study, AI systems can also increase plant utilization and productivity by up to 20% through predictive maintenance. Visual quality monitoring with automatic defect detection can even yield productivity increases of up to 50%. And given the high safety requirements of autonomous driving, edge-based AI solutions are essential to ensure the reliable and safe transport of goods and people.

Common to all these applications is that they have to find patterns in the provided images, videos or machine data in order to make decisions. What is more, these applications must be able to identify patterns or objects even when they do not match the model 100 percent. A traffic sign, for example, must be identified correctly whatever angle it is recorded from, even if it is half covered by snow or dirt. This requires digging through massive amounts of data.

Standard CPUs are not made for such tasks, as they are optimized for sequential, one-at-a-time computing with the highest mathematical precision. A different computational approach is therefore required. AI needs artificial neural networks that mimic the intuitive way the brain works. This makes identification and decision making much faster than calculating every point exactly in order to reach a decision.
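One concrete way this tolerance for inexact arithmetic shows up in practice is quantization: many edge inference engines run networks with 8-bit integers instead of 32-bit floating point. The article does not say which number format is meant, so the following Python sketch is only an illustration. The weights, inputs and helper names (scale_for, quantize) are made up for the example.

```python
def scale_for(values):
    """Symmetric scale so the largest magnitude maps to 127 (assumed scheme)."""
    return max(abs(v) for v in values) / 127.0

def quantize(values, scale):
    """Round each float to the nearest int8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

# Hypothetical weights and inputs of a single artificial neuron.
weights = [0.12, -0.53, 0.88, 0.05]
inputs = [0.9, 0.1, 0.4, 0.7]

# Reference result in floating point.
exact = sum(w * x for w, x in zip(weights, inputs))

# The same dot product using only small integers for the multiply-accumulates.
w_scale, x_scale = scale_for(weights), scale_for(inputs)
q_weights = quantize(weights, w_scale)
q_inputs = quantize(inputs, x_scale)
acc = sum(qw * qx for qw, qx in zip(q_weights, q_inputs))

# One floating-point rescale at the end recovers an approximate result.
approx = acc * w_scale * x_scale

print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

The integer result is close to the floating-point one but not identical. A network that is robust to such small deviations can trade this exactness for cheaper arithmetic.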

NPU – the heart of embedded vision systems

A neuromorphic processor or neural processing unit (NPU) is indispensable for providing the computing performance that deep learning and machine learning need at the edge. NPUs excel at analyzing images and patterns, making them the central computational unit of AI-accelerated embedded vision systems. Inspired by the architecture of the brain’s neural network, neuromorphic processors are event-driven and therefore draw power only intermittently. This means that NPUs consume just a few watts, even for the most demanding computational and graphics tasks.

NPUs are highly specialized computing cores that are optimized to execute machine learning algorithms. They are designed not only to process highly parallel workloads but also to compute repetitive tasks extremely fast. This is important for a convolutional neural network, where data points must undergo several thousand convolutions. As an example: In a full HD image, around 2 million pixels have to be processed. This requires crunching many operations per second (OPS), with NPUs needing to achieve performances of several tera operations per second (TOPS) to meet edge computing requirements. But how is this possible when even extremely powerful CPUs cannot deliver this performance? Here, another point of differentiation comes into play: A single AI instruction is not as complex as the instructions of standard application processors such as x86 or Arm CPUs. Therefore, not as many resources are consumed per calculation step as with a 32- or 64-bit system. Ultimately, though, engineers need both types of processing for their application.
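To get a rough feel for these numbers, the following Python sketch counts the operations for a single convolutional layer applied to one full HD frame. The layer shape (3×3 kernel, 3 input channels, 32 output channels), the rate of 30 frames per second and the function name conv_ops are illustrative assumptions, not figures from a specific network or NPU. Each multiply-accumulate is counted as two operations, a common convention.

```python
def conv_ops(width, height, kernel, in_channels, out_channels):
    """Operations for one stride-1, same-padded 2D convolution layer.

    Each output value needs kernel * kernel * in_channels
    multiply-accumulates (MACs); one MAC is counted as two
    operations (one multiply plus one add).
    """
    macs_per_output = kernel * kernel * in_channels
    outputs = width * height * out_channels
    return 2 * macs_per_output * outputs

# Full HD frame: 1920 x 1080 pixels, i.e. roughly 2 million.
pixels = 1920 * 1080

# Assumed first layer of a hypothetical network.
ops_per_frame = conv_ops(1920, 1080, kernel=3, in_channels=3, out_channels=32)

# Assumed camera frame rate.
FPS = 30
tops = ops_per_frame * FPS / 1e12

print(f"pixels per frame: {pixels:,}")
print(f"ops per frame (one layer): {ops_per_frame:,}")
print(f"sustained load at {FPS} fps: {tops:.3f} TOPS")
```

Under these assumptions, a single small layer already needs roughly 0.1 TOPS of sustained throughput. A real network stacks many layers, which helps explain why the total can reach the TOPS range.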

Customized starter set for the edge

This is why processors such as NXP’s i.MX 8M Plus integrate such an NPU alongside four standard Arm Cortex-A53 cores and an Arm Cortex-M7 controller to build an application processor that is fit for machine learning and can efficiently execute AI algorithms. But in vision applications, all this only makes sense if images are delivered in real time and in the necessary quality. So, engineers also need an image signal processor (ISP) that pre-processes images and videos during acquisition. The better the quality this pre-processing delivers, the more accurate the post-processing in the NPU. A high-quality ISP is therefore not just interesting for high-performance industrial image processing; it is a boon wherever image processing algorithms can be used to produce better visual results.

The AI-accelerated eye

Another important point is how vision data are received. One channel for vision data communication is MIPI CSI-2. If this interface is also pre-integrated in the processor, no additional converter modules are required. This not only simplifies system design but also minimizes the physical footprint. Small size, low power consumption and minimal heat dissipation are essential requirements for AI-based vision at the edge in applications such as battery-powered autonomous vehicles in logistics and farming.

At the same time, applications should also support other communication standards for connecting cameras, such as USB 3 or GigE Vision, which are common in industrial applications for robots and quality inspection. GigE Vision, in particular, allows for longer distances between the camera and the NPU. This is vital for video surveillance applications in buses and trains, for example.

Designed for human-machine interaction

But the scope of applications for NPU-based embedded vision systems goes far beyond people or object recognition. For example, hand gesture and emotion recognition combined with natural language processing take interactive communication applications between humans and machines to a new level. Ultra-short response times and precise localization help to optimize robotic product assembly or warehouse logistics in industrial manufacturing. And equipped with high safety standards, applications can even be found in sensitive areas such as customer service or healthcare.

Building block flexibility

Given the variety of possible embedded vision applications, it is self-evident that the technology platform for this sector must allow the development of custom applications. A one-size-fits-all solution will not work. Developers therefore need a set of preconfigured building blocks that they can easily adapt to their individual requirements. These should include not only hardware components but also software support.

On the hardware side, a modular approach with Computer-on-Modules is a widely used and highly efficient design principle. Computer-on-Modules integrate all the required components in an application-ready building block: an SoC with integrated NPU, ISP and MIPI CSI-2 support, such as NXP’s i.MX 8M Plus, plus RAM and general controllers for additional interfaces like USB, Ethernet and MMC, all on one standardized module with scalable performance. This entire building block is plugged onto an application-specific carrier board that implements only the required external interfaces and can easily be designed to fit the required physical footprint and ruggedness.

Customized vision

The design-in of such modules is easiest if they come not only with a standard BSP but also with comprehensive software support at the middleware and application level. For the greatest flexibility, developers also need preconfigured tools, such as a vision camera SDK that can be used for MIPI CSI-2 as well as other industrial camera standards such as USB 3 or GigE Vision. Software solutions should integrate inference engines and libraries such as Arm NN and the open-source TensorFlow Lite to deliver the required inference results most efficiently. Software development platforms like NXP’s eIQ Machine Learning environment provide developers with such specialized libraries and development tools adapted to NXP microprocessors and microcontrollers.

With its AI starter kit (see Figure 1), congatec has made available a module with integrated software support that accelerates designs based on the NXP processor.

Figure 1: The AI-accelerated embedded vision starter kit from congatec integrates the key components of an AI eye: a Basler dart camera as the eye, a SMARC 2.1 carrier board with 2x MIPI CSI as the optic nerve, and a SMARC 2.1 module as the brain. (Source: congatec)

At the heart of this kit is a credit card-sized SMARC 2.1 Computer-on-Module (COM) (Figure 2). Based on an i.MX 8M Plus processor, this module lets developers bring AI vision to their edge applications quickly and securely.

Figure 2: The brain behind AI vision: The conga-SMX8-Plus SMARC 2.1 module brings neuromorphic intelligence to the edge at just 6 watts TDP. (Source: congatec)

Suitable for industrial use in a temperature range from -40 to +85°C, the module also has a low operating power consumption of just 2 to 6 watts and comes with passive cooling. This makes it a perfect fit for various outdoor and mobile vehicle applications.


The integration of embedded vision is crucial to the success of AI-accelerated systems, whether they are deployed in self-driving vehicles, video surveillance cameras or collaborative robotics. Developers can create custom solutions more easily and quickly by using preconfigured embedded vision building blocks such as congatec’s AI starter kit. Developed by congatec in cooperation with Basler, this building block platform is built around an NXP i.MX 8M Plus processor with its integrated neural processing unit (NPU).

This article was originally published on Embedded.

Martin Danzer is Director Product Management at congatec. He studied electrical engineering at the Technical University of Deggendorf and has more than 20 years of experience in technical service, development management and product management for Computer-on-Modules, including his time at Kontron and JUMPtec AG.

