Integrating a niche memory technology like ReRAM can give an AI chip a multidimensional boost.
There are two basic phases of machine learning: training and inference. An artificial neural network, designed to mimic how the brain works, is first exposed to a large amount of known data – pictures of dogs and cats, for example – so it can learn to recognize what each looks like and how they differ. This trained neural network, or trained model, is then put to work, using what it has learned to infer things about new data it is presented with, in this case determining whether an image is of a dog or a cat.
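The split between the two phases can be illustrated with a deliberately tiny sketch. It is not a neural network: a nearest-centroid classifier stands in for one, and the two-number feature vectors and the function names train and infer are made up for illustration. The point is only that training consumes labeled data to build a model, while inference applies that fixed model to new data.

```python
from math import dist

def train(examples):
    """Training phase: average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The "trained model" here is just one centroid per label.
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def infer(model, features):
    """Inference phase: return the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))

# Hypothetical labeled data: (feature vector, label). The numbers are invented.
labeled = [
    ((4.0, 9.0), "cat"),
    ((5.0, 10.0), "cat"),
    ((30.0, 20.0), "dog"),
    ((34.0, 24.0), "dog"),
]

model = train(labeled)             # done once, typically in a data center
print(infer(model, (6.0, 11.0)))   # done per input, ideally at the edge; prints: cat
```

Note that infer reads the whole model on every call. In a real network the model is far larger, which is why where it is stored matters so much.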
Today, most training occurs in data centers, with some on the edge. Large companies like Google, Facebook, Amazon, Apple, and Microsoft have massive amounts of consumer data they can feed their server farms to perform industrial-scale training for AI and improve their algorithms. The training phase requires very fast processors, such as GPUs or Google Tensor Processing Units.
Inference occurs when data is collected by an edge device – a photo of a building or a face, for example – then sent to an inference engine for classification. If that engine sits in the cloud, the round trip adds a delay that many applications cannot tolerate. A self-driving car, for example, must make real-time decisions about the objects it sees, which is not feasible with a cloud-based AI architecture.
As AI capabilities move to the edge, they will drive more AI applications, and these applications will increasingly require more powerful analysis and intelligence so that systems can make operational decisions locally, whether partly or fully autonomously, as in self-driving cars.
Traditional CPUs are not very good at these tasks, and high-end GPUs consume a lot of energy and are expensive. Inference at the edge demands more affordable, lower-power chips that can quickly traverse the neural network to recognize an animal, identify a face, pinpoint a tumor, or translate German to English.
Today, more than 30 companies are developing dedicated AI hardware to achieve the greater efficiencies required for these specialized computing tasks in smartphones, tablets, and other edge devices.
Analysts have predicted the global AI chip market will grow at a compound annual growth rate of about 54% between 2017 and 2021. The need for powerful hardware that can handle the demands of machine learning is a key driver of this growth.
Removing the memory bottleneck
All AI processors rely upon data sets, which represent models of the “learned” object classes (images, voices, etc.), to perform their recognition feats. Each object recognition and classification requires multiple memory accesses. The biggest challenge facing engineers today is overcoming memory speed and power bottlenecks in current architectures to get faster data access, while lowering the energy cost of that access.
The greatest speed and energy efficiency can be gained by placing the trained model as close as possible to the AI processor core. But the storage architecture employed by today’s designs, created several years ago when there were no other practical solutions, is still the traditional combination of fast but small embedded SRAM with slower but larger external DRAM. When trained models are stored this way, the frequent and massive movements of data between embedded SRAM, external DRAM, and the neural network increase energy consumption and add latency. Further, SRAM and DRAM are volatile memories that lose their contents when powered down, which limits the power savings achievable during sleep periods.
Much greater energy efficiencies and speeds can be achieved by storing the entire trained model directly on the AI processor die with low-power, non-volatile memory that is dense and fast. By enabling a new memory-centric architecture, the entire trained model or knowledge base could then be on-chip, connected directly to the neural network, with the potential for massive energy savings and performance improvements, resulting in greatly improved battery life and a better user experience. Today, several next-generation memory technologies are competing to accomplish this.
The ideal non-volatile embedded memory for AI applications would be very simple to manufacture, easy to integrate in the back-end-of-line of well-understood CMOS processes, easily scaled to advanced nodes, available in high volume, and able to deliver on the energy and speed requirements for these applications.
Resistive RAM (ReRAM) has a much greater ability to scale than magnetic RAM (MRAM) or phase-change memory (PCM) alternatives, an important consideration when looking at 14, 12, and even 7 nm process nodes. These other technologies require more complex and expensive manufacturing processes and more power to operate than ReRAM.
The nanofilament technology of Crossbar’s ReRAM, for instance, enables scaling below 10 nm without impacting performance. ReRAM is based on a simple device structure using CMOS-friendly materials and a standard manufacturing process that can be easily integrated with existing CMOS flows and manufactured in existing CMOS fabs. Because it is a low-temperature, back-end-of-line process, multiple layers of ReRAM arrays can be integrated on top of CMOS logic wafers to build 3D ReRAM storage.
AI needs the best performance per watt, especially in power-limited edge devices. ReRAM has demonstrated energy efficiency five times greater than that of DRAM, at as many as 1,000 bit reads per nanojoule. It also exhibits better overall read performance than DRAM, at up to 12.8 GB/s with less than 20 ns of random latency.
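To put those figures in more familiar units, the short calculation below converts 1,000 bit reads per nanojoule into energy per bit and estimates the power drawn by reads alone at 12.8 GB/s. It is a rough sketch under stated assumptions: it treats 1 GB as 10^9 bytes, assumes the best-case efficiency holds at peak bandwidth (the figures above do not say so), and uses a hypothetical 10 MB model size purely for illustration.

```python
# Figures quoted in the article for ReRAM reads.
BIT_READS_PER_NANOJOULE = 1_000   # "as many as 1,000 bit reads per nanojoule"
PEAK_READ_GB_PER_S = 12.8         # "up to 12.8 GB/s"

def energy_per_bit_pj(bit_reads_per_nj):
    """Convert bit reads per nanojoule into picojoules per bit read."""
    return 1_000.0 / bit_reads_per_nj          # 1 nJ = 1,000 pJ

def read_power_watts(gb_per_s, pj_per_bit):
    """Power spent on reads alone while streaming at the given bandwidth.

    Assumption: 1 GB = 1e9 bytes, and the per-bit energy holds at this rate.
    """
    bits_per_second = gb_per_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

def model_read_energy_mj(model_megabytes, pj_per_bit):
    """Energy in millijoules to read a whole model once (1 MB = 1e6 bytes)."""
    bits = model_megabytes * 1e6 * 8
    return bits * pj_per_bit * 1e-12 * 1e3

pj = energy_per_bit_pj(BIT_READS_PER_NANOJOULE)
print(f"{pj:.1f} pJ per bit read")
print(f"{read_power_watts(PEAK_READ_GB_PER_S, pj):.4f} W at peak read bandwidth")
# Hypothetical model size, not taken from the article.
print(f"{model_read_energy_mj(10, pj):.2f} mJ to read a 10 MB model once")
```

Under these assumptions a read costs about 1 pJ per bit, and streaming at the full 12.8 GB/s would draw roughly 0.1 W for the reads themselves.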
Scientists are already exploring a variety of novel brain-inspired paradigms to achieve much greater energy efficiencies by imitating the way neurons and synapses of the central nervous system interact. Artificial synapses based on ReRAM technology are a very promising method for enabling these high-density and ultimately scaled synaptic arrays in neuromorphic architectures. ReRAM has the potential to play a significant role in both current and radically new approaches to AI by enabling AI at the edge.
—Sylvain Dubois is Vice President of Strategic Marketing & Business Development at Crossbar. He was System-on-Chip architect of OMAP application processors at Texas Instruments, and holds a Master of Science in Microelectronics.