Cadence Tensilica DSP supports floating point for optimum PPA

Article by: Maurizio Di Paolo Emilio

A look at the first DSP family from Cadence Tensilica designed specifically to support floating point arithmetic.

Digital signal processors (DSPs) play an important role in many real-world applications, including audio and video processing, radar, telecommunications, motor drive control, virtual reality (VR), augmented reality (AR) and, more recently, artificial intelligence (AI) algorithms. Their main purpose is to take the digitized forms of physical analog signals and manipulate them mathematically according to specific algorithms.

From the beginning, many DSPs have been designed to support only fixed-point arithmetic, a reasonable choice since it delivers the accuracy required by most scientific applications. However, floating point offers a more natural and accurate way of representing real-world data, which are mainly analog signals. In addition, floating-point arithmetic as defined by the IEEE 754 standard is widely used by compilers and by development and modeling tools, which eases the integration and porting of DSP code.
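
To make the trade-off concrete, here is a minimal C sketch comparing a fixed-point multiply with its floating-point counterpart. It is not taken from any Tensilica library: the choice of the Q15 format and the helper names (q15_t, float_to_q15, q15_to_float, q15_mul) are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Q15: 16-bit signed fixed point with 15 fractional bits, range [-1, 1).
   Format and names are illustrative assumptions, not a vendor API. */
typedef int16_t q15_t;

/* Convert a float to Q15 with rounding and saturation (NaN is not handled). */
static inline q15_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f)  return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (q15_t)(scaled < 0.0f ? scaled - 0.5f : scaled + 0.5f);
}

/* Convert Q15 back to float. */
static inline float q15_to_float(q15_t x)
{
    return (float)x / 32768.0f;
}

/* Fixed-point multiply: the programmer must widen, round, rescale and
   saturate by hand. The right shift of a negative value is
   implementation-defined in C; an arithmetic shift is assumed here. */
static inline q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 intermediate */
    p = (p + (1 << 14)) >> 15;             /* round back to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* only -1 * -1 overflows */
    return (q15_t)p;
}

/* Floating-point multiply: scaling is tracked by the exponent. */
static inline float float_mul(float a, float b)
{
    return a * b;
}
```

With these helpers, 0.5 times 0.5 yields 8192 in Q15, which is exactly 0.25. An input such as 0.00001, however, rounds to zero when converted to Q15, whereas a 32-bit float still represents it to about seven significant digits without any manual scaling.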

“Tensilica has been in the DSP area for a very long time. We have delivered proven products for audio and voice, radar, LiDAR, and computer vision (including AI). This is a completely new family within the Tensilica portfolio, and our first DSPs designed specifically to support floating point arithmetic,” said Ted Chua, director of product management and marketing, Tensilica DSPs at Cadence.

The new Cadence Tensilica FloatingPoint DSP family delivers scalable performance for a broad range of compute-intensive applications while keeping power consumption very low. The low-energy DSP IP is optimized for power, performance and area (PPA), offering up to 40% area savings for mobile, automotive, consumer and hyperscale computing applications, and provides an easy programming environment for seamless software migration.

“There are applications, such as motor control, where floating point can do a much better job than a fixed-point system, because of smaller code size, or because it runs faster or controls the speed and torque more accurately and more efficiently,” said Chua.

Tensilica FloatingPoint DSP family

The new floating point DSP IP core family, optimized for PPA, extends from small and ultra-low power to very high-performance devices, offering energy-efficient solutions for the most challenging applications, including battery-operated devices, artificial intelligence (AI) and machine learning (ML), motor drive control, sensor fusion, augmented reality (AR) and virtual reality (VR).

Based on the Tensilica Xtensa 32-bit RISC micro-architecture, the new family (Figure 1) includes four cores: the Tensilica FloatingPoint KP1 DSP, the Tensilica FloatingPoint KP6 DSP, the Tensilica FloatingPoint KQ7 DSP, and the Tensilica FloatingPoint KQ8 DSP. The new DSPs not only scale from a 128-bit to a 1024-bit vector width but can also be configured to enable only the capabilities a specific application requires, covering everything from energy-efficient solutions for battery-operated devices to high-performance computing (HPC).

Figure 1. The Tensilica FloatingPoint DSP family

The DSP cores in the new family share a common instruction set architecture (ISA) with the optional vector floating-point unit (VFPU) of existing Tensilica DSPs and feature a scalable vector width from 128-bit SIMD to 1024-bit SIMD on both the Tensilica Xtensa LX and NX platforms. Compared with Tensilica fixed-point DSPs with the VFPU add-on, operational throughput in fused multiply-add (FMA) operations increases by 25%. Performance can be further enhanced and differentiated using the Tensilica Instruction Extension (TIE) language, a Cadence proprietary Verilog-like language that lets designers define custom operations, which are then automatically integrated into and recognized by the Xtensa toolchain. In addition, the FloatingPoint DSPs offer up to 40% area savings compared with fixed-point DSPs of a similar class equipped with VFPUs.

Figure 2. Tensilica FloatingPoint KQ7 and KQ8 block diagram.

As shown in Figure 2, the scalable Tensilica FloatingPoint DSP family gives SoC designers the flexibility to meet their PPA budget. For energy-sensitive designs, the FloatingPoint KP1 DSP offers an ultra-low-energy solution suitable for battery-powered applications. The FloatingPoint KP6 DSP strikes a balance between high performance and a reduced footprint, delivering excellent performance per unit area. For high-performance applications, the FloatingPoint KQ7 and KQ8 DSPs offer the family’s highest vector floating-point operational throughput.

In addition, the common ISA simplifies software portability and migration. The FloatingPoint DSPs also support custom interfaces, such as queues and ports, which simplify connection to and integration with external hardware blocks and make it possible to match the interfaces provided by existing third-party IP.

The most challenging applications are evolving fast and moving from the cloud to the edge. Computer vision, IoT sensors, self-driving cars, and smart devices are just a few examples where artificial intelligence (AI) algorithms are moving to the edge, giving embedded systems enhanced and autonomous decision-making capabilities. All these applications need a family of floating-point DSP cores that can address different market needs, reduce time to market, and be optimized for power, performance and silicon area to keep product costs competitive.

“Today, AI inference at the edge is mainly done with fixed point accelerators. Floating point DSP provides an option to execute AI inference or training in the floating-point format, and we all know neural network training is done using floating point representation,” said Chua.

As mentioned previously, configurability is another key feature of the Tensilica FloatingPoint DSP family. Chua commented, “Our Tensilica DSPs are configurable, meaning designers can select only the hardware features they need, without draining unnecessary power.”

Among the most useful options is the scatter-gather capability, which allows the designer to load data from non-contiguous memory locations and pack it into a single vector.

“The floating-point unit inside DSP is a vector machine. For the data that is not stored in consecutive memory locations, the scatter-gather feature allows you to load the dispersed data into one vector format, improving the overall performance,” Chua added.
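
The idea behind scatter-gather can be modeled in a few lines of scalar C. The sketch below is only a behavioral illustration, not a Tensilica intrinsic or API: the function names (gather_f32, scatter_f32) are hypothetical, and LANES is set to 4 on the assumption of 32-bit floats in a 128-bit vector. On hardware with scatter-gather support, each loop would typically correspond to a vector operation rather than a sequence of scalar accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: four 32-bit lanes, as in a 128-bit vector of floats. */
#define LANES 4

/* Gather: lane i of the vector receives base[idx[i]], so elements that
   are scattered in memory end up packed side by side. */
void gather_f32(float dst[LANES], const float *base,
                const uint32_t idx[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        dst[i] = base[idx[i]];
    }
}

/* Scatter: the inverse operation, lane i is written back to base[idx[i]]. */
void scatter_f32(float *base, const uint32_t idx[LANES],
                 const float src[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        base[idx[i]] = src[i];
    }
}
```

Once the dispersed elements sit in one vector, the same vector arithmetic used for contiguous data can be applied to them, and the results can be scattered back to their original locations.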

On the software side, Tensilica FloatingPoint DSPs come with a complete suite of development tools, including a high-performance C/C++ compiler with automatic vectorization and instruction bundling to support the VLIW pipeline in the DSP, as well as a linker, assembler, debugger, profiler, and graphical visualization tools. A useful tool is the instruction set simulator (ISS), which allows designers to quickly simulate and evaluate performance. When working with large systems or lengthy test vectors, the Tensilica TurboXim simulator option claims to achieve speeds that are 40X to 80X faster than the ISS for efficient software development and functional verification.
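
As an illustration of what automatic vectorization works with, the plain C loop below is the kind of code that vectorizing compilers in general can map onto SIMD multiply-add instructions. This is a generic sketch: the function name saxpy is a conventional choice rather than part of any Tensilica library, and whether and how the Xtensa compiler vectorizes this exact loop has not been verified here.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for n elements.
   The restrict qualifiers promise the compiler that x and y do not
   overlap, which removes a common obstacle to vectorization. Each
   iteration is independent, and the multiply followed by an add is a
   natural candidate for a fused multiply-add instruction. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays portable scalar C; the compiler, not the programmer, decides how many elements to process per instruction on a given vector width.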

Tensilica Xtensa SystemC (XTSC) and C-based Xtensa Modeling Protocol (XTMP) system modeling are available for full-chip simulations. Pin-level XTSC offers co-simulation of SystemC and RTL-level offload accelerator blocks for fast, cycle-accurate simulations. The Tensilica FloatingPoint DSPs support all major back-end EDA flows. They are also supported by software libraries, including an optimized Eigen library, the NatureDSP library, a SLAM (simultaneous localization and mapping) library, and math libraries, which makes porting and migrating floating-point software much easier.

“With our family of floating-point DSPs, we deliver a set of software tools which is common with all the other software tools for Tensilica DSPs. For any developer who is already familiar with Tensilica software tools, there is really no learning curve, since it is the exact same tool,” said Chua.

This article was originally published on Embedded.

Maurizio Di Paolo Emilio holds a Ph.D. in Physics and is a telecommunication engineer and journalist. He has worked on various international projects in the field of gravitational wave research. He collaborates with research institutions to design data acquisition and control systems for space applications. He is the author of several books published by Springer, as well as numerous scientific and technical publications on electronics design.
