CPU extensibility using hardware modules like an eFPGA core allows designers to respond quickly to changes in the field.
Processor extensibility with an external hardware module like an FPGA or DSP core isn’t a new concept. However, there are no existing hardware solutions that allow the addition of extensions once a product is in the field. Implementations are fixed after tapeout, and there is no flexibility to enable new or custom instructions in a hardened design. That’s where adaptive embedded FPGA (eFPGA) IPs play a key role, removing such limitations.
An eFPGA allows hardware reconfigurability in the field by integrating a reprogrammable IP core inside the CPU microarchitecture. Integrating secure, hardware-reconfigurable solutions to perform instruction set architecture (ISA) extension will be a key differentiator of future semiconductor products.
For a short primer on instruction set architecture, read this brief tutorial.
This article provides a detailed treatment of the CPU ISA co-extended eFPGA core, which Menta is prototyping with its RISC-V CPU partners, Andes and Codasip. The idea is to give CPU systems the capability to integrate eFPGA IP for performing ISA extension in the field.
eFPGA co-extended core for ISA
The eFPGA IP can be tightly coupled with a CPU subsystem, and there are several ways to combine the two. The most straightforward and widely available architectural solution today is the eFPGA co-extended hardware accelerator core, where the eFPGA matrix acts as a simple accelerator that offloads the CPU by performing specific instructions in parallel. Here, the user can also run new instructions on the eFPGA hardware accelerator.
Figure 1 eFPGA is sitting next to the CPU subsystem and is connected via dedicated co-processor interface. Source: Menta
This solution is being prototyped with the Andes N25F RISC-V processor, and it’s also applicable to Codasip processors.
Andes provides a complete product portfolio with RISC-V CPU IP, extension-capable RISC-V core IP, and an EDA tool. The Andes custom extension (ACE) framework allows developers to create custom instructions that execute faster than a succession of standard assembler instructions. These run on top of the Andes CPU IP with an extended toolchain and simulator, generated by the Andes custom-optimized instruction development tools (COPILOT).
As a result, developers can automatically generate all the required components and extend the existing Andes processor package, including the processor RTL, compilation tools, debugger, and cycle-accurate simulator, and can create or reconfigure custom interfaces to communicate directly with external components before CPU tapeout.
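The gain such a custom-instruction flow targets can be pictured with a language-agnostic software model. The operation below (a saturating multiply-accumulate) and all function names are hypothetical illustrations, not Andes ACE APIs: one fused custom instruction replaces the multi-instruction sequence a compiler would otherwise emit.

```python
# Software model of an ISA extension: a hypothetical fused custom
# instruction versus the equivalent sequence of base instructions.
# Names (mac_sat_*) are illustrative only, not Andes ACE APIs.

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def mac_sat_custom(acc, a, b):
    """One hypothetical custom instruction: saturating multiply-accumulate."""
    r = acc + a * b
    return max(INT32_MIN, min(INT32_MAX, r))

def mac_sat_base(acc, a, b):
    """Same result built from base RISC-V-style ops (mul, add, branches)."""
    p = a * b                      # mul
    r = acc + p                    # add
    if r > INT32_MAX:              # branch + load-immediate
        r = INT32_MAX
    elif r < INT32_MIN:            # branch + load-immediate
        r = INT32_MIN
    return r
```

In a filter or dot-product loop, the single fused operation stands in for roughly five base instructions per iteration, which is where the cycle-count advantage of a custom instruction comes from.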
Codasip provides a wide range of RISC-V CPU IPs, from small, highly efficient implementations up to Linux-capable RISC-V processors. In addition, Codasip provides a complete EDA flow for processor design and customization that allows developers not only to create custom instructions, but also to develop or change microarchitectural features that may add key differentiation. The EDA flow also generates the whole software development kit (SDK), including a C compiler that can use the new instructions automatically or via intrinsics/inline assembly, and the hardware development kit (HDK), which includes RTL, a verification environment, and more.
In the field, the CPU microarchitecture is hardened and there is no way to add or customize instructions. An eFPGA, on the other hand, directly connected to the ACE-auto-generated interface of the N25F processor, can be used to implement new sets of instructions in the field.
Figure 2 An eFPGA is directly connected to the ACE-auto-generated interface of the N25F processor. Source: Menta
Similarly, Codasip’s H50F IP can be used as a RISC-V processor.
Figure 3 An eFPGA can be directly connected to the H50F processor. Source: Menta
Designers will be able to define their own “CPU – eFPGA” system, with the option to independently configure the processor and the eFPGA matrix they want to use.
Figure 4 This eFPGA sits inside the CPU subsystem. Source: Menta
Because designers can program the eFPGA IP in the field, it’s possible to integrate the eFPGA core inside the CPU subsystem itself; this solution is called the eFPGA co-extended CPU subsystem hardware extension. The CPU subsystem architecture must be designed accordingly to allow smooth integration of the eFPGA matrix as part of it.
In fact, it’s not merely a software extension, but also a hardware architectural extension of the CPU subsystem. The following example is another prototype that is being tested with the Andes N45 and the Codasip H50F RISC-V processors.
Figure 5 The prototype has been tested with the Andes N45 RISC-V processor before and after the tapeout. Source: Menta
The eFPGA implements all of the ACE features. Such a subsystem architecture enables ISA extension and reconfigurability in the field, giving the end user more freedom to innovate.
eFPGA inside data path
By using Andes COPILOT, which is highly integrated with the AndeSight IDE, to automatically extend the existing Andes CPU and its compilation tools, including the debugger and simulator, designers can reconfigure the existing custom instructions as post-silicon updates without affecting the physical implementation of the main pipeline, which shortens the overall development cycle.
Another solution that we plan to name “eFPGA co-extended hardware core CPU extension” offers the possibility to integrate an eFPGA inside the core microarchitecture, and more precisely, inside its data path. Menta is closely working with Andes and Codasip to create such a solution.
Figure 6 The eFPGA can be integrated inside the core microarchitecture. Source: Menta
With COPILOT or Codasip’s EDA tool Studio, any designer can take the processor’s ISA or microarchitecture and create or modify it to reach the ideal combination of required features and targeted performance, power, and area. Designers can start with any of the RISC-V processor IPs that Andes or Codasip provide. It is also possible to integrate third-party IPs, such as an eFPGA core, inside the processor core microarchitecture.
Figure 7 This comparison highlights an eFPGA’s programmability feature in the field. Source: Menta
The block diagram of a generic processor pipeline in Figure 8 shows how the eFPGA co-extended core can be integrated.
The new instructions can be implemented in the flexible eFPGA co-extended core, which is quasi-merged into the main CPU pipeline microarchitecture. The scheme in Figure 8 works when the new instructions fit the existing opcodes, but new opcodes can also be created. Based on CPU experts’ experience, some extensions also require custom encoding, which means that more logic must be placed into the eFPGA co-extended core.
Because the eFPGA is design-adaptive, a logic resource margin can be planned when the matrix is designed. That margin makes it possible to map specific encoding logic onto lookup tables (LUTs) in the field. In addition, the eFPGA is a pure digital core, 100% standard cell-based, and can be hardened in any technology node.
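As a sketch of the encoding logic such a core would carry, the helpers below pack and unpack the standard RISC-V R-type fields under the custom-0 major opcode (0b0001011), which the RISC-V specification reserves for custom extensions. The field layout follows the spec; the helper names themselves are illustrative.

```python
# Sketch of the decode/encode logic an eFPGA co-extended core could
# map onto LUTs for instructions in RISC-V's reserved custom-0 major
# opcode. R-type field positions follow the RISC-V ISA specification;
# the meaning selected by funct3/funct7 is up to the designer.

CUSTOM_0 = 0b0001011  # major opcode reserved by RISC-V for custom use

def decode_rtype(insn):
    """Extract R-type fields from a 32-bit instruction word."""
    return {
        "opcode": insn & 0x7F,
        "rd":     (insn >> 7)  & 0x1F,
        "funct3": (insn >> 12) & 0x7,
        "rs1":    (insn >> 15) & 0x1F,
        "rs2":    (insn >> 20) & 0x1F,
        "funct7": (insn >> 25) & 0x7F,
    }

def encode_rtype(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM_0):
    """Build a 32-bit R-type word, e.g. for a new custom instruction."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
            (funct3 << 12) | (rd << 7) | opcode)
```

When a custom encoding departs from this standard layout, the extra field-extraction logic is exactly what consumes the LUT margin described above.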
The final solution must ship with an SDK that contains tools for developing on the eFPGA and on Andes and Codasip processors, such as a validated cross-compiling toolchain. Also required are pre-built libraries, which can be used without having to rebuild them, and documentation explaining how all of these pieces work together.
Custom instructions are invoked via intrinsics. With COPILOT, designers can redefine the custom instructions on the eFPGA with full flexibility, updating the instruction format, even adding custom operands, registers, or memory accesses, and write more efficient C code with the automatically updated toolchain. In the case of Codasip, the C/C++ compiler can use the instructions automatically without changing the original C code; this automation assumes the instructions have a pre-defined meaning, such as arithmetic or memory operations.
Can we quantify gains of such ISA extensions?
If CPU extension is done using an eFPGA, we expect gains in performance, power, and code density.
On the other hand, because we are integrating additional hardware, we expect an increase in silicon area. However, area and power can be recovered because fewer instructions have to be implemented in this flexible, generic solution. In most embedded systems, the silicon area of the instruction memory is significantly larger than that of the processor core, so saving instruction memory through improved code density saves both silicon area and power.
Moreover, if we want to accelerate a function that consumes more than 50% of the CPU load, implementing it on the eFPGA co-extended core can considerably reduce the number of clock cycles needed to perform that function, which also reduces the overall energy consumption.
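This gain can be estimated with Amdahl’s law. The 50% load figure comes from the text; the 10x per-function acceleration used below is an assumed example value, not a measured Menta result.

```python
# Amdahl's-law estimate of the overall speedup when a hot function
# is offloaded to the eFPGA co-extended core. hot_fraction is the
# share of total cycles spent in that function; local_speedup is how
# much faster the eFPGA implementation runs it (assumed, not measured).

def overall_speedup(hot_fraction, local_speedup):
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / local_speedup)

# A function consuming 50% of the CPU load, accelerated 10x:
# 1 / (0.5 + 0.5/10) = 1 / 0.55, roughly a 1.8x overall speedup.
```

The formula also shows why the hot-function share matters: even an infinitely fast eFPGA implementation of a 50% hot spot caps the overall speedup at 2x.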
So, the gains in terms of performance, power, and area can be even larger with the use of an eFPGA core. The range of possibilities is wide, from cryptography algorithms, where data integrity check instructions are a good example for mapping onto the eFPGA, to complex instructions targeting image processing, voice processing, and voice recognition applications.
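A cyclic redundancy check illustrates why integrity-check logic maps so well onto LUTs: each step is a small XOR/shift function of a few bits. The model below uses CRC-8 with polynomial 0x07, chosen here only as a common, concrete example, not a Menta-specified instruction.

```python
# Bit-serial CRC-8 (polynomial x^8 + x^2 + x + 1, i.e. 0x07) as an
# example of integrity-check logic that maps naturally onto eFPGA
# LUTs. In hardware, the whole inner loop collapses into a shallow
# XOR network evaluated in one or a few cycles.

def crc8(data, poly=0x07, init=0x00):
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc
```

Running this bit-serial loop in software costs dozens of instructions per byte; as a custom instruction backed by eFPGA logic, the same check becomes a single-instruction data path operation.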
The basic instructions in a standard CPU are not always designed with a specific class of applications in mind. Special arithmetic functions are often required, as in artificial intelligence or machine learning workloads.
With eFPGA, the basic instructions can be reconfigured to become “more specific” instructions, and the required arithmetic functions can be implemented via embedded custom blocks that are available in Menta’s eFPGA matrix.
This article was originally published on EDN.
Imen Baili is a product application engineer at Menta.