Embedding FPGA fabric into SoC: A marriage made in heaven

Article by: Paul Dillien

I recently read that QuickLogic is joining Achronix and a start-up called Flex Logix in offering to license FPGA fabric for embedding in SoCs.

At first glance, combining FPGA fabric with an SoC looks like a marriage made in heaven. The end product benefits from highly optimised functions built into the SoC portion, coupled with the ability to be customised using the FPGA. In addition, the combined solution should deliver higher performance, lower power and lower cost than two discrete devices. What could possibly go wrong?

Let's back up a little and look at Achronix and QuickLogic. Both offer stand-alone devices in the merchant market, where they're up against the duopoly of Xilinx and Intel. These two giants account for close on 90% of the $4 billion annual FPGA market, leaving less than $500 million for the other players. The third-largest FPGA vendor, Microsemi, takes around $300 million, and Lattice fills the fourth spot with FPGA revenue of around $130 million, so together they absorb most of what the duopoly leaves behind. These numbers might make uncomfortable reading for QuickLogic and Achronix, but they put into context the mountain the two companies must climb. So both have chosen to try to build a revenue stream from embedded fabric. This is the area where Flex Logix operates, except that it does not offer discrete devices; its business model relies entirely on license revenue, followed by unit royalties once its customers start shipping product.
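
The squeeze on the smaller vendors follows directly from the figures quoted above. The short Python sketch below simply reproduces that arithmetic using the article's approximate, rounded numbers (estimates, not audited revenue data). Taking the "less than $500 million" outside the duopoly as an upper bound, it shows how little is left once Microsemi and Lattice are subtracted.

```python
# Approximate figures quoted in the article, in millions of US dollars.
OUTSIDE_DUOPOLY_MAX = 500  # "less than $500 million" left after Xilinx and Intel
MICROSEMI = 300  # third-largest vendor, "around $300 million"
LATTICE = 130  # fourth-largest vendor, "around $130 million"


def left_for_rest(outside_max: int, *named_vendors: int) -> int:
    """Upper bound on FPGA revenue left once the named vendors are subtracted."""
    return outside_max - sum(named_vendors)


headroom = left_for_rest(OUTSIDE_DUOPOLY_MAX, MICROSEMI, LATTICE)
print(f"At most ~${headroom}M left for QuickLogic, Achronix and everyone else")
```

On these numbers, everyone outside the top four shares at most about $70 million a year.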

What are the barriers to embedding FPGA fabric in an SoC? The first obstacle is that the fabric must be supported on the chosen SoC process technology. For example, TSMC commands around 60% of the foundry market, but it runs multiple process geometries and variants to serve such a large customer base. Flex Logix supports some of the popular TSMC 28nm, 40nm and 16FF processes. Achronix currently uses a 22nm FinFET process from Intel, but offers a service to port its fabric to your chosen technology.

Assuming you have a match on process technology, the next question is how much FPGA fabric do you think you need? This depends on why you are planning to use the embedded fabric.

One proposed use case is to provide an upgrade path where standards are still in flux, or to correct possible design faults in the SoC. The issue then becomes how well you can guess the likely changes in the standard, and where in the SoC data path you expect to need programmable fabric. Neither question is easy to answer.

If the FPGA core is solely for creating regional variants of the design, the answer may be simpler. Even so, you would probably first try to handle regional variants with a processor, which could be a better solution. That leads directly to existing products such as the Zynq range from Xilinx, or Intel's Stratix, Arria and Cyclone devices with embedded ARM cores.

Achronix and Flex Logix suggest using their cores as accelerators. Both vendors offer a choice of cores with a mix of logic, memory and DSP functions. An embedded FPGA core can undoubtedly achieve significantly higher performance than an external device: data transfers avoid the delay and power consumption of going on and off chip, and latency is much lower. A reprogrammable core would also allow the functionality to be changed to match current requirements. (A major competitive threat in this application comes from Intel, which has a sharp focus on the data centre. My bet is that Intel is already working on leveraging its acquisition of Altera to integrate processors and FPGA fabric to create hardware acceleration for its key customers.)

Again, the question arises of how much die area to devote to the programmable core. This matters because adding programmability carries a substantial die-area penalty. Logic cells are significantly less efficient than standard cells when measured in gates per square millimetre. The configuration data that programs the logic cells and interconnect is also typically stored in dedicated SRAM on the die, which occupies area and therefore pushes up cost.

Less obvious is the fact that most FPGA architectures rely on a rich hierarchy of interconnect for efficient operation: links to adjacent logic cells, links to more distant cells, and long lines crossing many cell boundaries. The place-and-route tools use these metal tracks to realise the required logic function. Complex FPGA architectures use many metal layers to achieve this richness, which could drive up the mask count of the SoC and, in turn, raise both NRE and device costs. Further die area is consumed by the dedicated block used to load the configuration. Finally, extra circuitry is needed to ensure that production devices can be adequately tested. Testing FPGAs is complex and time-consuming because every individual element of the logic and interconnect must be exercised, and this must be factored into the estimated device test costs.
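
To make the configuration and test points concrete, here is a toy Python model of a single look-up table (LUT) logic cell whose function is set entirely by configuration bits, plus an exhaustive test that loads several patterns and applies every input combination. It is an illustrative sketch under stated assumptions, not any vendor's cell: the 4-input LUT size and the chosen test patterns are hypothetical, and real fabric test also has to cover the interconnect.

```python
from itertools import product


class Lut:
    """Toy k-input look-up table (LUT): one configuration bit per input pattern.

    In an SRAM-based FPGA these 2**k bits would live in on-die configuration
    memory; this is an illustrative model, not any vendor's cell.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.config = [0] * (2 ** k)

    def load(self, bits: list[int]) -> None:
        """Write a configuration pattern (the job a bitstream does for real)."""
        if len(bits) != 2 ** self.k:
            raise ValueError(f"expected {2 ** self.k} configuration bits")
        self.config = list(bits)

    def evaluate(self, inputs: tuple[int, ...]) -> int:
        """The input pattern, read as a binary number, selects one config bit."""
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.config[index]


def exhaustive_test(lut: Lut, patterns: list[list[int]]) -> int:
    """Load each pattern, apply every input combination, check the output.

    Returns the number of test vectors applied; raises on a mismatch.
    """
    vectors = 0
    for pattern in patterns:
        lut.load(pattern)
        for index, inputs in enumerate(product((0, 1), repeat=lut.k)):
            # product() yields input tuples in binary counting order,
            # so the expected output is simply pattern[index].
            if lut.evaluate(inputs) != pattern[index]:
                raise AssertionError(f"mismatch at inputs {inputs}")
            vectors += 1
    return vectors


K = 4  # hypothetical: a 4-input LUT
SIZE = 2 ** K
# Hypothetical test patterns: all zeros, all ones and two complementary
# checkerboards, so every configuration bit is observed at both 0 and 1.
PATTERNS = [
    [0] * SIZE,
    [1] * SIZE,
    [i % 2 for i in range(SIZE)],
    [(i + 1) % 2 for i in range(SIZE)],
]
print(exhaustive_test(Lut(K), PATTERNS))  # 4 patterns x 16 input combinations = 64
```

Even this single toy cell needs 64 vectors; a real fabric repeats the exercise across every cell and must also exercise the interconnect, which is why test time and cost grow quickly.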

The tool chain is a major consideration, because it is the key to fast and efficient integration. SoC design already requires a bewildering array of tools, and the tool set for embedding an FPGA core inside a chip needs to blend as seamlessly as possible into the established flow. Simulating the core alongside the rest of the SoC must be supported. That includes modelling the SoC's behaviour during configuration: booting the fabric takes a lot of data and imposes a period of "down time" during which the FPGA is non-functional. Access to the configuration store also needs to be considered. This could be internal non-volatile flash or external memory, and the latter could require additional dedicated SoC pins if existing memory ports cannot be shared. The design tools also need to support debugging the FPGA core, including its interface to the surrounding logic. Hopefully, more esoteric concerns, such as timing when crossing clock-domain boundaries and safe operation while SoC blocks are powered down, can be covered by the existing tools. The tool-chain challenge also extends to the ongoing design of reconfigurable functions once the SoC is built. Again, the standard FPGA tool set is likely to be adequate.
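
The configuration "down time" can be estimated with simple arithmetic, and a behavioural model can make it visible in SoC-level simulation. The Python sketch below is a hypothetical illustration, not any vendor's tool: the bitstream size, port width and clock rate are placeholder assumptions, and FabricModel is an invented stand-in that reports its outputs as None until configuration completes, so a testbench can flag logic that reads the fabric too early.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigInterface:
    """Hypothetical path from the configuration store to the fabric."""

    bitstream_bits: int  # size of the configuration image
    width_bits: int  # bits transferred per configuration clock cycle
    clock_hz: float  # configuration clock frequency

    def cycles(self) -> int:
        # Ceiling division: a partial final word still costs a whole cycle.
        return -(-self.bitstream_bits // self.width_bits)

    def down_time_s(self) -> float:
        return self.cycles() / self.clock_hz


class FabricModel:
    """Behavioural stand-in for the fabric while it boots.

    Outputs read as None until configuration completes, so a surrounding SoC
    testbench can flag any logic that consumes them too early.
    """

    def __init__(self, iface: ConfigInterface) -> None:
        self.remaining = iface.cycles()

    def tick(self) -> None:
        """Advance one configuration clock cycle."""
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def configured(self) -> bool:
        return self.remaining == 0

    def output(self, value: int) -> Optional[int]:
        return value if self.configured else None


# Hypothetical numbers: a 1 Mbit image over a 32-bit port at 50 MHz.
iface = ConfigInterface(bitstream_bits=1_000_000, width_bits=32, clock_hz=50e6)
print(iface.cycles())  # 31250
print(f"{iface.down_time_s() * 1e3:.3f} ms")  # 0.625 ms

fabric = FabricModel(iface)
print(fabric.output(1))  # None: fabric still booting
for _ in range(iface.cycles()):
    fabric.tick()
print(fabric.output(1))  # 1
```

Halving the port width or the configuration clock doubles the down time, which is one way to weigh sharing an existing memory port against adding dedicated pins.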

As I mentioned at the start, QuickLogic's announcement caught my attention: the company has linked up with GlobalFoundries to provide its embedded cores. My curiosity was raised firstly because I was only aware of QuickLogic using amorphous-silicon antifuse to program its devices. This requires special processing by the foundry, which I couldn't find listed on the GlobalFoundries website. Antifuse also provides permanent, one-time programming, which rules out applications that need reconfiguration. I can only speculate that either an antifuse process variant is available from GlobalFoundries, or QuickLogic has implemented an SRAM option.

The second reason for my surprise is that QuickLogic has not supplied its design tools on the open market for many years. As I understand it, the company takes a customer's design and implements it in-house. Producing a mature, user-friendly tool chain that fits into an SoC flow must therefore have been a challenging step.

The applications QuickLogic is addressing will be distinctly different from hardware acceleration in data centres, and will almost certainly be in smartphones or IoT systems. This would be a natural extension of QuickLogic's current customer base. (As a point of reference, this time last year Samsung accounted for 75% of QuickLogic's total revenue, but this has now shrunk to below 40%.) Interestingly, the cores will also be ported to GlobalFoundries' 22FDX, a fully depleted silicon-on-insulator (FD-SOI) process; this would represent the first use of this low-power technology for FPGAs.

Undoubtedly there are applications where embedded FPGA fabric will add significant advantages to a chip. Now that three vendors are chasing this market, potential customers will have a wider choice, which has to be a good thing.
