Take a look back at how hardware emulation evolved, from spaghetti cables to an acronym soup, and finish off with a knowledge bowl of transactors.
My [Lauro Rizzatti’s] first exposure to hardware emulation happened circa 1995 upon visiting a major processor firm in Austin, Texas. Its lab was jam-packed from floor to ceiling with monstrous hardware emulators of different generations from Quickturn, the leader at the time.
What shocked me was the sight of a huge and messy bundle of cables that connected the design-under-test (DUT) – the processor design mapped inside the emulator – to a socket on a PC motherboard in place of the yet-to-be-taped-out processor. In the industry, this setup was called in-circuit emulation (ICE). Behind closed doors, the engineers called it spaghetti cables. It had a mean time between failures (MTBF) of a few hours. Signs all over the lab warned personnel about stepping on the cables.
Back then, ICE was the only deployment mode of an emulator and the very reason for its existence. It allowed testing of the DUT with real-world traffic. The alternative was to exercise the DUT with test vectors generated by a software-based testbench executed on a gate-level or register transfer level (RTL) simulator.
Testing the DUT in the context of a real-world testbed had an allure that no testbench could ever match. The advantage forced emulation users to endure pain, frustration, and discouragement, and caused design managers to blow through their tool budgets. Early emulators were non-shareable resources, only used on-site, with price tags that put them in the capital equipment purchase category.
Testbench simulation and ICE were based on two separated test environments that did not share any commonality in those days of ASIC designs. Simulation was used from the early stages of the design cycle all the way to full ASIC-level verification. ICE was supposed to be the icing-on-the-cake for final system-level validation. But often the icing was not ready before serving the cake. Namely, the DUT was ready for emulation after first silicon samples came back from the foundry, defeating the purpose.
In the mid-1990s, the industry began a long journey to bridge the gap between ICE and simulation. Unknown back then, the solution would come in the form of ICE virtualisation: the creation of a testbed functionally equivalent to ICE. It took many years of improvements in simulation, emulation, and testbench technology to reach a point where co-emulation (a.k.a., co-modelling or transaction-based acceleration) became the primary choice over ICE.
Ironically, spaghetti cables have been replaced by a soup of acronyms – a welcome trade-off given the many advantages of virtualisation. A short list includes:
…and many more. But we’re getting ahead of ourselves.
The first integration between an emulator and a simulator was devised in the mid-1990s. At the time, testbenches were written in the Verilog hardware description language (HDL), and the integration was based on the IEEE Verilog PLI standard. The PLI provided a mechanism for Verilog code to call functions written in the C programming language. The Verilog PLI standard suffered from several drawbacks. It was:
All emulation vendors at the time offered a capability promoted as co-simulation or (cycle-based) simulation acceleration. The use of “acceleration” was a misnomer by a mile. Indeed, it was like driving a Ferrari pulling a big trailer filled with 10 tons of gravel. It hobbled the performance of the emulator, dropping it by three or four orders of magnitude. To be specific, in ICE mode, the emulator effectively ran at a few megahertz. In co-simulation, it achieved at most 1kHz.
In a typical verification environment, the DUT I/O interface includes thousands of signals, with many switching states within each clock cycle. In simulation, the testbench and DUT communicate via a cycle-accurate, bit-level or signal-level interface, and each I/O signal transition is transferred between the two as it occurs.
In co-simulation, the testbench, processed by the simulator, and the DUT, now in the emulator, communicate via the same cycle-accurate, signal-level interface, and again, each I/O signal transition is exchanged between testbench and DUT as it occurs. This was certainly an advantage from an implementation point of view, as it required no modelling changes, but was a severe disadvantage for performance. In fact, even though the emulator could run orders of magnitude faster, it had to wait for these transfers to complete. Further, the emulator often was stalled by the testbench since it had to wait for the slow testbench to react to incoming signals and produce the next set of stimuli.
The large communication overhead essentially killed overall performance. The actual verification performance was limited by the performance of the host PC, the size and complexity of the testbench, and/or the signal-level interface between the testbench and the DUT (figure 1).
*Figure 1: Simulation profiling shows that the DUT consumes 80% of CPU time, and the remaining 20% is used by the Verilog testbench and PLI, yielding about 5× theoretical maximum speed-up in HDL co-simulation.*
Today’s testbenches consume more than 50% of the CPU time in simulation – sometimes more than 90% – limiting co-simulation acceleration to less than a factor of two. It’s no surprise that co-simulation never took off, leaving the ICE mode in the position of prominence that it enjoyed from the beginning.
The EDA industry never sits idle. Time and again, new ideas and engineering feats enhance the design verification landscape. In fact, new verification languages were devised to create ever more advanced testbenches.
A case in point is the use of C/C++ to implement testbenches. The aim is to elevate the abstraction level of the testbench and reduce the impact of the Verilog simulator – the slow link in the chain – from the testbench setup.
The replacement of Verilog with C/C++ testbenches dispensed with the PLI-based communication between testbench and DUT. Vendors resorted to using custom-made APIs based on macros to implement the pin-level or signal-level communication interface.
This testbench approach is advantageous to the deployment of an emulator in charge of the DUT. Now, acceleration factors of up to two digits were possible (figure 2).
*Figure 2: The simulation profiling shows that the DUT consumes 99% of CPU time, the remaining 1% used by the C/C++ testbench and API, thus yielding about 100× theoretical maximum speed-up factor in HDL co-simulation.*