Here are six tips and tricks that can improve the speed of a chip design by reducing delay, allowing the design to run at higher frequencies.
When designing a chip, a designer needs to consider many tradeoffs before developing the logic. For example, if a chip is being developed for mobile applications, power is a very important factor, so low-power logic is desired. Low-power logic, as the name suggests, reduces power consumption, but it can degrade the chip's performance. Conversely, if a chip is being developed for data center applications, high performance is the priority and power consumption is relatively less important.
In summary, power and performance trade off against each other, and a designer needs to find the right balance between them when writing logic for a chip. As mentioned, speed is critical for certain applications, and when developing logic for them, certain methods can improve the speed of the design through small logic optimizations. In this article, we will go through tips and tricks that can improve the speed of a chip design.
Sometimes we can boost the performance of a circuit simply by changing the way the logic is coded. One such example is shown in Figure 1 below. Although both versions implement the same functionality, the synthesis tool synthesizes them differently, and this affects the delay of the circuit.
Figure 1 The difference between code 1 (left) and code 2 (right) is on line 4.
Xilinx ISE and a Spartan-6 series FPGA were used to analyze the timing performance.
Figure 2 The schematic for code 1 is shown on top and the schematic for code 2 on the bottom.
Figure 3 Delay comparison of code 1 (left) and code 2 (right). Circuit 2 is faster than circuit 1.
Conclusion: The timing analysis in Figure 3 shows that, for the same logic, placing the parentheses in the right place reduces the circuit delay and therefore increases the speed of the circuit. The parentheses determine how the operations are grouped, and hence how many levels of logic lie on the longest path.
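Because the code in Figure 1 is not reproduced here, the sketch below is only a generic illustration of why grouping matters. It assumes a multi-operand expression such as A + B + C + D in which every 2-input operator costs one level of logic; the helper names `chain`, `balanced`, and `levels` are hypothetical. It compares the default left-to-right grouping with a parenthesized, pairwise grouping.

```python
from typing import List, Tuple, Union

# An expression tree: a leaf is an operand name, an internal node is a
# 2-input operator applied to two sub-expressions.
Expr = Union[str, Tuple["Expr", "Expr"]]


def chain(operands: List[str]) -> Expr:
    """Left-to-right grouping, as in A + B + C + D -> ((A + B) + C) + D."""
    expr: Expr = operands[0]
    for op in operands[1:]:
        expr = (expr, op)
    return expr


def balanced(operands: List[str]) -> Expr:
    """Pairwise grouping with parentheses, as in (A + B) + (C + D)."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (balanced(operands[:mid]), balanced(operands[mid:]))


def levels(expr: Expr) -> int:
    """Number of 2-input operators on the longest input-to-output path."""
    if isinstance(expr, str):
        return 0
    left, right = expr
    return 1 + max(levels(left), levels(right))


ops = ["A", "B", "C", "D"]
print(levels(chain(ops)), levels(balanced(ops)))  # 3 2
```

Under this unit-delay assumption the chained form grows linearly with the number of operands (7 levels for 8 operands), while the parenthesized form grows logarithmically (3 levels for 8 operands).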
Another way of increasing the timing performance of a circuit is to use pipeline registers. A long chain of combinational logic is broken into multiple stages by adding registers between them. Because of the additional registers in the data path, the latency (the time from applying an input to getting the corresponding output) increases. However, each stage now contains only part of the combinational logic, so the register-to-register delay shrinks, the clock can run faster, and once the pipeline is full a new result is produced every clock cycle.
Figure 4 Code 3 (left) and code 4 (right) have the same functionality, but code 4 has pipeline registers while code 3 has none.
Figure 5 Delay comparison of code 3 (left) and code 4 (right). Circuit 4 is faster than circuit 3.
Figure 6 The schematic for code 3 is shown on top and the schematic for code 4, which has pipeline registers, on the bottom.
Conclusion: It’s evident from Figure 5 that the circuit with pipeline registers is much faster than the circuit without them. Pipelining increases latency and area, but it also increases the maximum operating frequency. Therefore, it should be the go-to option when performance is the main criterion.
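The latency versus frequency tradeoff can be made concrete with a small timing model. This is a minimal sketch under stated assumptions, not the timing of codes 3 and 4: it assumes a hypothetical 12 ns block of combinational logic that splits evenly into three 4 ns stages, and a hypothetical 1 ns register overhead (clock-to-Q plus setup) per stage. The function name `pipeline_timing` is illustrative.

```python
from typing import List, Tuple


def pipeline_timing(stage_delays_ns: List[float],
                    reg_overhead_ns: float) -> Tuple[float, float, float]:
    """Return (clock period in ns, max frequency in MHz, latency in ns).

    The clock period is set by the slowest stage plus the register overhead
    (clock-to-Q plus setup); latency is one clock period per stage.
    """
    period = max(stage_delays_ns) + reg_overhead_ns
    fmax_mhz = 1000.0 / period
    latency = len(stage_delays_ns) * period
    return period, fmax_mhz, latency


# Hypothetical numbers: 12 ns of logic, unpipelined vs. split into 3 stages.
for name, stages in (("unpipelined", [12.0]), ("3-stage pipeline", [4.0, 4.0, 4.0])):
    period, fmax, latency = pipeline_timing(stages, reg_overhead_ns=1.0)
    print(f"{name}: period {period:.1f} ns, fmax {fmax:.1f} MHz, latency {latency:.1f} ns")
# unpipelined: period 13.0 ns, fmax 76.9 MHz, latency 13.0 ns
# 3-stage pipeline: period 5.0 ns, fmax 200.0 MHz, latency 15.0 ns
```

In this model each individual result arrives later with pipelining (15 ns instead of 13 ns), but because a new input enters every cycle, the design delivers 200 results per microsecond instead of about 77.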
Another way of tackling a long path is to move the slowest (latest-arriving) signal to the end of the logic, closest to the output. For example, if B is the late-arriving signal, Z = A & B & C & D can be rearranged as Z = (A & C & D) & B. This ensures that, after synthesis, B passes through only the last gate before the output, which reduces the overall delay of the logic.
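The effect of this rearrangement can be checked with a simple arrival-time model. This is a sketch under assumptions: each 2-input AND costs one unit of delay, B arrives 3 units after A, C, and D, and the synthesizer keeps the grouping written in the code. The function `arrival` and the arrival times are hypothetical.

```python
from typing import Dict, Tuple, Union

# A leaf is a signal name; an internal node is a 2-input AND of two sub-expressions.
Expr = Union[str, Tuple["Expr", "Expr"]]


def arrival(expr: Expr, t_in: Dict[str, float], gate_delay: float = 1.0) -> float:
    """Time at which the output of `expr` settles: a gate switches one
    gate_delay after its latest-arriving input."""
    if isinstance(expr, str):
        return t_in[expr]
    left, right = expr
    return max(arrival(left, t_in, gate_delay), arrival(right, t_in, gate_delay)) + gate_delay


# Hypothetical arrival times in gate delays: B is the late signal.
t_in = {"A": 0.0, "B": 3.0, "C": 0.0, "D": 0.0}

original = ((("A", "B"), "C"), "D")    # Z = A & B & C & D, grouped left to right
rearranged = ((("A", "C"), "D"), "B")  # Z = (A & C & D) & B
print(arrival(original, t_in), arrival(rearranged, t_in))  # 6.0 4.0
```

When all four inputs arrive together, both groupings settle at the same time (3.0); the rearrangement only pays off because A, C, and D can be combined while the design is still waiting for B.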
It’s often a good idea to leave the implementation of arithmetic operations such as adders and multipliers to the synthesis tool. Designing these blocks at a lower level can lead to timing problems, so they should be left to the synthesizer whenever possible.
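As one illustration of how hand-building arithmetic can hurt timing, assume the designer writes a gate-level ripple-carry adder. Its architecture is then fixed, whereas a synthesis tool given a plain `a + b` is free to choose a faster structure when timing requires it. The Python sketch below models both structures bit by bit and counts the carry stages in series. Kogge-Stone is used only as one well-known example of a fast parallel-prefix adder, not as the architecture any particular tool selects; the function names are hypothetical.

```python
def ripple_carry_add(a: int, b: int, width: int):
    """Hand-built ripple-carry adder: the carry out of bit i feeds bit i + 1,
    so the carry passes through `width` stages in series.
    Returns (sum modulo 2**width, number of carry stages)."""
    carry, total = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, width


def kogge_stone_add(a: int, b: int, width: int):
    """Parallel-prefix (Kogge-Stone) adder: group generate/propagate signals are
    combined over doubling distances, so carries settle in ceil(log2(width))
    stages. Returns (sum modulo 2**width, number of prefix stages)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    dist, stages = 1, 0
    while dist < width:
        # Merge each bit's group with the group `dist` positions below it
        # (both comprehensions read the previous level's G and P).
        G = [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(width)]
        P = [P[i] & P[i - dist] if i >= dist else P[i] for i in range(width)]
        dist *= 2
        stages += 1
    # Carry into bit i is the group generate of bits 0..i-1 (carry-in is 0).
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, stages


s1, n1 = ripple_carry_add(0x1234ABCD, 0x0FED5432, 32)
s2, n2 = kogge_stone_add(0x1234ABCD, 0x0FED5432, 32)
print(hex(s1), n1, hex(s2), n2)  # 0x2221ffff 32 0x2221ffff 5
```

Both functions produce the same sum, but for 32 bits the carry crosses 32 stages in series in the ripple-carry version versus 5 prefix stages in the parallel-prefix version, at the cost of more gates.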
When RTL code is written at a high level, Synopsys Design Compiler (DC) implements common logic blocks such as adders, multipliers, clock domain crossing cells, and clock gating cells with the help of the DesignWare library. Care should also be taken to write RTL code that the synthesizer can easily understand, because this results in a higher-quality netlist. Using state machines and a good design hierarchy also improves synthesis results.
Relaxing the area constraints lets the synthesis tool look beyond standard-VT (SVT) cells for other cells in the provided library. Faster low-VT (LVT) cells reduce logic delay but come with increased area and power. Increasing the compile effort during synthesis allows the tool to try more combinations and substitutions before generating the netlist; the only downside is a longer compile time.
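The SVT/LVT tradeoff can be pictured with a toy cell-swapping loop on a single timing path. This is a simplified stand-in under assumptions, not how Design Compiler actually optimizes: the cell names, delays, and leakage figures are hypothetical, and the greedy rule (swap the cell with the largest delay saving first) is just one plausible heuristic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    name: str
    svt_delay_ps: float  # delay of the standard-VT version
    lvt_delay_ps: float  # delay of the low-VT version (faster)
    svt_leak_nw: float   # leakage of the SVT version
    lvt_leak_nw: float   # leakage of the LVT version (higher)
    use_lvt: bool = False

    def delay(self) -> float:
        return self.lvt_delay_ps if self.use_lvt else self.svt_delay_ps

    def leakage(self) -> float:
        return self.lvt_leak_nw if self.use_lvt else self.svt_leak_nw


def path_delay(path: List[Cell]) -> float:
    return sum(c.delay() for c in path)


def swap_to_lvt(path: List[Cell], target_ps: float) -> bool:
    """Greedily swap the SVT cell with the largest delay saving to LVT until
    the path meets target_ps. Returns False if it fails even with all LVT."""
    while path_delay(path) > target_ps:
        candidates = [c for c in path if not c.use_lvt]
        if not candidates:
            return False
        best = max(candidates, key=lambda c: c.svt_delay_ps - c.lvt_delay_ps)
        best.use_lvt = True
    return True


# Hypothetical 4-cell path: 230 ps and 5 nW leakage when all cells are SVT.
path = [
    Cell("u1", 50.0, 35.0, 1.0, 4.0),
    Cell("u2", 80.0, 55.0, 2.0, 8.0),
    Cell("u3", 40.0, 30.0, 1.0, 3.0),
    Cell("u4", 60.0, 48.0, 1.0, 5.0),
]
met = swap_to_lvt(path, target_ps=200.0)
print(met, path_delay(path), sum(c.leakage() for c in path))  # True 190.0 14.0
```

With these made-up numbers, swapping two cells brings the path from 230 ps to 190 ps, while path leakage rises from 5 nW to 14 nW, which is the power cost the paragraph above describes.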
Synthesis is often run with a clock period shorter than the target period so that the synthesis tool works harder to optimize long data paths. Since metal (wire) capacitance is not known until place and route, the timing reported after synthesis is only an estimate. Running synthesis at a shorter clock period leaves margin to absorb the metal capacitance and other parasitics that appear later in the backend flow.
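The margin arithmetic can be sketched as follows. The model is deliberately crude and its numbers are hypothetical: a 1 GHz target (1000 ps period), routing parasitics that stretch every synthesized path by 12%, and a synthesis tool that stops optimizing once a path just meets the constraint it was given. The function names are illustrative.

```python
def synthesis_period_ps(target_period_ps: float, margin: float) -> float:
    """Clock period given to the synthesis tool: tighter than the real
    target by the fraction `margin` (0.15 means 15% tighter)."""
    return target_period_ps * (1.0 - margin)


def meets_after_route(synth_path_ps: float, target_period_ps: float,
                      wire_penalty: float) -> bool:
    """Crude post-route check: stretch the synthesized path by wire_penalty."""
    return synth_path_ps * (1.0 + wire_penalty) <= target_period_ps


def required_margin(wire_penalty: float) -> float:
    """Smallest margin m with target * (1 - m) * (1 + wire_penalty) <= target."""
    return wire_penalty / (1.0 + wire_penalty)


TARGET_PS = 1000.0  # hypothetical 1 GHz target clock
PENALTY = 0.12      # hypothetical: parasitics add 12% after place and route

for margin in (0.0, 0.15):
    # Assume the tool stops once the path just fits its (tightened) constraint.
    path = synthesis_period_ps(TARGET_PS, margin)
    print(margin, round(path), meets_after_route(path, TARGET_PS, PENALTY))
# 0.0 1000 False
# 0.15 850 True
```

Under these assumptions, any margin above `required_margin(0.12)`, about 10.7%, would be enough; in practice the right margin depends on the technology and design and is usually tuned from experience.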
Following these methods helps reduce delay within a design and lets it run at higher frequencies.
This article was originally published on EDN.
Deekshith Krishnegowda is an IC design engineer at Marvell Technology’s Santa Clara office. He holds an MS degree in electrical and electronics engineering from San Jose State University.