With ICs and boards packing ever more circuits, timing is critical. Multiple clock domains solve some timing problems, but you must weigh the design tradeoffs.
With the increased number of clock domains in modern ASICs, clock-domain crossing (CDC) has become ubiquitous. Timing is always a concern: high clock speeds and delays in signal paths can cause signals to arrive at unwanted moments, resulting in metastability.
The first section of this article focuses on why asynchronous domains are necessary in design and the challenges they bring. Because engineers can’t avoid asynchronous signals, the article explains how to mitigate the issues that arise from CDC. It also covers how MTBF guides synchronizer selection.
The second part covers the timing aspects of synchronizers and how their function can be compromised by improperly constrained paths between a synchronizer’s elements. Today’s SoCs use various synchronization techniques, but you must understand which technique suits which scenario. We’ll discuss those techniques and explain the repercussions of using an inapt synchronizer in a given scenario.
Finally, we’ll cover how to verify CDC synchronizers and outline a design methodology that addresses all aspects of synchronizer use.
Why asynchronous paths?
Increasing logic in modern chips makes asynchronous paths indispensable. Maintaining the same clock skew at all flip-flops in a fully synchronous design makes timing closure difficult. A global synchronous clock also imposes timing overhead on short paths that could otherwise run at a faster clock, reducing overall performance. Furthermore, process-voltage-temperature (PVT) variations in clock-path delays can produce non-uniform skew, making a nominally synchronous path effectively asynchronous and causing metastability issues. And although engineers use clock-gating cells in synchronous designs to reduce power, asynchronous domains consume even less power.
Asynchronous paths serve a useful purpose, but they come with risks. The propagation delay between two flops operating at different frequencies can be characterized, but the asynchronous relationship between the clocks means there is no fixed relationship between data toggling and clock edges. For proper flop operation, the data at a flop’s D input should toggle well outside its setup and hold window. Here’s why.
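To make this timing requirement concrete, here is a minimal Python sketch; the function name, times, and values are invented for illustration and don’t come from any real cell library.

```python
# Hypothetical illustration: flag a data transition that lands inside a
# flip-flop's setup/hold aperture around a capturing clock edge.
# All times are in picoseconds; the numbers are assumptions.
def violates_window(data_edge_ps, clk_edge_ps, t_setup_ps, t_hold_ps):
    """Return True if the data edge falls inside the aperture
    (clk_edge - t_setup, clk_edge + t_hold), risking metastability."""
    return (clk_edge_ps - t_setup_ps) < data_edge_ps < (clk_edge_ps + t_hold_ps)

# A transition 5 ps before the clock edge sits inside a 50-ps setup window.
print(violates_window(995, 1000, 50, 30))   # risky capture
# A transition 100 ps before the edge is safely outside the window.
print(violates_window(900, 1000, 50, 30))   # safe capture
```

With asynchronous clocks, nothing guarantees the data edge stays out of this aperture, which is exactly why the cases below can occur.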
Consider the diagram in Figure 1a, an internal picture of a positive-edge-triggered flop. To understand metastability, assume that the flops are in a reset state.
Setup time: When the value at D changes from 0 to 1, a finite time is required for the signal to traverse from D to node C. If CLK switches during that interval, the flop latches an intermediate value at node C. Node C is then stuck, and that metastable value appears at Q. The time required to propagate through this path is the setup time; to avoid metastability, data at D must change well outside the setup window (Ts).
Hold time: Now assume that D toggles while CLK switches from 0 to 1. Because node A is cut off from node D after CLK switches, A can capture an intermediate value that becomes visible at Q. The data must therefore remain stable for a finite time after the clock edge; that interval is the hold time.
In both cases, when CLK switches from 0 to 1, a metastable value is stuck, either at node C (setup violation) or node A (hold violation). That metastable value loops around the first latch (ABC), which then tries to resolve to a stable value.
In the cases of Fig. 1b and Fig. 1c, the stuck value circulates in a closed loop (a bistable element) formed by two cross-coupled inverters. Small-signal analysis shows that the voltage difference between the two nodes grows exponentially. The value it settles to depends on the intermediate value at node A or C, but the voltage difference reaches the transistor threshold voltage (VTH) after a definite time, called the metastability resolution time, which is determined by the synchronization time constant (τ). If metastability doesn’t resolve within half a clock cycle, the metastable value may even loop around the second latch (PQR) when CLK switches from 1 to 0.
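The exponential behavior just described can be written compactly; this is the standard small-signal result for a cross-coupled inverter pair, stated here using the article’s symbols:

```latex
% Small-signal model: the node-voltage difference grows exponentially
% from its initial offset \delta V_i with time constant \tau:
\[
  \Delta V(t) = \delta V_i \, e^{t/\tau}
\]
```

Resolution completes when ΔV(t) reaches the threshold voltage VTH, so a smaller initial offset δVi means a longer resolution time.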
This metastable value must not propagate further. Fortunately, two back-to-back flops (2-DFF) can prevent the problem. If the first flop (the meta flop) settles to a stable value within one clock period, the second flop (the sync flop) won’t capture the unstable value. But how do we ensure that metastable values settle within one clock cycle? Consider the factors that determine the metastability resolution time:
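As a behavioral sketch of the two-flop scheme (not an HDL model; the class and names here are invented for illustration), the structure amounts to a two-stage shift register clocked in the destination domain:

```python
# Behavioral sketch of a two-flop (2-DFF) synchronizer. This models only
# the shift-register structure, not metastability itself: the point is
# that downstream logic sees the second ("sync") flop, which lags the
# first ("meta") flop by one full destination-clock period.
class TwoFlopSynchronizer:
    def __init__(self):
        self.meta = 0  # first flop: the one exposed to metastability
        self.sync = 0  # second flop: shields downstream logic

    def clock_edge(self, async_in):
        # On each destination-clock edge: sync <= meta, meta <= async_in.
        self.sync = self.meta
        self.meta = async_in
        return self.sync

# The asynchronous input appears at the output two destination-clock
# edges later, giving the meta flop a full period in which to settle.
s = TwoFlopSynchronizer()
outputs = [s.clock_edge(x) for x in [1, 1, 1, 0, 0]]
print(outputs)  # [0, 1, 1, 1, 0]
```

The two-cycle latency is the price paid for shielding downstream logic from a potentially metastable node.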
tRES = τ × ln(VTH/δVi)

where VTH is the threshold voltage and δVi is the difference between the stuck voltage (VS) and the metastable voltage (VM), as shown in Figure 2, the transfer characteristics of a bistable inverter loop.
The synchronization time constant (τ) governs metastability resolution. It depends on the inverters’ RC characteristics and gain (A). The smaller the τ value, the steeper the transition from the metastable value to the threshold voltage, and hence the shorter the resolution time. Thus, designers usually implement the meta flop with low-threshold-voltage (LVT) cells, whose τ values are much smaller than those of high-threshold-voltage (HVT) and standard-threshold-voltage (SVT) cells.
Metastable voltage also factors into the problem. According to the formula, if δVi is 0, that is, if the stuck voltage (VS) equals the metastable voltage (VM), resolution would theoretically take infinite time. In practice, this essentially never occurs: even if the stuck voltage sits exactly at VM, midway between supply and ground, random processes such as thermal noise would eventually push it toward a stable state. Because a deterministic formula can’t bound the resolution time, you need a probabilistic approach.
The rate of entering metastability is (TW/T) × FD, where TW is the setup-plus-hold window, T is the destination clock period, and FD is the frequency at which the data pin toggles. Writing the clock frequency as FC = 1/T:

Rate(enter MS) = TW × FC × FD
Because the resolution process is exponential, the probability that metastability has not resolved after time t is e^(–t/τ).
Thus, the rate at which failures occur is

Rate(failure) = TW × FC × FD × e^(–t/τ)
Thus, mean time between failures:

MTBF = 1/Rate(failure) = e^(t/τ)/(TW × FC × FD)
To increase the reliability of synchronizers, we maximize MTBF, selecting an optimum value based on expected product longevity. Because we want metastability resolved within one cycle, t in e^(t/τ) is at most one clock period, the synchronization period (TRES), and τ is technology dependent. Setting MTBF on the order of 10^3 years, we can then calculate the maximum frequency at which the synchronizer operates without failure over that period.
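To make the arithmetic concrete, here is a small Python sketch of the MTBF formula above; the parameter values are invented assumptions, not figures from any real technology.

```python
import math

# MTBF = e^(t/tau) / (TW * FC * FD), per the formula above.
# All parameter values used here are illustrative assumptions.
def mtbf_seconds(t_res, tau, t_w, f_clk, f_data):
    """t_res: resolution time allowed (s), at most one clock period;
    tau: synchronization time constant (s); t_w: setup+hold window (s);
    f_clk, f_data: destination-clock and data-toggle frequencies (Hz)."""
    return math.exp(t_res / tau) / (t_w * f_clk * f_data)

# Example: 100-MHz destination clock (10-ns period available to resolve),
# data toggling at 10 MHz, tau = 100 ps, window = 100 ps.
mtbf = mtbf_seconds(t_res=10e-9, tau=100e-12, t_w=100e-12,
                    f_clk=100e6, f_data=10e6)
years = mtbf / (3600 * 24 * 365)
print(f"MTBF ~ {years:.2e} years")
```

Note how exponential the dependence on t/τ is: shrinking the available resolution time collapses MTBF by many orders of magnitude, which is why giving the meta flop a full clock period (or adding a third stage) is so effective.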
Note: The per-synchronizer MTBF target is usually set a few orders of magnitude higher than what a single synchronizer requires. Because an SoC may contain on the order of 1,000 synchronizers, the chip-level MTBF is reduced by roughly the same factor.
Cherry Maskara is an IP Design Engineer at NXP Semiconductors.