The elements necessary to perform WCCA should be brought together along with the right software, people, test data, and experience.
The term “escapes” is a euphemism for all the excuses that programs, program managers, engineers, and reviewers use to curtail or eliminate worst-case circuit analysis (WCCA) activities. It is essential that all the elements necessary to perform the WCCA be brought together along with the right software, people, test data, and experience. With each hurdle, the analysis will stall, reducing its rigor and the veracity of its conclusions. In addition, a variety of issues can derail the successful completion of WCCA.
Figure 1 Poor power integrity can lead to rogue waves. These are transient load conditions where there is an alignment of the stepped current requirements of the load—FPGA, processor, and memory—and resonances in the power rail’s PDN impedance. When this happens the power supply voltage can jump out of specification resulting in a fault condition, which is tremendously difficult to recreate in traditional production testing. But this worst case event happens all the time in real life. Source: AEi Systems
Some common escapes include a lack of foresight or budget to properly scope and fund the effort; poor, non-existent, or ever-changing design specifications or flowdown requirements from the customer; saying “we test and we have redundancy” (note: redundancy won’t save a bad design); and time compression or poor scheduling.
WCCA needs time. It is often force-fit between the end of the design process and production. Unfortunately, too many programs find themselves still designing right up until final reviews or after WCCA findings are revealed, and there is little or no time to properly perform the WCCA, let alone fix the issues found. The analysis must be completed properly, and any non-compliances must be addressed appropriately. A reanalysis pass to define and confirm fixes is always necessary.
WCCA needs test data to support models and assumptions; if the hardware does not match the analysis, problems will occur.
Hardware is essential for efficient WCCA. The lack of part data to fill in datasheet holes, model correlation data to define model performance, and circuit correlation data to anchor simulations, assumptions, and conclusions is a critical issue. Without data, you will be making judgments and design decisions without a firm foundation.
Designers thinking they don’t need to do the analysis is another common escape. The parameters to be analyzed should not be selected by the circuit designer alone, because mistakes in the design will often be repeated in the analysis. Circuits that the designer believes are too simple, obvious, or repeatable may be ignored, and that is often where problems lie.
Don’t underestimate the tolerance stack-up. Until the tolerance database or parts variability database (PVDB) is compiled and the analysis performed, it is tremendously difficult to know what part tolerances will do to performance. We know very little about the parts we use, and we often don’t know how sensitive circuit performance is to various unbounded or undocumented parameters. Dismissing the variances as inconsequential before performing the analysis is one of the biggest escapes. Whether combined by root-sum-square (RSS) or by extreme value analysis (EVA), the tolerance stack-up is bigger than you believe it to be.
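To make the RSS and EVA stack-ups concrete, here is a minimal Python sketch for an unloaded resistive divider. The circuit, the nominal values, and the tolerances are hypothetical and chosen only for illustration; they are not taken from any PVDB. The EVA function evaluates every tolerance corner, which assumes the output is monotonic in each parameter, and the RSS function treats each parameter's half-swing as an independent contribution.

```python
import itertools
import math


def divider_vout(vin, r1, r2):
    """Output voltage of an unloaded resistive divider."""
    return vin * r2 / (r1 + r2)


# Hypothetical nominal values and fractional tolerances (illustration only).
NOMINAL = {"vin": 5.0, "r1": 10_000.0, "r2": 10_000.0}
TOLERANCE = {"vin": 0.02, "r1": 0.03, "r2": 0.03}


def eva_bounds(func, nominal, tolerance):
    """Extreme value analysis: evaluate every +/- tolerance corner.

    Assumes func is monotonic in each parameter, so the extremes
    occur at corners of the tolerance box.
    """
    names = list(nominal)
    outputs = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(names)):
        corner = {
            name: nominal[name] * (1.0 + sign * tolerance[name])
            for name, sign in zip(names, signs)
        }
        outputs.append(func(**corner))
    return min(outputs), max(outputs)


def rss_bounds(func, nominal, tolerance):
    """Root-sum-square: combine each parameter's half-swing in quadrature.

    Assumes the parameter variations are independent.
    """
    center = func(**nominal)
    sum_of_squares = 0.0
    for name in nominal:
        high = dict(nominal, **{name: nominal[name] * (1.0 + tolerance[name])})
        low = dict(nominal, **{name: nominal[name] * (1.0 - tolerance[name])})
        half_swing = (func(**high) - func(**low)) / 2.0
        sum_of_squares += half_swing ** 2
    spread = math.sqrt(sum_of_squares)
    return center - spread, center + spread


if __name__ == "__main__":
    print("nominal:", divider_vout(**NOMINAL))
    print("RSS bounds:", rss_bounds(divider_vout, NOMINAL, TOLERANCE))
    print("EVA bounds:", eva_bounds(divider_vout, NOMINAL, TOLERANCE))
```

With only three toleranced parameters, the EVA window is already visibly wider than the RSS window. A real stack-up would also carry temperature, aging, and radiation terms for each part, which widens both.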
Company, program, and engineering biases
These entities are often infected by the nominal. They believe they know all they need to know given typical data. Typical datasheet information, curves, and test data are often used to justify conclusions about worst-case behavior, tolerance distributions, and so much more. Often the nature of the data is not even explored. Statistically speaking, the nominal does not tell you about extremes and should not be used to bound WCCA. The difference between a nominal stress analysis, a worst-case steady state stress analysis, and a worst-case transient stress analysis—using EOL part values, loading, and environment extremes—can easily be an order of magnitude.
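The gap between the three stress analyses can be sketched with a single resistor across a supply rail. Everything in this Python example is a hypothetical assumption made for illustration: the rail voltage, the resistor value, the EOL tolerance, the transient overshoot, and the linear derating curve. It also simplifies by comparing transient peak power against the steady-state rating, where a real analysis would use the part's pulse rating.

```python
def power_w(voltage, resistance):
    """Power dissipated in a resistor across a voltage source."""
    return voltage * voltage / resistance


def derated_rating_w(rated_w, ambient_c, knee_c=70.0, zero_c=155.0):
    """Assumed linear derating: full rating up to knee_c, zero at zero_c."""
    if ambient_c <= knee_c:
        return rated_w
    if ambient_c >= zero_c:
        return 0.0
    return rated_w * (zero_c - ambient_c) / (zero_c - knee_c)


def stress_ratio(voltage, resistance, rated_w, ambient_c):
    """Applied power divided by the derated rating (above 1.0 is overstress)."""
    return power_w(voltage, resistance) / derated_rating_w(rated_w, ambient_c)


RATED_W = 0.25  # hypothetical resistor power rating

# Hypothetical operating cases: (voltage in V, resistance in ohms, ambient in C).
CASES = {
    "nominal": (12.0, 1000.0, 25.0),
    # Rail 5% high, resistance 5% low at end of life, hot environment.
    "worst-case steady state": (12.0 * 1.05, 1000.0 * 0.95, 105.0),
    # Same EOL resistance and temperature, with a 50% rail overshoot.
    "worst-case transient": (12.0 * 1.50, 1000.0 * 0.95, 105.0),
}

if __name__ == "__main__":
    for label, (voltage, resistance, ambient_c) in CASES.items():
        ratio = stress_ratio(voltage, resistance, RATED_W, ambient_c)
        print(f"{label}: stress ratio = {ratio:.2f}")
```

Even in this small example the part looks comfortable at nominal and is overstressed in both worst-case conditions, which is the conclusion a nominal-only analysis would miss.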
Companies believe past success is a future predictor, even if the parts, requirements, environment, and designs change. If there were a unit failure, it would likely be difficult to trace the issue back to a particular circuit or functional block. They also believe they have done all the homework they need; for instance, that what little analysis they do, along with test data deemed both accurate and sufficient, clears any functional concerns or risks.
These entities put 100% stock in reference designs and datasheet information without any hint of pessimism. They are unfamiliar with the tolerance stack-up and the ‘Cracker-Jack’ phenomenon—the surprises that are waiting inside most ICs that you don’t know about until you open the box and look (deeply) inside. They do not understand the role limited, priority-based, and targeted WCCA can play in achieving higher reliability and meeting mission performance goals, and they do not care to learn how it can benefit them.
Another escape is saying “it costs too much” or “we don’t have anyone to do it.” Clearly, these escapes belie reality. As for cost, WCCA doesn’t cost money; it saves money. This may seem misguided at first, but once you understand the direct and ancillary benefits, it’s easy to see WCCA’s value. Below are just a few planned and executed ways to manage costs. As for who can do the work, consultants exist and the work can be targeted.
The return on investment for WCCA is significant. Table 1 lists some of the many reasons to perform WCCA.
|Category|Reasons to perform WCCA|
|---|---|
|Design verification and reliability|To verify circuit operation and quantify the operating margins over part tolerances and operating conditions: Will the circuit perform its functions and meet specifications?|
||To improve performance: determine the sensitivity of components to certain characteristics or tolerances in order to better optimize/understand a design and what drives performance|
||To verify that a circuit interfaces with another design properly|
||To determine the impact of part failures or out-of-tolerance modes|
|Test cost reduction|To evaluate performance aspects that are difficult, expensive, or impossible to measure (e.g., determine the impact of input stimulus and output loading so as not to damage hardware)|
||To set ATP limits: without analysis, how will you know what you are supposed to see in test?|
||To verify SATs/SITs, whether they are needed, and what their limits should be|
||To reduce the amount and scope of testing|
|Parts assessment|To determine if a part is suitable (too cheap, too expensive) or if a new technology can be used|
||To support/set critical parameters and SCD requirements/screening definition|
||To perform single event transient (SET) analyses|
||To support the switching and transient stress and derating analysis|
|Schedule, cost, or contractual risk reduction|To reduce board spins: determine the impact of late-stage design or part changes|
||To verify changes to heritage circuits|
||To obtain better insurance rates and reduce contractual liabilities|
||To avoid a catastrophic or costly incident|
|Return on investment|To improve future products|
||To improve the knowledge and capability of your engineering staff|
We test so we don’t need to analyze
This is one of the biggest escapes of all: Can’t electrical testing be used as a less expensive alternative? The answer is generally no.
The beginning-of-life (BOL) vs. end-of-life (EOL) tolerance variances are discussed in the blog “Optimizing Electronics Test/Analysis Ratio.” BOL tolerances dominate. Testing does not usually account for BOL tolerances; initial testing is rarely extensive due to various practical constraints, so test does not retire as much risk/margin as people think relative to the tolerance stack-up.
While testing is essential to support the WCCA, testing doesn’t cover EOL analysis and often doesn’t even cover all operating conditions. In addition, testing has several inherent concerns. Usually, testing only determines typical 25°C performance. In many cases, extended testing must be performed under extreme operating conditions, such as temperature, voltage, and power, to determine aging margins, which can overstress the hardware. Test results are only valid for the measured lot, and performance may vary lot-to-lot and manufacturer-to-manufacturer. Testing also requires the parts to be procured prior to completion of the WCCA, which can be very risky, and very costly if sophisticated test equipment is required.
Eliminating bias, ensuring independence
The project engineer is often under great schedule pressure, program budget pressure, and the company’s political pressure. One of the main tenets of the Aerospace TOR guideline on WCCA (TOR-2012(8960)-4_Rev. A) is that WCCA performed in-house is not independent. Monetary pressures, political pressures, and personal feelings all serve to destroy the checks and balances that WCCA is supposed to bring to the design process.
Figure 2 These images show worst case events happening in real life: (L-R) a rocket explosion, a Samsung Galaxy battery fire, and a Tesla on fire. Source: AEi Systems
This is not to say that designers should not be involved. Certainly, the designer should develop the nominal models and be involved in the WCCA review. But independence is key to avoiding escapes. While some of these biases can influence even the most independent of analysts, this is clearly why companies and design engineers should not do their own worst-case analysis and why it is imperative to use an independent assessment team.
This article was originally published on EDN.
Charles Hymowitz is a technologist, marketer, and business executive with over 30 years of experience in the electrical engineering services and EDA software markets.