Monte Carlo gone wrong

By Charles Hymowitz

Engineers usually perform and assess Monte Carlo analysis incorrectly, which can result in wrong design decisions.

This will likely come as a surprise, but the way engineers perform Monte Carlo analysis and assess the results is likely incorrect. Misinterpreting Monte Carlo results can lead to the wrong technical and business decisions. You may also be surprised to learn that there is more than one Monte Carlo analysis method you can use.

A Monte Carlo analysis is a multivariate modeling technique that you can think of as a series of “what if” scenarios. It lets engineers run multiple trials and define the probability distribution, or risk assessment, for a given set of outcomes. In electrical circuit Monte Carlo analysis, the analyst assigns a probability distribution to each part characteristic that affects the outcome (some parts have many characteristics with tolerances) and runs multiple simulations of the circuit to find the range of possible outcomes for a given function.
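As a minimal sketch of the procedure just described, the Python below runs a Monte Carlo analysis on a resistive divider. The circuit, its part values, the tolerances, and the choice of distributions are all assumptions made for illustration; they are not taken from the article's power supply. The run count is a placeholder as well, since the article's point is that it should be chosen from a coverage and confidence requirement.

```python
import random
import statistics


def divider_vout(vin, r1, r2):
    """Output of a resistive divider: the function under analysis."""
    return vin * r2 / (r1 + r2)


def run_monte_carlo(n_runs, seed=1):
    """Simulate n_runs randomly toleranced circuits and return the outputs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_runs):
        # Assumed: 12 V source, 2% tolerance treated as 3 sigma (Gaussian).
        vin = rng.gauss(12.0, 12.0 * 0.02 / 3)
        # Assumed: 1% resistors, uniformly distributed over the tolerance band.
        r1 = rng.uniform(10e3 * 0.99, 10e3 * 1.01)
        r2 = rng.uniform(1.8e3 * 0.99, 1.8e3 * 1.01)
        results.append(divider_vout(vin, r1, r2))
    return results


results = run_monte_carlo(1000)  # placeholder run count
print(f"mean  = {statistics.mean(results):.4f} V")
print(f"sigma = {statistics.stdev(results):.4f} V")
print(f"min   = {min(results):.4f} V, max = {max(results):.4f} V")
```

Each pass through the loop is one “what if” scenario; the recorded outputs are the raw material for the statistical assessment.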

Many circuit functions aren’t monotonic with respect to parameter variations. Monte Carlo analysis is, therefore, essential to worst-case circuit analysis (WCCA). Performing extreme value analysis (EVA) alone is insufficient: many such analyses fail to produce the worst possible results. The worst case is often not at the parametric extremes, and you will miss findings if you perform only EVA.
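To illustrate a non-monotonic function, the sketch below uses a hypothetical series RLC band-pass filter observed at its nominal resonant frequency. The quality factor and the tolerances are assumptions chosen for illustration. The gain peaks wherever the toleranced L and C happen to resonate at that frequency, which falls inside the tolerance box rather than at a corner, so checking only the corners understates the peak.

```python
import itertools
import math
import random

Q = 20.0      # assumed quality factor of the hypothetical filter
TOL_L = 0.10  # assumed inductor tolerance (+/- 10%)
TOL_C = 0.05  # assumed capacitor tolerance (+/- 5%)


def gain(l_scale, c_scale):
    """Gain magnitude across R of a series RLC at the nominal resonance.

    l_scale and c_scale are the actual L and C divided by their nominal
    values; the gain is 1.0 when l_scale * c_scale == 1.
    """
    return 1.0 / math.sqrt(1.0 + (Q * (l_scale - 1.0 / c_scale)) ** 2)


def eva_peak():
    """Largest gain found by checking only the four tolerance corners."""
    corners = itertools.product((1 - TOL_L, 1 + TOL_L), (1 - TOL_C, 1 + TOL_C))
    return max(gain(a, b) for a, b in corners)


def monte_carlo_peak(n_runs, seed=1):
    """Largest gain found by uniform random sampling inside the tolerances."""
    rng = random.Random(seed)
    return max(
        gain(rng.uniform(1 - TOL_L, 1 + TOL_L), rng.uniform(1 - TOL_C, 1 + TOL_C))
        for _ in range(n_runs)
    )


print(f"EVA corner peak gain : {eva_peak():.3f}")
print(f"Monte Carlo peak gain: {monte_carlo_peak(2000):.3f}")
```

With these assumed values, the corner check reports a noticeably lower peak gain than the random sampling does, because the true peak lies in the interior of the tolerance ranges.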

There are many misconceptions and erroneous WCCA guidelines about the analysis sequence. You should consult the Aerospace terms of reference (TOR) on WCCA, to which I contributed [Ref. 1]. The decision on which analysis or analyses to use is based on several factors, including whether the function being assessed is monotonic with respect to parameter tolerances.

The analysis sequence is not extreme value analysis (EVA), root-sum-square (RSS), and then Monte Carlo, even though that’s what many guidelines state. There is no official “analysis sequence” path. Tolerances and analysis methods are essentially independent of one another, and RSS tolerances can be applied to sensitivity, parametric EVA, or Monte Carlo analysis methods. EVA should not necessarily be the first or only method used, though it can be a valuable first step, often letting you understand which parameters have the greatest effect on the output function.
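One common way to see the difference between an extreme-value combination and an RSS combination is through a sensitivity analysis. The sketch below is only an illustration under stated assumptions: the divider circuit, the part values, and the 1% tolerances are hypothetical, and the linearized combination shown here is a textbook technique rather than a procedure taken from the TOR.

```python
import math

NOMINAL = {"r1": 10e3, "r2": 10e3}    # assumed nominal part values (ohms)
TOLERANCE = {"r1": 0.01, "r2": 0.01}  # assumed fractional tolerances
VIN = 5.0                             # assumed fixed source voltage


def divider_vout(params):
    """Function under analysis: output of a resistive divider."""
    return VIN * params["r2"] / (params["r1"] + params["r2"])


def output_deviations(func, nominal, tolerance, step=1e-4):
    """Per-parameter output deviation: sensitivity times tolerance span.

    Sensitivities are estimated with a central finite difference.
    """
    deviations = {}
    for name, tol in tolerance.items():
        delta = nominal[name] * step
        high = dict(nominal, **{name: nominal[name] + delta})
        low = dict(nominal, **{name: nominal[name] - delta})
        sensitivity = (func(high) - func(low)) / (2 * delta)
        deviations[name] = sensitivity * nominal[name] * tol
    return deviations


def combine_extreme(deviations):
    """Extreme-value style: all deviations stack in the worst direction."""
    return sum(abs(d) for d in deviations.values())


def combine_rss(deviations):
    """Root-sum-square: deviations treated as independent random terms."""
    return math.sqrt(sum(d * d for d in deviations.values()))


devs = output_deviations(divider_vout, NOMINAL, TOLERANCE)
print(f"extreme-value deviation: {combine_extreme(devs) * 1e3:.2f} mV")
print(f"RSS deviation          : {combine_rss(devs) * 1e3:.2f} mV")
```

The per-parameter deviations also show which parameters affect the output most, which is the insight the paragraph above attributes to a first EVA pass.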

Don’t think of Monte Carlo analysis as a “fallback” to use when an analysis fails compliance. Reducing the conservatism is certainly possible and practical in many cases, but then it’s no longer a worst-case analysis (see “Why worst-case circuit analysis is challenging to perform” for a list of reasons why we do a WCCA). For many types of analysis, including stability and bus/step load transients, or where there are a significant number of parameter tolerances, Monte Carlo is the correct first, and maybe only, analysis to perform.

Another misconception is that Monte Carlo analysis requires a prohibitively large number of runs. This is not necessarily true.

Monte Carlo example
Figure 1 shows an example Monte Carlo analysis for the stability of a switching power supply using a state-space-average model. The simulation runs an AC analysis and checks whether the control loop meets the generally accepted space industry guidelines of 10 dB gain margin (top plot) and 30 degrees phase margin (center plot). The distance from the singular unstable point (-1,0) is also shown (bottom plot). More on stability guidelines and why they are problematic will appear in a future article.

Figure 1. A Monte Carlo analysis of this switching power supply shows (top to bottom) gain margin, phase margin, and distance from the singular unstable point.

In this example, we assigned a tolerance range and distribution, usually Gaussian or uniform, to each part that impacts the output function of interest. We then simulated a specific (not arbitrary) number of cases in which the toleranced characteristics varied according to their distributions, and we recorded the results of the output functions of interest.

We recorded the gain margin, phase margin, and stability margin (derived from the distance to the singular unstable point) during the AC stability analysis of the 12 V to 1.8 V regulator in Figure 1. The statistics (mean and standard deviation) and the worst cases (end-points) were recorded for each output function. For stability analysis, the stability margin (Figure 2) provides the definitive measure of performance.

Figure 2. Stability results often report only phase and gain margin. Stability margin is, however, a better and more accurate assessment of stability. Here, the distance from the singular unstable point is converted to degrees for convenience and comparison with the 30-degree requirement [Ref. 2].
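The sketch below shows one way to compute the distance from the loop gain to the point (-1,0) and express it in degrees. Everything in it is an assumption for illustration: the loop gain is a hypothetical integrator with one pole, not the article's regulator, and the conversion uses the geometric fact that a unit-magnitude loop gain with phase margin φ sits at a distance 2·sin(φ/2) from (-1,0). The conversion used in Ref. 2 may differ.

```python
import math


def loop_gain(freq_hz):
    """Hypothetical loop gain T(s) = K / (s * (1 + s/p)); values are assumed."""
    s = 2j * math.pi * freq_hz
    k = 2 * math.pi * 10e3  # assumed integrator gain (unity crossing near 10 kHz)
    p = 2 * math.pi * 20e3  # assumed pole at 20 kHz
    return k / (s * (1 + s / p))


def log_sweep(f_start, f_stop, points):
    """Logarithmically spaced frequencies, like an AC analysis sweep."""
    ratio = (f_stop / f_start) ** (1.0 / (points - 1))
    return [f_start * ratio ** i for i in range(points)]


def min_distance_to_minus_one(freqs):
    """Smallest |1 + T(jw)| over the sweep: closest approach to (-1, 0)."""
    return min(abs(1 + loop_gain(f)) for f in freqs)


def distance_to_degrees(distance):
    """Phase margin a unit-magnitude loop gain would have at this distance."""
    return math.degrees(2 * math.asin(min(distance, 2.0) / 2))


distance = min_distance_to_minus_one(log_sweep(100.0, 1e6, 2000))
print(f"minimum distance to (-1,0): {distance:.3f}")
print(f"equivalent margin         : {distance_to_degrees(distance):.1f} degrees")
```

In a Monte Carlo run, this single number would be recorded for every case, alongside gain and phase margin.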

Monte Carlo results must be evaluated in a valid statistical framework. This means a tolerance interval defined by a properly selected combination of population coverage and confidence level (or certainty), tied to a specific number of cases. Without this framework, you can’t correctly assess the results.

A tolerance interval for a function is an interval that, with a stated level of confidence, contains a specified fraction of the population’s values, based on a sample drawn from that population [Ref. 3].

Using the maximum and minimum results from an arbitrary number of runs as the “worst cases” is not the correct way to perform and assess Monte Carlo analysis. That’s because you don’t know, statistically speaking, what range of performance has been computed. For instance, you shouldn’t just run 500 (or 100, or 1000, etc.) cases and use the end-points as the “worst case” result for compliance purposes. Monte Carlo analysis done this way can easily lead to incorrect post-analysis design decisions because the results may be either worse or better than intended.

For aerospace and automotive WCCA, you can perform Monte Carlo analysis using one of two methods: Tolerance Intervals for Normal Distributions, where the result is the mean +/- some number of sigma, or Distribution-Free Tolerance Intervals (nonparametric statistics), where the result is the worst-case (end-point) values. In both cases, the number of runs is tied to a specific certainty and population coverage.
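For the distribution-free method, the link between run count, coverage, and confidence can be sketched with the standard order-statistics formulas for the sample extremes (often attributed to Wilks). This is an illustration under that assumption; the function names are hypothetical and the conventions in the article's references may differ. With n runs and coverage p, the one-sided confidence is 1 - p^n, and the two-sided confidence for the interval [min, max] is 1 - n·p^(n-1) + (n-1)·p^n.

```python
def confidence_one_sided(n_runs, coverage):
    """Confidence that the sample extreme bounds `coverage` of the population."""
    return 1.0 - coverage ** n_runs


def confidence_two_sided(n_runs, coverage):
    """Confidence that [min, max] of the sample contains `coverage`."""
    n, p = n_runs, coverage
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n


def runs_required(coverage, confidence, two_sided=False):
    """Smallest run count whose end-points meet coverage/confidence."""
    achieved = confidence_two_sided if two_sided else confidence_one_sided
    n = 1
    while achieved(n, coverage) < confidence:
        n += 1
    return n


for coverage, confidence in [(0.9973, 0.99), (0.9973, 0.95), (0.99, 0.90)]:
    one = runs_required(coverage, confidence)
    two = runs_required(coverage, confidence, two_sided=True)
    print(f"{coverage:.2%}/{confidence:.0%}: {one} runs one-sided, {two} two-sided")

# What an arbitrary 500-run analysis delivers at 99.73% coverage (one-sided):
print(f"500 runs: {confidence_one_sided(500, 0.9973):.1%} confidence")
```

Under these formulas, 99%/90% one-sided requires 230 runs, while the end-point of an arbitrary 500-run analysis bounds 99.73% of the population with only about 74% confidence.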

There are specific rules for using the end-point values. The distribution profile of the results may or may not be important depending on the Monte Carlo methodology you choose. In the case of Figure 2 (stability margin histogram), the result is not normally distributed. Therefore, you should use the distribution-free methodology [Ref. 4].

For the Tolerance Intervals for Normal Distributions method, the mean and standard deviation of the function (stability, step load, ripple, etc.) under simulation are computed from the Monte Carlo results. The performance is then computed using a range (mean value +/- some number of sigma), where the number of sigma is based on a population (probability) coverage/confidence level (certainty), usually 99.73%/99%, 99.73%/95%, or 99%/90%. Aerospace readers may recognize the 99/90 as the same statistics used for small sample radiation assessments. For the Monte Carlo assessment to be valid, both quantities must be defined because we are dealing with a sampled data system. It is not sufficient to say that analysis must be done to “3 sigma.”
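The number of sigma depends on the run count as well as on the coverage and confidence. The sketch below estimates the two-sided factor k using Howe's approximation, with the chi-square quantile itself approximated by the Wilson-Hilferty formula so that only the Python standard library is needed. Both approximations are assumptions swapped in for illustration; exact tables or a statistics package would be preferable for compliance work, and the function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def chi2_quantile_approx(prob, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    term = 2.0 / (9.0 * dof)
    return dof * (1.0 - term + z * math.sqrt(term)) ** 3


def k_factor_two_sided(n_runs, coverage, confidence):
    """Howe's approximation to the two-sided normal tolerance factor."""
    dof = n_runs - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile_approx(1.0 - confidence, dof)
    return z * math.sqrt(dof * (1.0 + 1.0 / n_runs) / chi2)


def tolerance_interval(samples, coverage, confidence):
    """Return (low, high) = mean -/+ k * sample standard deviation."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    k = k_factor_two_sided(len(samples), coverage, confidence)
    return mean - k * sigma, mean + k * sigma


for n in (100, 1000, 10000):
    k = k_factor_two_sided(n, 0.9973, 0.99)
    print(f"{n:>5} runs, 99.73%/99%: k = {k:.3f}")
```

For any finite number of runs the factor comes out above 3 at 99.73% coverage, which is why “3 sigma” alone does not define the assessment.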


Charles Hymowitz is a technologist, marketer, and business executive with over 30 years of experience in the electrical engineering services and EDA software markets.
