It’s risky to use software to try to fix large-scale hardware-design errors, as the catastrophe of the Boeing 737 MAX demonstrates.
Even a precision, high-performance analog circuit or system has imperfections, of course, and the issue for the design team is how to deal with them. There’s no single “one size fits all” answer, but there are some guidelines worth considering.
The late, legendary analog designer Jim Williams (author of so many outstanding EDN articles) often wrote and spoke about his approach to achieving solid precision analog designs. First, use the best available components with respect to initial tolerances and drift specifications; in some extreme cases, it may be worthwhile to “age” a core component such as a voltage reference to minimize long-term drift. Second, whenever possible, use circuit configurations such as bridges, so that component tracking causes many errors to cancel out. Third, trim out any remaining imperfections, where possible, using manual or electronic trimming potentiometers. Of course, use best practices for layout, grounding, and bypassing (admittedly, what’s “best” is often a function of the specific topology and application). Finally, but only as a last resort, use software to tweak performance or even implement a micro-calibration or linearization.
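As a minimal sketch of that last-resort software step, here is what a basic two-point (gain and offset) calibration of a sensor channel might look like. The function names, ADC counts, and reference voltages are purely illustrative assumptions, not taken from any particular design:

```python
def two_point_calibration(raw_lo, raw_hi, ref_lo, ref_hi):
    """Derive gain and offset from two known reference points."""
    gain = (ref_hi - ref_lo) / (raw_hi - raw_lo)
    offset = ref_lo - gain * raw_lo
    return gain, offset

def correct(raw, gain, offset):
    """Apply the software 'trim' to a raw reading."""
    return gain * raw + offset

# Hypothetical example: an ADC channel that reads 102 counts with a
# 0.100 V reference applied and 998 counts with 1.000 V applied.
gain, offset = two_point_calibration(102, 998, 0.100, 1.000)
print(round(correct(550, gain, offset), 4))  # 0.55 V
```

Note what this does and does not fix: a two-point correction removes gain and offset error, but it cannot remove nonlinearity; that would take a multi-point lookup table or polynomial fit, which is exactly where software “fixes” start to grow in complexity.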
That’s good engineering advice. Unfortunately, the tendency now is to assume software can fix most, if not all, system-level problems; add a little dose of AI (artificial intelligence), whatever that really means, and you’re all set. The disaster of the Boeing 737 MAX (two crashes, hundreds of lives lost) is a telling example of the misuse of software to fix an analog-type system-design problem.
How so? In short, as the Airbus A320neo began making inroads into the Boeing 737 market, Boeing needed a fuel-efficient single-aisle airliner (the basic 737 was the most widely used aircraft at the time) and needed it quickly. To use less fuel, the new MAX design used larger engines, which had to be mounted farther forward and higher than on the previous model to clear the wing structure.
The new placement, however, affected the physical relationship between the engine thrust line and the aircraft’s center of gravity, and this changed how the plane handled; under some thrust conditions, the nose could pitch up excessively and possibly put the plane into a stall. This is a case of an analog-design shortcoming, where the design configuration is at odds with the laws of physics.
To counteract this hardware-design shortcoming, Boeing engineers relied on software and added a package called the maneuvering characteristics augmentation system (MCAS), driven by a sensor that measured the plane’s angle of attack (roughly, the angle between the wing and the oncoming airflow). The system, which operated largely unknown to the pilot, was designed to push the plane’s nose down by moving the horizontal stabilizer on the tail in small increments of 0.6º.
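As a deliberately simplified toy sketch, the kind of single-sensor trigger logic described above might be modeled like this. This is emphatically not the actual MCAS flight software; the threshold value and function names are invented assumptions for illustration only:

```python
AOA_THRESHOLD_DEG = 12.0   # hypothetical, illustrative trigger threshold
TRIM_INCREMENT_DEG = 0.6   # stabilizer step size cited in reports

def trim_command(aoa_sensor_deg, current_trim_deg):
    """Return a new stabilizer trim: step nose-down while the single
    angle-of-attack sensor reads above the threshold.

    The structural weakness is visible right here: one faulty sensor
    reading is enough to keep commanding nose-down trim.
    """
    if aoa_sensor_deg > AOA_THRESHOLD_DEG:
        return current_trim_deg + TRIM_INCREMENT_DEG
    return current_trim_deg

# A stuck-high sensor drives the trim further nose-down on every cycle:
trim = 0.0
for _ in range(5):
    trim = trim_command(22.0, trim)  # sensor stuck reading 22 degrees
print(round(trim, 1))  # 3.0 degrees of accumulated nose-down trim
```

Even this toy version shows the hazard: with no cross-check against a second sensor or a sanity limit on accumulated trim, a stuck-high reading keeps pushing the nose down indefinitely.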
Sounds simple enough, right? Except it’s not: as with every real-world system, there are subtleties, exceptions, complex interactions, and more to accommodate. Due to a combination of circumstances, including pilot unfamiliarity with MCAS (and how to disable it), two aircraft crashed and all Boeing 737 MAX aircraft have been grounded since March 2019. The grounding has been very disruptive to airline capacity and scheduling, to passengers, and to Boeing’s production and engineering projects.
It would be easy to say this was solely a management failure, for directing the engineers to fix the initial problem via software and to minimize its visibility, as well as any impact on pilots and their training. But the engineers can’t say they were just following orders: they understood it was a tricky thing to add and would bring all sorts of secondary problems. Yes, it’s possible to believe that smart software can anticipate and overcome any issue, but we know that’s just a rationalization and not the case at all.
Much has been written about the Boeing 737 MAX mess and what is being done to fix it, again relying mostly on software, but with a little more hardware as well. Two clear and well-reported articles on what happened in the design phase and what is now being done are from The Wall Street Journal (sorry if they are behind a paywall), “The Four-Second Catastrophe: How Boeing Doomed the 737 MAX” and “The Multiple Problems, and Potential Fixes, With the Boeing 737 MAX”; Wikipedia also has some items worth reading. They are a sobering reminder that it is very risky to count on software to fix analog-design weaknesses. It should also be a warning to the many teams working on autonomous vehicles as they try to anticipate every possible driving scenario and implement algorithms to deal with them.
Have you ever used software to correct, fix, or compensate for larger-scale analog-circuit shortcomings? What was the risk? Did you ever have concerns or misgivings?