In the manufacturing department of a defence company, we build what is commonly referred to as a black box for our military. In the black box are circuit cards of various functions plugged into a motherboard.

One day, a software engineer was tasked with investigating why six circuit cards of the same type were failing in the box. The cards had been sitting in a bone pile for quite a while because no one could figure out what was the matter with them. Each circuit card was worth about $10,000, and management didn’t want to scrap them if at all possible. Apparently they were passing on the test fixture, but they just would not work correctly in the box.

The software engineer, who had been working on them for a couple of days, came to ask my advice on troubleshooting them, since I had been the technician who tested these cards before I became an engineer. I told him I’d help, but I knew that trying to troubleshoot these cards in the box would be very difficult. The card had about 30 discrete chips in the failing circuit, and probing it for the bad signal was not going to be easy. However, if we could fix one, then we’d have all six cards fixed, since they were failing the same way. He then told me that by writing to certain registers on the card, he could get it to fail at will. Excellent, I thought. He’s done quite a bit of troubleshooting already. This might be easier than I thought.

I told him we shouldn’t troubleshoot it in the box. Since he had discovered how to make it fail on demand, we should be able to look at it on the test fixture, which would make it easier to analyse. I put the card on the test fixture, set it up, and discovered quite quickly what the problem was: somehow the circuit was getting an inadvertent reset command. I got my scope out and started probing around. It took only a few minutes to find the cause.

The circuit employed a 54ALS157, a quad 2-to-1 multiplexer chip: in each of its four sections, a select input determines which of two data inputs is routed to the one output. I discovered that when the select input changed state, one of the sections generated a 10 ns glitch on its output line, and that this output line was connected to the reset circuitry.
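To make the reasoning concrete, here is a minimal Python sketch of the idea, not the actual test procedure. It assumes the usual behaviour of a '157-style part (a select input choosing between two data inputs, and an active-low strobe that forces the output low). The function names, the sampled trace, and the 15 ns threshold are all hypothetical, chosen only for illustration; the trace is not real scope data from the cards.

```python
from itertools import groupby


def mux_section(a, b, select, strobe_n=0):
    """Ideal steady-state output of one section of a '157-style 2-to-1 mux."""
    if strobe_n:  # active-low strobe held high forces the output low
        return 0
    return b if select else a


def find_short_pulses(samples, period_ns, max_width_ns):
    """Return (start_index, width_ns) for each interior run of identical
    samples that lasts no longer than max_width_ns."""
    runs = [(level, len(list(group))) for level, group in groupby(samples)]
    pulses = []
    index = 0
    for pos, (_level, length) in enumerate(runs):
        width = length * period_ns
        is_interior = 0 < pos < len(runs) - 1  # has an edge on both sides
        if is_interior and width <= max_width_ns:
            pulses.append((index, width))
        index += length
    return pulses


# Assumed case: both data inputs low, so an ideal part holds its output
# low no matter which way the select input is switched.
assert mux_section(0, 0, select=0) == mux_section(0, 0, select=1) == 0

# Hypothetical trace sampled every 2 ns: low, a brief high pulse, low again.
trace = [0] * 10 + [1] * 5 + [0] * 10
print(find_short_pulses(trace, period_ns=2, max_width_ns=15))
```

In the ideal model the output only ever settles to one of the selected input levels, so a pulse that appears and vanishes within a few nanoseconds of a select change is exactly the kind of behaviour that points at the part rather than the design.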

In other words, there was something wrong with the chip. It should not generate a glitch when the select input is toggled one way or the other. We checked all six cards with the scope and they all exhibited the same glitch. We visually inspected all six circuit cards and found that the chips all had the same date code. Apparently, the chips were from a bad lot from the manufacturer. Just one of those things that occasionally happens. Having the chips replaced should fix the cards. After the repair, the six circuit cards finally passed both on the test fixture and in the box. We saved $60,000 and management was happy. The software engineer and I both got a nice monetary reward.