
Can't find your digital-system problem? Maybe it caught a different bus

( 01 Nov 2002 )
Jim Fenton, Tektronix Inc

Today’s fast digital systems rely on techniques such as multiple processors, multiple buses, integrated functional subsystems, and serial transmission to support a continuing series of speed and throughput increases. In the “old days,” a single microprocessor took care of computation, memory transactions, I/O, and more. Today’s systems—whether laptops, desktop PCs, or servers—may rely on a small army of secondary processors and ASICs to handle these tasks.

Over the same time span, IC-processor technology has seen clock and data rates multiplied by orders of magnitude since the inception of PCs. The increased speed and architectural changes add up to a complex double-edged challenge for engineers who design digital systems: They must debug faster signals traversing more complex bus architectures. Although speed-related issues receive a lot of attention, the multiplicity of buses also creates a demanding troubleshooting environment.

Even today’s basic single-processor systems have more buses and communication interfaces than in the past. Today’s digital systems simultaneously perform many activities. For example, a dedicated ASIC-based subsystem may act as a bridge between the CPU and the printer interface. It independently coordinates printing activities, once the main CPU instructs it to do so. Meanwhile, the CPU goes on with high-speed number crunching, unhampered by requests from the printer interface.

Multiple processors and subsystems must communicate with one another, creating a need for more buses within digital systems. Some of these buses run serial protocols, some are bidirectional, and all are fast. InfiniBand and Serial ATA exemplify this emerging class of high-speed buses. Not to be overlooked, memory-address and data-bus architectures are seeing similar increases in performance. Rambus memory subsystems, for example, incorporate a proprietary bidirectional bus architecture that delivers bit rates as high as 800 Mbps.


EVERY BUS A POTENTIAL TEST POINT
Figure 1 depicts a typical situation in a modern high-speed digital system. Here, buses that transport high-speed signals in both directions at once join two servers. For the sake of clarity, most of this discussion applies only to Server A.


Figure 1
In this example of server architecture, the circles indicate test points that you can probe directly with a logic analyzer, whereas the triangles indicate points that require an intervening decoder or demultiplexer.


The bidirectional buses and the control elements connected to them provide an efficient path for system expansion (scalability). Although the servers are virtually identical in their configuration, Server B might be used as the gateway to a large memory system, and Server A concentrates on computation tasks.

The many buses in this system exchange data among hardware components—processors, ASICs, and chip sets—that work simultaneously to maximize throughput. The front-side bus, once the only true bus in many digital systems, is just one of many points whose activity you may need to monitor during troubleshooting.

In the figure, the circles and triangles indicate potential test (probing) points used to observe the effects of instructions rippling through Server A. The circles indicate buses that you can probe directly, whereas the triangles appear on buses whose signals must be decoded with special hardware and software adapters that, in turn, may require monitoring.

The solid lines that symbolize the buses in Figure 1 can contain 16 to more than 300 traces or pins that you must probe. Many of the other buses are serial in nature. Although these buses have fewer connections, they carry packets of data that must fan out to individual data streams before they can be interpreted.

An error detected at Server A’s communications port, for example, might have its origins on any bus or device feeding the path between the source and the destination. It is not enough to trace back from the output of the communications port and hope to find the error on the port’s input. Too many variables exist, including:

  • Transmitter or receiver problems—Faulty transmission components anywhere in the path may not source or sink enough current to deliver valid logic levels.


  • Timing problems—Logic errors can result from race conditions, setup-and-hold violations, and timing misalignments.


  • Binary inversions—Miswiring or incorrect firmware code may produce a binary 1 where there should be a 0 and vice versa.


  • Controller code or memory errors—Incorrect instructions in bridge or switching nodes may cause invalid outputs under certain circumstances, as may incorrect data from the memory. This failure may result from a memory-component failure or the dreaded “garbage-in/garbage-out” syndrome.


  • Signal-integrity problems—Compromises in the analog domain commonly result from signal-path faults, such as crosstalk and loading.

The lesson here is that an error observed on a bus may have nothing to do with that bus. A printing problem, for example, may be far removed physically, electrically, and logically from its cause. To put it more succinctly, that problem you can’t seem to find may have caught a different bus!

To solve problems of this sort, you need to use tools that can address the realities of today’s complex digital systems. These realities include gigabit data rates and many hundreds of test points.


TROUBLESHOOTING TOOLS
Troubleshooting these increasingly complex digital systems is the mission of today’s logic analyzers. Though often teamed with other instruments, such as digitizing oscilloscopes, signal sources, and pattern generators, the logic analyzer is at the heart of digital debugging. Much of the effectiveness of the troubleshooting process rests on the capabilities of this one instrument.

The trends in digital systems—the logic analyzer’s target—are driving the trends in logic-analyzer performance. The basic laundry list of logic-analyzer requirements is, of course, consistent with the escalating performance of the target technologies. Four banner specifications are important for an instrument used to debug multiple-bus digital systems: sample rate, channel count, memory depth, and triggering features.

The logic analyzer’s sample rate for asynchronous (timing waveform) acquisition is critical as target-system speeds climb toward the gigahertz range. Today’s logic analyzers can sample at 8 GHz (125-ps timing resolution) on thousands of channels at once. The ability to see all these channels simultaneously is key to solving problems in large digital systems.
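The 125-ps figure follows directly from the 8-GHz sample rate: in asynchronous acquisition, timing resolution is the reciprocal of the sample rate. A minimal sketch of that arithmetic in Python (the function name is illustrative and not part of any instrument's software):

```python
def timing_resolution_ps(sample_rate_hz: float) -> float:
    """Return the spacing between asynchronous samples, in picoseconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return 1e12 / sample_rate_hz


# The 8-GHz rate quoted in the text gives 125-ps sample spacing.
resolution = timing_resolution_ps(8e9)
```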

The instrument must probe hundreds of test points at once. It must have enough channels to acquire data from processor, memory, communication, and I/O buses simultaneously. The front-side bus alone may consume 300 or more logic-analyzer channels. Because the number and width of these buses are expanding, the logic analyzer’s channel count must be readily expandable. Like their targets, logic analyzers must be scalable.

Even narrow serial buses can eat up many channels. A 128-bit serial bus still requires 128 acquisition channels, even though its content may be delivered on only four or eight lines. Once depacketized, the content expands to its full width again.
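As a rough illustration of why a narrow serial link still needs full-width acquisition, the sketch below rebuilds 128-bit words from four lanes. The striping scheme (bit i of each word travels on lane i mod 4, most significant bit first) and both function names are assumptions made for this example; real serial buses define their own framing and encoding.

```python
def serialize(words, n_lanes=4, word_bits=128):
    """Stripe each word across the lanes, most significant bit first.

    Hypothetical striping: bit i of a word goes to lane i % n_lanes.
    """
    lanes = [[] for _ in range(n_lanes)]
    for word in words:
        for i in range(word_bits):
            bit = (word >> (word_bits - 1 - i)) & 1
            lanes[i % n_lanes].append(bit)
    return lanes


def deserialize(lanes, word_bits=128):
    """Rebuild full-width words from per-lane bit streams (inverse of serialize)."""
    n_lanes = len(lanes)
    if n_lanes == 0 or word_bits % n_lanes:
        raise ValueError("word width must be a multiple of the lane count")
    bits_per_lane = word_bits // n_lanes
    n_words = len(lanes[0]) // bits_per_lane
    words = []
    for w in range(n_words):
        value = 0
        for i in range(word_bits):
            # Bit i of word w sits at this position on lane i % n_lanes.
            position = w * bits_per_lane + i // n_lanes
            value = (value << 1) | lanes[i % n_lanes][position]
        words.append(value)
    return words


original = [0x0123456789ABCDEF0123456789ABCDEF, (1 << 128) - 1, 0]
lanes = serialize(original)       # 4 physical lines, 32 bits per word each
recovered = deserialize(lanes)    # back to 128-bit-wide content
```

Only four lines are probed, yet each recovered word occupies the full 128 bits that the analyzer must store and display.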

The logic analyzer must be able to partition itself to handle multiple discrete buses with diverse timing characteristics. You can expand some logic analyzers to 8000 channels and beyond.

The instrument must have plenty of memory depth if it is to capture complex system transactions in their entirety, a necessity for tracing errors back to their source. Capturing a complete boot sequence on a large system, for example, can require 64M samples or more. A deep memory provides enough capacity to record the many steps and subroutines that execute during such a sequence.
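To get a feel for what a given memory depth buys, the capture window is simply depth divided by sample rate. The sketch below assumes that "64M" means 64 × 2^20 samples per channel; the 200-MHz state clock is a hypothetical value chosen for illustration, and whether a particular instrument can use its full depth at its full timing rate is instrument-specific.

```python
def capture_window_s(depth_samples: int, sample_rate_hz: float) -> float:
    """Return how much time one acquisition spans, in seconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return depth_samples / sample_rate_hz


DEPTH = 64 * 2**20  # assumed meaning of "64M samples"

# Sampling at the 8-GHz timing rate quoted earlier: a window of a few milliseconds.
window_timing = capture_window_s(DEPTH, 8e9)

# Sampling once per cycle of a hypothetical 200-MHz state clock: a fraction of a second.
window_state = capture_window_s(DEPTH, 200e6)
```

The same depth covers a far longer stretch of system activity in state acquisition than in high-resolution timing acquisition, which is one reason depth and sample rate are weighed together.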


Figure 2
In the time-correlated logic analyzer, local PLLs retime the signal from a master-system-reference clock.

Triggering features are the key to fast, effective troubleshooting with a logic analyzer. The fact that the analyzer has acquired data proves that a specified condition has occurred. For example, if the trigger is set to detect a setup-and-hold timing violation, storage of the acquired data can begin only when that violation occurs. Thus, the logic analyzer confirms the suspected error and captures the surrounding data.
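The setup-and-hold check mentioned above can be modeled in software as a window test around each clock edge. This is only a sketch: the function name, the timestamp lists, and the setup and hold values are assumptions, and a real analyzer performs this comparison in hardware as the data arrives.

```python
from bisect import bisect_right


def setup_hold_violations(clock_edges_ps, data_edges_ps, setup_ps, hold_ps):
    """Return the clock edges that have a data transition inside their window.

    The forbidden window is open: a transition exactly setup_ps before or
    hold_ps after the clock edge is treated as legal.
    """
    data = sorted(data_edges_ps)
    violations = []
    for edge in clock_edges_ps:
        window_start = edge - setup_ps
        window_end = edge + hold_ps
        # Index of the first data transition strictly after window_start.
        i = bisect_right(data, window_start)
        if i < len(data) and data[i] < window_end:
            violations.append(edge)
    return violations


# Hypothetical timestamps in picoseconds.
clock = [1000, 2000, 3000]
data = [400, 1900, 3050]
bad_edges = setup_hold_violations(clock, data, setup_ps=150, hold_ps=100)
```

Here the transition at 1900 ps lands 100 ps before the 2000-ps clock edge (a setup violation), and the one at 3050 ps lands 50 ps after the 3000-ps edge (a hold violation).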

Logic-analyzer triggering subsystems vary widely in flexibility. The most advanced instruments include a fully programmable state machine dedicated to triggering functions. Special trigger programming aids and tools simplify the task of setting up complex triggers. Other important logic-analyzer features include bus and processor supports, probing options, and display features (see sidebar “Multiple monitors reveal the big picture”).
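A trigger state machine can be pictured as a sequence of conditions that must be met in order. The sketch below models only that simplest, strictly linear case; the sample fields (`addr`, `write`), the addresses, and the function name are hypothetical, and real trigger engines also offer branching, counters, and timers that are not shown here.

```python
def find_trigger(samples, stages):
    """Return the index of the sample that completes the stage sequence.

    stages is an ordered list of predicates. The machine advances one stage
    each time the current predicate matches and fires when the last one does.
    Returns None if the sequence never completes.
    """
    if not stages:
        raise ValueError("at least one trigger stage is required")
    state = 0
    for index, sample in enumerate(samples):
        if stages[state](sample):
            state += 1
            if state == len(stages):
                return index
    return None


# Hypothetical two-stage trigger: a write to 0x1000 followed later by a read of 0x2000.
stages = [
    lambda s: s["addr"] == 0x1000 and s["write"],
    lambda s: s["addr"] == 0x2000 and not s["write"],
]
samples = [
    {"addr": 0x2000, "write": False},  # ignored: stage 0 not yet satisfied
    {"addr": 0x1000, "write": True},   # stage 0 matches
    {"addr": 0x3000, "write": False},  # no match
    {"addr": 0x2000, "write": False},  # stage 1 matches, trigger fires
]
trigger_index = find_trigger(samples, stages)
```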

A digital system having hundreds of channels of information on many buses may require the services of multiple acquisition modules, sometimes housed in multiple mainframes. If the device under test has 500 test points, and the logic analyzer has 500 channels, is the system complete?

Not necessarily. You must not ignore the importance of time correlation among a logic analyzer’s many channels. To ensure a meaningful acquisition across multiple buses, every channel must use the same timing reference to sample its acquisitions. Equally important, this reference must have sufficient resolution to accurately place events in time relative to a known acquisition event, such as the logic analyzer’s Start command. These rules should apply whether the captured signals are taken from adjacent pins on a single bus or on two widely separated buses.

It is common practice to dedicate entire logic-analyzer modules or even a separate analyzer mainframe to capturing the events on a single bus. Imagine this example: 300 logic-analyzer channels are already attached to master Server A, and it is necessary to monitor a suspected fault condition on Server B’s memory bus. One approach is to devote an unused module in a second mainframe to capturing the memory-bus signals. Now, you can equate an instruction on the master front-side bus to its effect on the memory bus. Although the test points on the master front-side bus are physically distant and several operational cycles removed from the Server B memory bus, precise time correlation makes it possible to track a specific event (such as the instruction and its effects) through the system. Maintaining accurate time correlation, irrespective of the number of channels used, is essential for digital-system debugging work.

The ability to preserve critical time correlation across many channels, modules, and mainframes is at the heart of modern logic-analyzer architecture. Today’s advanced instruments rely on dedicated PLL hardware within each acquisition module in one or more mainframes to locally retime the system-reference-clock information (Figure 2; see sidebar “When status queries miss the bus home”). This architecture prevents time drift between the oscillators in the respective acquisition modules and ensures ±2-ns timing correlation among any number of expansion mainframes. Without local retiming, oscillators can drift by hundreds of samples relative to each other. In this logic-analyzer architecture, you can view data captured on logic-analyzer channel 600 in its proper timing relationship to the events on channel 60, even if the two signals are on disparate buses within the system under test.
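Once every module stamps its samples against the same reference, placing events from different buses on one timeline is a simple ordered merge. The sketch below assumes each module delivers events as (timestamp in picoseconds, channel, value) tuples already sorted by time; the tuple layout and function name are illustrative. The merge is meaningful only because the timestamps share a reference, which is precisely what the PLL architecture provides.

```python
import heapq


def correlate(*module_captures):
    """Merge per-module event lists into one time-ordered list.

    Each event is (timestamp_ps, channel, value), and each input list must
    already be sorted by timestamp against the common reference clock.
    """
    return list(heapq.merge(*module_captures, key=lambda event: event[0]))


# Hypothetical captures from two modules, using the channel numbers from the text.
module_a = [(0, 60, 1), (500, 60, 0)]
module_b = [(250, 600, 1), (750, 600, 0)]
timeline = correlate(module_a, module_b)
```

If the two modules' clocks had drifted apart, the interleaving in `timeline` would be wrong even though each module's own capture remained internally consistent.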

Tight timing correlation between the multiple acquisition modules significantly reduces the channel-to-channel timing skew. In a typical multiple-mainframe system, today’s logic analyzer has a channel-to-channel timing skew of ±250 ps between any two data channels in the system.

Digital-system troubleshooting is confronting designers with new challenges in complexity. Systems have more buses and more subsystems than ever before. The logic analyzer, long the cornerstone of digital-debugging work, is growing with its target devices and applications. Today’s most popular logic analyzers offer high data rates and, equally important, hundreds or even thousands of time-correlated acquisition channels. Increasingly, designers are learning that accurate time correlation helps them to work more efficiently as they trace signals across multiple buses and nodes.


WHEN STATUS QUERIES MISS THE BUS HOME
The easiest way to grasp the value of systemwide, time-correlated probing is to look at an application example adapted from real-world experience. The following discussion is based on the server architecture discussed in the main text and shown in simplified form (with logic analyzer probes added) in Figure A.


Figure A
The logic analyzer simultaneously probes multiple buses and provides time-correlated views of each.

A prominent computer maker uses a fully configured server as a prototyping system in which newly designed elements, firmware, and software are installed for validation and characterization. The system has a number of bus test points more or less permanently connected to a multiple-mainframe logic analyzer, with other test points available but not always connected. Because it employs the PLL architecture described in the main text, this logic analyzer can accommodate additional mainframes without any loss of time-correlation accuracy. In this example, the instrument simultaneously connects to all of the bus test points in Server A, for a total of 1042 logic-analyzer channels.

A routine status query to the PCI hub was drawing a blank, causing a time-out in the CPU. Somewhere on its way back to the CPU, the response from the PCI hub was dropped.

Probing the test point on the I/O link to the PCI hub revealed that the hub itself was responding correctly. The logic analyzer’s time-correlated display made it easy to track the CPU query as it rippled through the buses and to associate the PCI hub’s response (after a known number of cycles) with the query. The proper code was being returned through the bidirectional link, and was arriving at the I/O-hub input.

The next step was to display the channels connected to the I/O-hub output. Again relying on the logic analyzer’s correlated traces, the engineer knew in exactly which cycle the output data should appear. This insight was important because, as Figure A shows, the I/O hub has several interfaces. It is essential to know which data associates with which interface. In this instance, no valid data was present during the appropriate cycle. By lining up the traces of interest on the logic-analyzer display, it was easy to see the timing relationships among the signals from the two buses.
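The check the engineer performed by eye, looking for valid output a known number of cycles after each query, can be expressed as a short sketch. The six-cycle latency, the data layout, and the function name are all assumptions for illustration; the article does not give the actual figures.

```python
def missing_responses(query_cycles, output_valid_by_cycle, latency_cycles):
    """Return the query cycles whose response is absent at the expected cycle.

    output_valid_by_cycle maps a cycle number to True when valid data was
    seen on the hub output in that cycle; cycles not listed count as invalid.
    """
    missing = []
    for query in query_cycles:
        expected = query + latency_cycles
        if not output_valid_by_cycle.get(expected, False):
            missing.append(query)
    return missing


# Hypothetical capture: queries at cycles 10, 40, and 70, six-cycle latency.
output_valid = {16: True, 46: False}
dropped = missing_responses([10, 40, 70], output_valid, latency_cycles=6)
```

This kind of cycle-exact lookup is only possible when the input and output buses are captured against the same time reference.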


A LITTLE LOOP WILL DO IT
Having isolated the problem to the I/O hub, the designer wrote a short loop of status queries. Confining the problem to a specific element greatly reduced the search area. The loop revealed that some queries with a certain interrupt priority weren’t reaching the I/O hub.

Next, the engineer wrote a loop to apply commands with several different interrupt-priority levels. Running this loop characterized the parameters of the interrupt problem and revealed that higher priorities had no trouble getting through the I/O hub.

The designer concluded that there was a design flaw in the interrupt-allocation look-up table within the I/O hub. The firmware incorrectly rejected lower-level interrupts, rather than giving them a low but still usable priority. The designer corrected the problem with a relatively simple firmware revision. Fortunately, the I/O hub had its own self-contained FPGA, and the designer reprogrammed it with the appropriate interrupt-evaluation data.
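The priority sweep described above can be sketched as a small test loop. Everything in this example is a toy model: the four priority levels, the table contents, and the function names are hypothetical stand-ins, since the article does not describe the actual hub interface or its look-up table.

```python
def sweep_priorities(send_query, priorities, repeats=3):
    """Send repeated status queries at each priority level.

    send_query(priority) must return True when a response comes back.
    Returns a dict mapping each priority to the number of responses received.
    """
    results = {}
    for priority in priorities:
        results[priority] = sum(1 for _ in range(repeats) if send_query(priority))
    return results


# Toy stand-in for the flawed hub: low priorities are rejected outright.
FLAWED_TABLE = {0: "reject", 1: "reject", 2: "forward", 3: "forward"}

# Toy stand-in for the corrected hub: low priorities are still forwarded.
FIXED_TABLE = {0: "forward", 1: "forward", 2: "forward", 3: "forward"}


def make_hub(table):
    """Return a send_query function backed by the given look-up table."""
    return lambda priority: table[priority] == "forward"


before_fix = sweep_priorities(make_hub(FLAWED_TABLE), range(4))
after_fix = sweep_priorities(make_hub(FIXED_TABLE), range(4))
```

Running the same sweep before and after the fix gives a quick regression check: priorities that return zero responses point to the table entries at fault.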


MULTIPLE MONITORS REVEAL THE BIG PICTURE
When sorting out the data acquired from a complex digital system, human nature looks for an easy way to divide and conquer the mass of information. Many digital engineers like to organize their logic-analyzer display into discrete buswide views.

Most logic analyzers allow their users to group related signals, such as all the lines of a memory-address bus, into individual windows. This organization lets you view two or three windows if the display is large enough. Selecting a signal or time location in one window, usually by placing the cursor on it, marks the same location in all of the windows. If the instrument offers precise time correlation, these synchronized views can reveal, for example, a transient on the memory-data bus and its impact on instructions sent to the processor via the front-side bus.

The amount of display space is limited. Even a large monitor offers about 1600×1200 pixels of display area, less than 250 square inches. Realistically, this area is adequate for three or four windows at most, and these windows must be small to avoid overlap. Even so, users must frequently scroll up and down to compare bus traces.

Recent advancements have transformed this scenario. Some logic analyzers support as many as four external monitors, providing a total of 3200×2400 pixels of display space and allowing ample room for clear, legible waveform traces, text, and graticule lines.

You might set up the four monitors to display one monitor per bus, or you might use one monitor pair to display a timing diagram and the other pair to present the equivalent state display. To view the activity on a wide bus all at once, you can even treat four monitors side-by-side as one contiguous window.


Author Information
Jim Fenton is the logic-analyzer hardware-engineering manager at Tektronix Inc (Beaverton, OR), where he has worked for 17 years. He holds a BSEE from the University of Michigan (Ann Arbor, MI) and an MSEE from Oregon State University (Corvallis, OR). He is currently pursuing a master’s in engineering management from Portland State University (Portland, OR).

 
© 2012 EDN Asia All rights reserved.