
Platform FPGAs enable network processing platforms

( 01 Nov 2004 )
by Amit Dhir, Strategic Solutions, Xilinx, Inc.

Revenue-generating routers require a technology platform that provides significant packet processing power (even under worst-case traffic conditions) while offering flexibility at lower cost. The rapid advancement of FPGA technology has made it possible to design entire routers and switch blades based on FPGAs. Today's Platform FPGAs provide a complete platform for packet processing, classification and policing, traffic management, and backplane communication.

Network processing
Network processors are highly optimized off-the-shelf devices for processing network data traffic, offering faster time-to-market and greater flexibility than the incumbent ASICs. They extract, classify and filter incoming bit-streams, determine destination ports, and forward data packets to the switch matrix with optional traffic management functions.



Figure 1: Line card functional block diagram.

To achieve the performance required for packet processing, several vendors approach the problem by breaking the functions (shown in Figure 1) into:

  • Classification co-processor - assigns a packet to a flow.

  • Policing engine - ensures that a flow does not use more bandwidth than allocated in its SLA (service level agreement). It is typically performed at the edge of a QoS (quality of service) network and the non-compliant packets are either dropped or marked for later action.

  • Traffic manager - enforces SLAs for that flow. Generally, packets from different flows with varied SLAs are reordered and often dropped, but packets within a flow are never reordered. Traffic management, which includes traffic shaping, queuing and scheduling, is the most bandwidth-intensive and critical function in the network processing flow. Traffic shaping helps manage congestion and deal with the bursty nature of network traffic. Queuing and scheduling engines determine the departure time and ordering of packets. They create hierarchical queues to aggregate flows into classes and classes into ports, and each level of the hierarchy can use a different queuing algorithm to prioritize the various flows. Typically, traffic managers are standalone chips that perform shaping, queuing and scheduling based on the set of governing policies determined by the classifier. They provide fine-grain QoS and maintain SLAs. An external processor may be required to set up or tear down flows, but it is not involved on a per-packet/cell basis. Every system differs in traffic management protocols, memory management, payloads, interfaces, etc. Typical policing algorithms include leaky bucket and token bucket. Congestion management algorithms include random early detect (RED) and weighted RED (WRED). Scheduling algorithms include priority queuing (PQ), fair queuing (FQ) and weighted FQ (WFQ).

Off-the-shelf NPUs (network processing units) rarely meet these performance requirements. A typical OC-48c packet over SONET/SDH design requires either separate traffic managers in the ingress and egress paths or a full-duplex (5Gbps) traffic manager. They also rarely support all the required algorithms. Investment in ASICs, meanwhile, is cost-prohibitive.
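
To make the token bucket policing named above more concrete, here is a minimal software sketch in Python. It is an illustration only, not the article's hardware design: the class name, the byte-based token accounting, and the rate and burst values are all assumptions chosen for the example.

```python
class TokenBucketPolicer:
    """Single-rate token bucket policer (illustrative sketch).

    Tokens are counted in bytes and refill at rate_bps / 8 bytes per second,
    up to a maximum of burst_bytes.
    """

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bps = rate_bps          # allocated rate (hypothetical SLA value)
        self.burst_bytes = burst_bytes    # bucket depth, i.e. the largest burst admitted
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last_time = 0.0              # time of the previous packet, in seconds

    def conforms(self, now, packet_bytes):
        """Return True if the packet is within the allocated bandwidth."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill for the elapsed time, never exceeding the bucket depth.
        refill = elapsed * self.rate_bps / 8.0
        self.tokens = min(float(self.burst_bytes), self.tokens + refill)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        # Non-compliant: the caller would drop the packet or mark it for later action.
        return False


# Hypothetical flow: 8000 bit/s (1000 bytes/s) with a 1500-byte burst allowance.
policer = TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)
results = [
    policer.conforms(0.0, 1500),  # uses the whole initial burst
    policer.conforms(0.5, 1000),  # only 500 bytes refilled so far
    policer.conforms(1.5, 1000),  # bucket has refilled enough
]
```

A failed check leaves the bucket untouched, so a non-compliant packet does not consume bandwidth that later compliant packets could use.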





Enabling traffic management and backplanes
Traffic management demands high performance, flexibility and support for multiple queuing and scheduling algorithms and protocols, memory types and interfaces. Platform FPGA devices offer the following features that provide key advantages for traffic management:

  • High-speed interfacing
    - Up to 24 embedded MGTs (multi-gigabit transceivers) enable high-speed serial links (up to 10.3125Gbps) with improved noise immunity, lower power, reduced signal count and reduced board complexity.
    - These devices also support 17 single-ended and 6 differential standards, required for schedulers using:
    · HSTL for high-speed interfacing to framers and memories
    · SSTL for interfacing to framers, memories and ASSPs
    · PECL for clock inputs/outputs
    · LVDS/CML for blade or backplane communication
    · PCI for interfacing to CPU chipsets
    · LVCMOS/LVTTL for almost everything else
    - A large number of package types and a high I/O pin count (up to 1200) provide the throughput required for interfacing.
    - Every pin on the FPGA provides digitally controlled impedance (DCI), which simplifies board layout by eliminating hundreds or thousands of off-chip terminating resistors. This allows for fewer layers and shorter traces on the PCB, leading to higher system reliability.

  • DCMs (digital clock managers) and clock distribution trees - The traffic manager interfaces to several external devices and must handle multiple clock domains at different frequencies. DCMs compensate for signal skew due to clock distribution delays and board layout constraints. A DCM and clock tree is typically used for each external high-speed interface. The 12 DCMs provide phase shifting and frequency synthesis, suited to systems with multiple clock domains and critical timing requirements. DCMs support clock outputs above 400MHz to enable leading-edge interfaces such as RapidIO and SPI-4. Being digital, the DCM is impervious to system temperature and voltage variations. The DCM offers a zero-delay clock buffer with precise 50/50 duty cycle generation. Phase control is accurate to within 1 percent of the clock period, which is critical for setup and hold time alignment. The DCM allows precise frequency generation from 24 to 420MHz.

  • BlockRAM - More than 10Mbit of embedded BlockRAM is ideal for storing frequently accessed objects, thus accelerating performance. The embedded memory serves many purposes, such as memory cache, storage for statistics and scratch pad, storage of bitmaps for transmit schedules and buffer management operations, clock domain crossing, and elastic buffering for intra-chip communication.

  • Multipliers - Traffic managers require intense arithmetic operations for packet scheduling computation. Scheduling involves multiply operations between integers and floating point numbers, for example Ts_i(t+) <= Ts_i(t) + L_pkt/r_i. Typical algorithms require 18-bit multiply operations at 100MHz performance. Platform FPGAs offer up to 556 18x18 multipliers per device, running at over 300MHz. The multipliers and logic allow the design of custom hardware accelerator cores for functions such as encryption, checksum calculation and DSP.

  • Large amount of high-performance programmable logic (up to 10 million gates) and routing resources - The scheduler performs a large number of complex operations at very high speed, with the operands maintained in registers. Since a scheduling decision must be made every cycle, deep pipelines are employed, which cause data hazards and hence serious inefficiencies. A large number of on-chip flip-flops are needed to satisfy these design objectives. The FPGA offers logic performance in excess of 300MHz, with ample internal routing resources for the numerous, wide communication paths and for storing linked lists.

  • PowerPC processor, CoreConnect and tools - Today's Platform FPGAs embed up to four 300MHz (420 D-MIPS) IBM PowerPC cores to assist in functions such as statistics monitoring, control and exception handling. Solutions include the IBM CoreConnect bus for access to peripherals and a smooth hardware and software design environment through the System Generator for PowerPC, the GNU compiler and software debugger tool chain, Wind River VxWorks, etc. Debug tools such as ChipScope Pro are also available.
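
The timestamp update quoted in the Multipliers item can be sketched in software to show why scheduling is arithmetic-heavy. The Python below is a simplified, hypothetical illustration: it assumes Ts_i is a per-flow timestamp, L_pkt is the packet length and r_i is the rate allocated to flow i (the article does not define these symbols), and it assumes all packets are already queued at time zero. A full weighted fair queuing implementation would also track a system virtual time, which is omitted here.

```python
import heapq


def schedule(packets, rates):
    """Order packets by per-flow timestamp (simplified fair-queuing sketch).

    packets: list of (flow_id, length) tuples in arrival order, all assumed
             to be backlogged at time zero.
    rates:   dict mapping flow_id to its allocated rate r_i.
    Returns the list of (flow_id, length) in transmission order.
    """
    timestamp = {flow: 0.0 for flow in rates}
    heap = []
    for seq, (flow, length) in enumerate(packets):
        # Ts_i <- Ts_i + L_pkt / r_i: one divide (or multiply by 1/r_i) per packet.
        timestamp[flow] += length / rates[flow]
        # seq breaks ties in arrival order, so a flow's packets are never reordered.
        heapq.heappush(heap, (timestamp[flow], seq, flow, length))

    order = []
    while heap:
        _, _, flow, length = heapq.heappop(heap)
        order.append((flow, length))
    return order


# Hypothetical example: flow A is allocated twice the rate of flow B.
rates = {"A": 2.0, "B": 1.0}
packets = [("A", 100)] * 4 + [("B", 100)] * 2
flow_order = [flow for flow, _ in schedule(packets, rates)]
```

In hardware, the division by r_i would typically be replaced by a multiplication with a precomputed 1/r_i, which is where the embedded multipliers come in; that mapping is an inference from the article's text rather than a documented design.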

Conclusion
Brute force alone cannot meet the design objectives of modern packet switching platforms. The new level of performance and features of Platform FPGAs provides a powerful platform for building revenue-generating routers and switches.

Author information
Amit Dhir is a Senior Manager in the strategic solutions marketing group at Xilinx. He has a BSEE from Purdue University and an MSEE from San Jose State University, and is working on his MBA at the University of California at Berkeley's Haas School of Business. He may be reached at Dhir@xilinx.com.

     