
Thermal integrity: A must for low-power digital-IC design

( 01 Dec 2005 )
by Michael Santarini, Senior Editor



Over the last three years, IC-power management has moved from a third-order to a first-order concern for chip designers, especially those designing ASICs and SOCs (systems on chips) for portable-system applications. Accordingly, many power tools made their debut at this year's Design Automation Conference, which took place in Anaheim, CA, in June. Experts say that, to get a true grasp of transistor leakage, an ever-larger consumer of system power, you must first get a read on the thermal effects of your design and how they affect the timing and reliability of digital ICs. If you get an accurate account of heat on your chip, they assert, you can optimize your design for the right mix of power, performance, and reliability.

If you're designing at process geometries of 90 or even 130nm, you know that IC-power management is a big problem. Several EDA companies have developed tools to estimate active power, the power a system consumes during normal operation and computing. Some vendors have also developed tools that try to account for leakage power, the current that leaks from transistors whether or not they are switching, including when systems are in standby mode. Leakage was a problem at the 130nm node and has become a huge problem as designs have moved to 90 and 65nm. Experts say that designers cannot accurately account for leakage, and thus for IC-power consumption, without accurate thermal analysis.

"As temperature increases, leakage increases exponentially," says Andrew Yang, president and chief executive officer of Apache Design Solutions. "TSMC (Taiwan Semiconductor Manufacturing Co) projects that leakage consumes about 50% of the total power. We've asked our customers implementing designs on 90nm silicon, and they are seeing that leakage consumes 25 to 40% of power. In moving to 65nm, we expect 50 to 70% of total power will be lost through leakage." Much of the error in leakage estimates stems from inaccurate estimates of temperature, and most of that inaccuracy is due to outdated maximum-temperature limits and models passed to package and system designers.

Rajit Chandra, president and chief executive officer of Gradient Design Automation, says that temperature has always been a factor in IC design to some extent, but, for the most part, designers targeting thermal tolerance have worked from the rule that the temperature on the IC should not exceed 105°C. That rule has dominated for more than a decade. But, as designs target finer process geometries and designers place more functions on a chip, designing to 105°C across the chip is not the most effective route to performance goals, and vendors are working toward lower maximum temperatures.



Transmeta is one such company. It offers the LongRun2 low-power-design methodology to fabs. The company's founder and chief technology officer, Dave Ditzel, a noted processor designer, says that IC-design groups today often make power-versus-performance trade-offs. "Because leakage is such a big issue, people who used to spec parts at 105°C are targeting lower maximum temperatures," he says. "If you look at a typical desktop CPU, it will likely be rated for only 85°C. To control leakage, people would like to reduce that temperature even further." Library vendors and fabs all offer low-power processes, high-threshold-voltage transistors, and multithreshold CMOS, but Ditzel says users give up clock speed when moving to those structures.






Chandra says that many of today's SOCs are so large and perform so many functions that areas of the die develop microclimates and hot spots instead of maintaining a constant, predictable temperature across the die (Figure 1). "Reality starts diverging from traditional assumptions regarding temperature in ICs," says Chandra. "For example, if you assume your chip to be 25°C and it is actually 35°C in a spot on the chip where there are low-threshold-voltage transistors that are leaking, then the current will go up by 50%. With the next 10°C rise in temperature, the current will go up 126%." Taking it a step further, if you assume that the chip runs at 25°C and it is actually running at 45°C, then, according to Chandra, "You will be way off on power and even timing on a bunch of transistors." Add ambient heat from the climate to the mix, and the problem gets worse. "Unlike the old days, when you asked what the average power was, and temperature was just an afterthought, temperature is now driving the power," he says.
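Chandra's numbers are consistent with a simple exponential model in which leakage current grows by a fixed factor for every 10°C rise. The Python sketch below assumes that model and calibrates it to his figure of 50% per 10°C. Real leakage-versus-temperature behavior depends on the process and device type, and the function name `leakage_scale` is hypothetical.

```python
import math

# Assumption: leakage grows by a fixed factor per 10 degC, calibrated to
# Chandra's example of a 50% increase for a 25 -> 35 degC error.
FACTOR_PER_10C = 1.5
K_PER_DEGC = math.log(FACTOR_PER_10C) / 10.0  # exponential rate constant, 1/degC


def leakage_scale(t_assumed_c: float, t_actual_c: float) -> float:
    """Ratio of actual leakage to the leakage predicted at the assumed temperature."""
    return math.exp(K_PER_DEGC * (t_actual_c - t_assumed_c))


if __name__ == "__main__":
    for t in (25, 35, 45):
        pct = (leakage_scale(25, t) - 1.0) * 100
        print(f"assumed 25 C, actual {t} C: leakage +{pct:.0f}%")
```

Under this model the errors compound. A 20°C miss gives 1.5 × 1.5 = 2.25 times the expected leakage, about 125% more, which is close to the figure Chandra quotes.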

Yang and Chandra say that popular low-power techniques such as clock gating and power gating, which shut down parts of the design when they are not in use, can be the main culprits in creating localized hot spots or microclimates. "To reduce power, people use clock gating, essentially shutting down certain clock domains, reducing activity in certain parts of the design and cooling them," says Yang. "In fine-grained clock gating, we see a temperature variation on chip as opposed to having everything toggle in a statistical way. Power gating through the use of multithreshold CMOS to deliberately shut down parts also causes on-chip temperature variation." He says that, ideally, everything on an IC would heat up and cool down together, but that situation doesn't occur in fine-grained designs. "Having portions of a design heat up while other portions of the design cool down can cause race conditions in extreme cases or hold-time and matching violations, similar to what analog designers have dealt with for years," says Yang. He predicts that the thermal impact on low-power design will become more important as more chip designers adopt clock- and power-gating techniques.

On the other hand, Transmeta's Ditzel thinks that microclimates are insignificant in designs using a 90nm process. "You get variation as dies get bigger, but that evens out quickly over time," he says.

Thermal impact
When performing thermal analysis, designers must also consider leakage power, on-chip temperature, timing, reliability, electromigration, and IR drop. "Clock timing is sensitive to temperature variation," says Yang. "Every 15°C increase locally causes a delay or slew to increase roughly 10 to 15% locally. So, a temperature increase does slow things down." Electromigration also accelerates exponentially as temperature rises. "Typically, a chip has a maximum tolerance of 105°C, but, if the local area heats up 15°C, the chance of electromigration increases exponentially, reducing the lifetime of a device by four times," he says. He also notes that resistance scales linearly with temperature, so a 10 to 15°C increase raises the resistance of the local area by about 10%. As a result, IR drop also rises by about 10%, according to Yang.
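Yang's electromigration figure can be checked against Black's equation, the standard empirical model for electromigration lifetime: MTTF = A·J^(-n)·exp(Ea/(k·T)). The sketch below does not describe Yang's method. It assumes constant current density, so the A·J^(-n) prefactor cancels in a ratio. It then solves for the activation energy Ea that would make a 15°C rise above 105°C cut lifetime fourfold. It also shows the IR-drop point: at fixed current, IR drop scales directly with wire resistance. The function names and the current and resistance values are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def em_lifetime_ratio(ea_ev: float, t_cool_c: float, t_hot_c: float) -> float:
    """MTTF(cool) / MTTF(hot) from Black's equation at equal current density."""
    t_cool_k, t_hot_k = t_cool_c + 273.15, t_hot_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool_k - 1.0 / t_hot_k))


def implied_activation_energy(ratio: float, t_cool_c: float, t_hot_c: float) -> float:
    """Activation energy (eV) that produces the given lifetime ratio between two temperatures."""
    t_cool_k, t_hot_k = t_cool_c + 273.15, t_hot_c + 273.15
    return math.log(ratio) * BOLTZMANN_EV_PER_K / (1.0 / t_cool_k - 1.0 / t_hot_k)


def ir_drop(current_a: float, resistance_ohm: float, resistance_rise: float) -> float:
    """IR drop after local resistance rises by the given fraction (0.10 means 10%)."""
    return current_a * resistance_ohm * (1.0 + resistance_rise)


if __name__ == "__main__":
    ea = implied_activation_energy(4.0, 105.0, 120.0)
    print(f"Ea implied by a 4x lifetime loss from 105 to 120 C: {ea:.2f} eV")
    print(f"check: lifetime ratio = {em_lifetime_ratio(ea, 105.0, 120.0):.2f}")
    # Yang's ~10% resistance rise gives ~10% more IR drop at the same current.
    print(f"IR drop: {ir_drop(0.5, 0.1, 0.0):.3f} V -> {ir_drop(0.5, 0.1, 0.10):.3f} V")
```

The implied activation energy comes out near 1.2 eV. With a lower assumed Ea, the same 15°C rise shortens lifetime by less than four times, so the exact penalty depends on the metallization.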

Traditionally, vendors have relegated the job of cooling ICs to package and system designers. But experts argue that the package-design step occurs too late in the process and that package designers typically have insufficiently accurate thermal estimates to build the optimum package for a part. "Package designers have traditionally been given just one temperature to work with," says Chandra. "The package designers think they can completely cool off the entire chip, but that assumption is incorrect. The energy-dissipating devices, like the leaking devices and critical paths, continue to generate heat. Package design will reduce some, but not all, of the temperature."



Designing a package with too low a thermal tolerance can be costly because it can lead to overheating and chip failure. At the other extreme, excessive guardbands, extravagant packages, and extravagant cooling schemes, such as on-package fans, add expense to projects. In the 90nm era, package designers need more accurate power and thermal data from IC designers, and that power analysis must account for variation across the wafer. "As you go to higher performance devices, you find that not all parts leak the same," says Ditzel. He points out that the die on a wafer have varying degrees of performance and power consumption.

The power grade, and thus the heat, can vary by as much as a factor of four across a wafer. "Faster parts are those with lower threshold voltages. They also are the parts with the highest leakage," Ditzel says. "In that case, your power estimate, expressed as CV²f (total switched capacitance times the power-supply voltage squared times the switching frequency), is the same, but you have a huge variation in leakage," he says. "It has to do with statistical variation in processing: The fab can control the threshold voltage only so well, so it has some variations. That variation can greatly affect your design." This variation may be acceptable for microprocessor vendors that sell many products in various speed and power grades. However, SOC and ASIC vendors that must meet fixed performance and power specs cannot tolerate it. For ASIC designers, the device either meets performance requirements or ends up in the trash. The variation across the wafer also creates a problem for package designers if the maximum-temperature specification they receive is lower than the hottest parts will actually reach.
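Ditzel's point is that dies with identical switching activity can still differ sharply in total power. The sketch below adds a static term, Vdd times leakage current, to the dynamic CV²f term. All capacitance, voltage, frequency, and leakage values are hypothetical. They were chosen only so that leakage varies fourfold across the dies while CV²f stays the same.

```python
from dataclasses import dataclass


@dataclass
class Die:
    name: str
    leakage_a: float  # hypothetical static leakage current at operating temperature, A


def dynamic_power(c_farads: float, vdd: float, freq_hz: float) -> float:
    """Dynamic switching power, CV^2f."""
    return c_farads * vdd ** 2 * freq_hz


def total_power(die: Die, c_farads: float, vdd: float, freq_hz: float) -> float:
    """CV^2f plus static leakage power (Vdd x leakage current)."""
    return dynamic_power(c_farads, vdd, freq_hz) + vdd * die.leakage_a


if __name__ == "__main__":
    C, VDD, F = 1e-9, 1.0, 500e6  # hypothetical: 1 nF switched, 1.0 V supply, 500 MHz
    wafer = [Die("slow, low-leakage", 0.1), Die("typical", 0.2), Die("fast, leaky", 0.4)]
    for die in wafer:
        p_dyn = dynamic_power(C, VDD, F)
        p_tot = total_power(die, C, VDD, F)
        print(f"{die.name:18s} CV^2f = {p_dyn:.2f} W, "
              f"leakage = {p_tot - p_dyn:.2f} W, total = {p_tot:.2f} W")
```

With these made-up numbers, every die has the same 0.5 W of CV²f, but total power spreads from 0.6 W to 0.9 W. A CV²f-only estimate cannot see that spread.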

Chandra proposes eliminating the wall between the chip and package designers. "Because of the temperature issues and the regional effect of heat, leakage has in many ways caused people to think about packaging, heat sinks, and thermal gradients within the chip," he says. "They all need to come together to better serve chip cost-effectiveness and chip quality." He notes that companies such as Ansoft and Flomerics offer tools that analyze electromechanical issues and junction and ambient temperature. "Those vendors are traditionally not concerned about how power changes with temperature and how it distributes itself within a chip," Chandra says, noting that vendors in that area may soon work more closely with EDA companies.


Thermal-integrity analysis
Yang and Chandra predict that thermal analysis will become a hot area for software tools and that designers must account for thermal effects if they want to home in on the right mix of leakage, performance, reliability, and package. "First, there was signal integrity; then, there was power integrity; now, the next analysis we will need to do is thermal integrity," says Yang. His company, Apache Design, entered power-integrity analysis three years ahead of most EDA vendors. It currently offers no thermal-analysis tools but is considering adding them. Yang envisions a comprehensive thermal-integrity flow in which users first perform active- and leakage-power estimation and then perform thermal simulation. "Thermal simulation will take in the package and process parameters (for example, the thermal resistance of the package, the heat sink, and the substrate information) and simulate the thermal-resistance and -capacitance networks to derive a steady-state temperature," he says. "The key is the feedback between power and thermal simulation. If you are not careful, you can get a thermal-runaway condition."
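The power-thermal feedback loop Yang describes can be shown with a deliberately simplified single-node model. Die temperature equals ambient plus package thermal resistance times total power, and leakage power grows exponentially with temperature. The sketch below iterates that loop to a fixed point. It reports thermal runaway when the temperature climbs past a limit. Real tools solve distributed thermal-resistance and -capacitance networks, so the lumped model, all parameter values, and the function name here are assumptions for illustration.

```python
import math


def electrothermal_steady_state(
    t_ambient_c: float,
    r_th_c_per_w: float,       # package + heat-sink thermal resistance, degC/W
    p_dynamic_w: float,        # active power, assumed temperature-independent
    p_leak_ref_w: float,       # leakage power at the reference temperature
    t_ref_c: float = 25.0,
    leak_rate_per_c: float = math.log(1.5) / 10.0,  # assumed: +50% leakage per 10 degC
    t_limit_c: float = 150.0,  # anything hotter is treated as runaway
    tol_c: float = 1e-6,
    max_iter: int = 1000,
):
    """Iterate T = T_amb + R_th * (P_dyn + P_leak(T)).

    Returns (temperature_c, total_power_w) at the fixed point, or None on runaway.
    """
    t = t_ambient_c
    for _ in range(max_iter):
        p_leak = p_leak_ref_w * math.exp(leak_rate_per_c * (t - t_ref_c))
        p_total = p_dynamic_w + p_leak
        t_next = t_ambient_c + r_th_c_per_w * p_total
        if t_next > t_limit_c:
            return None  # hotter chip -> more leakage -> hotter chip: runaway
        if abs(t_next - t) < tol_c:
            return t_next, p_total
        t = t_next
    return None  # never settled; treat as runaway


if __name__ == "__main__":
    print(electrothermal_steady_state(45.0, 0.5, 30.0, 2.0))  # good heat sink: settles near 65 C
    print(electrothermal_steady_state(45.0, 1.5, 30.0, 2.0))  # weaker package: None (runaway)
```

Only the thermal resistance differs between the two calls. With the weaker package, no temperature exists at which heat removal balances the leakage-inflated power, so the loop diverges. That is the runaway condition Yang warns about.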

After thermal simulation, users will develop an on-chip thermal profile to analyze the power and thermal impact on chip timing, reliability, IR drop, dynamic-voltage drop, and ground bounce. "We can't analyze those conditions using a worst-case, steady-state temperature throughout the whole design," says Yang. "They cannot consider the variation effects." He says that, after analysis, users will pass the information to package designers and apply it to IC layouts. "Users will move blocks around to avoid localized temperature buildup or, in some cases, widen wires to reduce current density and self-heating," says Yang.
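A distributed thermal profile, rather than one worst-case temperature, can be sketched by dividing the die into tiles and solving a steady-state heat balance for each tile. The example assumes a simple resistive grid. Each tile conducts heat laterally to its edge neighbors and vertically to the package, and Gauss-Seidel relaxation finds the temperatures. The conductance values, the ambient temperature, and the power map are hypothetical, and production tools model the die, package, and heat sink in far more detail.

```python
def thermal_profile(power_w, t_ambient_c=45.0, g_vertical=0.5, g_lateral=1.0,
                    tol=1e-6, max_iter=10_000):
    """Steady-state tile temperatures for a 2-D power map via Gauss-Seidel relaxation.

    power_w:    list of rows; watts dissipated in each tile.
    g_vertical: W/degC from each tile to the package/ambient (assumed value).
    g_lateral:  W/degC between edge-adjacent tiles (assumed value).
    """
    rows, cols = len(power_w), len(power_w[0])
    temp = [[t_ambient_c] * cols for _ in range(rows)]
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(rows):
            for j in range(cols):
                neighbors = [(i + di, j + dj)
                             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= i + di < rows and 0 <= j + dj < cols]
                # Heat balance: P + g_v*(T_amb - T) + g_l*sum(T_n - T) = 0, solved for T
                numerator = (power_w[i][j] + g_vertical * t_ambient_c
                             + g_lateral * sum(temp[a][b] for a, b in neighbors))
                new_t = numerator / (g_vertical + g_lateral * len(neighbors))
                max_change = max(max_change, abs(new_t - temp[i][j]))
                temp[i][j] = new_t  # in-place update (Gauss-Seidel)
        if max_change < tol:
            break
    return temp


def hot_spot(temp):
    """Return (row, col, temperature) of the hottest tile."""
    return max(((i, j, t) for i, row in enumerate(temp) for j, t in enumerate(row)),
               key=lambda cell: cell[2])


if __name__ == "__main__":
    power = [[0.5] * 4 for _ in range(4)]
    power[1][2] = 4.0  # hypothetical heavily active block
    temps = thermal_profile(power)
    row, col, t_max = hot_spot(temps)
    t_min = min(min(r) for r in temps)
    print(f"hot spot at tile ({row}, {col}): {t_max:.1f} C; coolest tile: {t_min:.1f} C")
```

The gap between the hottest and coolest tiles is exactly the information that a single worst-case, steady-state temperature hides. Per-tile temperatures like these could feed the timing, IR-drop, and reliability checks Yang lists.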

Gradient, which introduced its FireBolt thermal-analysis tool at DAC, has a trademark on the term "thermal integrity." Gradient's standalone tool performs on-chip thermal analysis and locates hot spots. The company also has patented on-chip heat-sink structures that designers can add to the metal layers when conventional layout fixes don't work (Table 1). According to the company, the thin substrate of silicon-on-insulator and strained silicon is highly susceptible to on-chip thermal variation, and low-k dielectrics have poor thermal conductivity that traps heat in wires (Figure 2).



Besides Gradient, Magma is the only tool vendor now offering customers digital-IC thermal-analysis tools, which it has integrated into its Blast Rail and Blast Rail 5.0 products. The tools use a scalable polynomial-leakage model to get an accurate reading of on-chip temperature variation. Users first identify hot spots at the full-chip level and then perform more thorough analysis on those regions. "In the early stages of the flow, you don't want to characterize each cell, but, as the design progresses, you want to characterize critical cells, such as those in the critical path," says Amir Ajami, consulting staff member at Magma. "Those paths get the most use, and that means they generate the most heat. They also have the biggest impact on the general distribution of the overall clock signals."
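The article does not describe the details of Magma's scalable polynomial-leakage model. As a generic illustration of the idea only, the sketch below fits a low-order polynomial in temperature to a few characterization points by least squares. After the fit, leakage can be evaluated cheaply at any temperature during full-chip analysis. The characterization data, the choice of a quadratic, and the function names are all hypothetical.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit; returns coefficients c0..c_degree, lowest order first.

    Needs at least degree + 1 distinct x values.
    """
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[k][m] = xs[k] ** m
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs


def eval_polynomial(coeffs, x):
    """Horner evaluation of c0 + c1*x + c2*x**2 + ..."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


if __name__ == "__main__":
    # Hypothetical per-cell characterization: temperature (degC) vs leakage (uA)
    temps = [25.0, 50.0, 75.0, 100.0, 125.0]
    leak_ua = [1.0, 2.1, 4.3, 8.2, 14.9]
    scaled = [(t - 25.0) / 100.0 for t in temps]  # normalize for better conditioning
    coeffs = fit_polynomial(scaled, leak_ua, degree=2)
    t_query = 90.0
    est = eval_polynomial(coeffs, (t_query - 25.0) / 100.0)
    print(f"estimated leakage at {t_query:.0f} C: {est:.2f} uA")
```

Normalizing temperature before fitting keeps the normal equations well conditioned. Following Ajami's comment, a flow might use one coarse fit early and refit only the critical cells as the design firms up.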

Fixing thermals
Designers can employ a number of techniques to correct thermal-related problems. "One trick is simply to move design blocks around and equalize temperature across the chip during floorplanning," says Chandra. Designers can also make wires more robust by widening traces, which lowers current density and helps prevent electromigration. They can reduce leakage by using high-threshold-voltage transistors. However, these measures slow performance and increase area. To fix problems with nonuniform power consumption and to get a more constant temperature, users can adjust buffer sizing and placement.
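The wire-widening fix can be quantified through current density, J = I/(w·t). At a fixed current and thickness, doubling the width halves J. Under Black's electromigration model, lifetime scales as J^(-n). The exponent n = 2 used here is a commonly assumed value, not a figure from the article, and the wire dimensions and current are hypothetical. The sketch ignores the extra benefit of reduced self-heating in a wider wire.

```python
def current_density(current_a: float, width_m: float, thickness_m: float) -> float:
    """Current density J = I / (w * t), in A/m^2."""
    return current_a / (width_m * thickness_m)


def widening_lifetime_gain(width_old_m: float, width_new_m: float, n: float = 2.0) -> float:
    """EM lifetime ratio new/old at fixed current and thickness, using MTTF ~ J**(-n)."""
    # J is inversely proportional to width, so MTTF_new / MTTF_old = (w_new / w_old) ** n
    return (width_new_m / width_old_m) ** n


if __name__ == "__main__":
    i, t = 1e-3, 0.3e-6  # hypothetical: 1 mA through a 0.3-um-thick wire
    for w in (0.2e-6, 0.4e-6):
        j_ma_per_cm2 = current_density(i, w, t) / 1e10  # 1 MA/cm^2 = 1e10 A/m^2
        print(f"width {w * 1e6:.1f} um: J = {j_ma_per_cm2:.2f} MA/cm^2")
    print(f"lifetime gain from doubling width: {widening_lifetime_gain(0.2e-6, 0.4e-6):.1f}x")
```

The price is the one the article names: the wider wire takes more area and adds capacitance.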



Academic researchers are testing the use of dummy vias in the higher metal layers to reduce nonuniform temperatures on interconnect without affecting resistance and capacitance. "If you insert dummy vias at certain locations along the global line, you reduce the distance between the interconnect and the substrate, reducing the length and cooling the line faster because the substrate will work like a heat sink," says Ajami. "It reduces the impact of the gradient on the interconnect."
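Ajami's explanation rests on one-dimensional heat conduction. The thermal resistance of a path is R_th = L/(k·A), so shortening the path or raising its conductivity both help a hot wire shed heat to the substrate. The sketch compares heat paths of identical geometry through different materials. The copper (about 400 W/m·K) and silicon dioxide (about 1.4 W/m·K) values are approximate bulk figures. The low-k value is an assumed representative number, since real films vary. The path length and cross section are hypothetical.

```python
# Approximate bulk thermal conductivities, W/(m*K). K_LOW_K is an assumed
# representative value; real low-k films vary widely.
K_COPPER = 400.0
K_SIO2 = 1.4
K_LOW_K = 0.3


def thermal_resistance(length_m: float, k_w_per_m_k: float, area_m2: float) -> float:
    """One-dimensional conduction resistance R_th = L / (k * A), in K/W."""
    return length_m / (k_w_per_m_k * area_m2)


if __name__ == "__main__":
    path_len = 2e-6          # hypothetical: 2 um from a global wire down toward the substrate
    area = 0.2e-6 * 0.2e-6   # hypothetical 0.2 um x 0.2 um heat-flow cross section
    for name, k in (("low-k dielectric", K_LOW_K), ("SiO2", K_SIO2), ("copper via stack", K_COPPER)):
        print(f"{name:17s}: R_th = {thermal_resistance(path_len, k, area):.3e} K/W")
```

In this idealized comparison, a metal via stack offers a path more than a thousand times less resistive than the same column of low-k dielectric. That is consistent with Gradient's point that low-k materials trap heat in wires.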

Transmeta also claims that its LongRun2 helps ease the power-management problem. "It makes it easy to control the threshold voltages of devices," says Ditzel. The company offers a tool kit that helps designers implement dynamic threshold-voltage control to get the right level of performance when systems are running and in standby mode.
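The article does not describe how LongRun2 works internally. The textbook physics behind threshold-voltage control is that subthreshold leakage falls roughly tenfold for every increase in threshold voltage equal to the subthreshold swing, S = n·(kT/q)·ln 10. The sketch assumes that model with an ideality factor n = 1.5 and a hypothetical 100 mV threshold shift. It is not Transmeta's method, and it ignores the temperature dependence of the threshold voltage itself.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def subthreshold_swing_v(temp_c: float, ideality: float = 1.5) -> float:
    """Subthreshold swing S = n * (kT/q) * ln(10), in volts per decade of current."""
    thermal_voltage = BOLTZMANN_J_PER_K * (temp_c + 273.15) / ELECTRON_CHARGE_C
    return ideality * thermal_voltage * math.log(10)


def leakage_reduction(delta_vt_v: float, temp_c: float, ideality: float = 1.5) -> float:
    """Factor by which subthreshold leakage drops when Vt is raised by delta_vt_v."""
    return 10 ** (delta_vt_v / subthreshold_swing_v(temp_c, ideality))


if __name__ == "__main__":
    for temp in (25.0, 85.0):
        s_mv = subthreshold_swing_v(temp) * 1e3
        print(f"{temp:.0f} C: S = {s_mv:.0f} mV/decade; raising Vt by 100 mV "
              f"cuts leakage {leakage_reduction(0.100, temp):.1f}x")
```

Because S grows with temperature, a given threshold shift buys less leakage reduction on a hot die than on a cool one. This is another reason the article treats temperature as a first-order input to power management.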

Author information
You can reach Senior Editor Michael Santarini
at 1-408-345-4424 and michael.santarini@reedbusiness.com

 