

Three years ago, DDR (double-data-rate) memories were nothing more than forecasted placeholders on marketers’ future-product road maps (Reference 1). Now, promises have indeed turned into products, but the plethora of vendors and architectures is no less confusing than it was in late 1998. With dreams of dot.com success dancing in their heads, semiconductor suppliers have turned away from the evaporating PC- and workstation-cache market and turned toward networking. As a result, the diversity of alternatives may have, if anything, become even more complicated.
Competitors recognize that you value multivendor standardization and have banded together into two main groups: the QDR (quad-data-rate) SRAM Co-development Team and the SigmaRAM Consortium. Although the groups are exploiting similar techniques to boost memory performance, their implementations’ functional characteristics, read and write timings, packaging, and pinouts are regrettably incompatible (Table 1). Architecture diversity within each group necessitates that you undertake strict system analysis to determine which of the numerous product-matrix options is optimum for each of your system’s memory applications (Figure 1).
Figure 1 The SigmaRAM Consortium’s diverse product matrix gives you many options to consider (courtesy SigmaRAM Consortium).
Will each memory subsystem see a balanced ratio of reads and writes or a dominant proportion of one or the other (Table 2)? What’s the key performance bottleneck you’re trying to surmount: the speed at which you can get data into and out of the memory array or the rate at which you can transition addresses and control inputs? Are you willing to tolerate a higher pin-count memory and corresponding controller in exchange for potentially higher performance? How much density, bus width, and functional flexibility do you require within a single package and pinout, and are you willing to standardize on a larger package to secure this flexibility? Do you need to be able to concurrently read and write the same memory and, potentially, even the same location within the memory?
QUAD CAME FIRST
Late-write SRAMs, developed for multitasking servers, reduced conventional synchronous SRAMs’ two-clock “dead cycle” read-to-write latency to one clock. No-latency SRAMs, as their name implies, eliminate dead cycles. These architectures, known by such marketing monikers as Double Late Write, Dual Late Write, Late Late Write, NoBL (no bus latency), No-Wait, NtRAM (no-turnaround RAM), ZBT (zero bus turnaround), ZeroSB (zero synchronous burst), and Zero Turnaround, focus their attention on the internal memory array, register pipeline, and state machine. Because they provide only a single bidirectional data bus, they can’t avoid the performance and reliability limitations the bus’ slow-to-float outputs and fast-to-drive inputs create, as well as the silicon cost associated with the complex I/O buffer circuitry. You also can’t simultaneously read from and write to the same chip.
QDR SRAMs deal with the performance bottlenecks of bidirectional buffers by splitting out the input and output data buses on separate sets of pins, using a DDR protocol. By providing matching dual buses on your memory controller, you can eliminate bus contention there, too. The cost trade-off of dual unidirectional buses versus a single bidirectional bus is unclear; although the unidirectional buffers are less complex, there are twice as many of them and twice as much signal routing between the array and the outside world. Any added dual-bus complexity is proportionally less of an issue at higher densities, where the larger array dimensions define a higher percentage of the overall die size.
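To see how the read/write mix interacts with separate data buses, consider the following minimal sketch. It is an idealized model, not a vendor formula: it assumes a read bus and a write bus of equal width, equal-sized accesses, and no other bottleneck in the system. The function name and parameter are hypothetical.

```python
def separate_bus_utilization(read_fraction):
    """Return the fraction of combined read-plus-write bus capacity in use.

    Assumptions (illustrative only): the part has one read bus and one
    write bus of equal width, every access moves the same amount of data,
    and nothing else in the system limits throughput.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    # The busier bus sets the pace; the other bus idles part of the time.
    busiest = max(read_fraction, write_fraction)
    return 1.0 / (2.0 * busiest)


if __name__ == "__main__":
    for mix in (0.5, 0.75, 1.0):
        print(f"{mix:.0%} reads -> {separate_bus_utilization(mix):.0%} of peak")
```

Under these assumptions, a balanced mix keeps both buses fully busy, whereas a read-only or write-only workload leaves half of the data pins idle.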
Figure 2 Burst-of-two memories (a) are ideal for highly random accesses but run at lower peak clock rates than burst-of-four alternatives (b), and they also have more complicated address cycles (courtesy QDR Co-development Team).
Like their late-write and no-latency predecessors, QDR SRAMs retain a single address bus for both reads and writes. You can therefore consider them an interim step between conventional single-data-bus RAMs and fuller featured, much more expensive, multiport RAMs. QDR’s dual-data/single-address discrepancy leads to the burst-of-two and burst-of-four product options (Figure 2). You cannot prematurely terminate a burst-of-four cycle, which may be too constricting for some access patterns, but you need to feed the part with a new address only on each rising edge of the positive input clock. Burst-of-two memories, as their name implies, have shorter required read or write burst cycles, but you feed them addresses on both edges of the positive input clock, so performance limitations of your memory controller’s address generator may result in a system-level bottleneck. Burst-of-four memories also operate at higher clock speeds than their burst-of-two equivalents.
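The address-rate difference between the two burst options can be sketched as follows. The per-clock address counts come from the description above (one address per clock for burst-of-four, one on each clock edge for burst-of-two); the function name and the 166-MHz clock in the example are hypothetical values chosen only for illustration.

```python
# Addresses the controller must supply per clock period, per the text:
# burst-of-two parts take an address on both clock edges,
# burst-of-four parts take one only on each rising edge.
ADDRESSES_PER_CLOCK = {2: 2, 4: 1}


def address_rate_mhz(clock_mhz, burst_length):
    """Return the required address rate in millions of addresses per second."""
    if burst_length not in ADDRESSES_PER_CLOCK:
        raise ValueError("burst_length must be 2 or 4")
    return clock_mhz * ADDRESSES_PER_CLOCK[burst_length]


if __name__ == "__main__":
    clock = 166  # hypothetical clock frequency in MHz
    for burst in (2, 4):
        rate = address_rate_mhz(clock, burst)
        print(f"burst-of-{burst}: {rate} million addresses/s")
```

At any given clock rate, the controller’s address generator has to run twice as fast for a burst-of-two part as for a burst-of-four part.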
You supply a minimum of two clocks to a QDR memory: source clocks K and K#. Ideally, K and K# are exactly 180° out-of-phase, though only the relative position of their rising edges is important for proper device function. Optionally, you can also route separate data clocks C and C# to the memory to synchronize the outputs. If you don’t supply C and C#, the memory’s output buffers automatically use K and K#. QDR SRAMs’ HSTL (high-speed-transceiver-logic) outputs run at 1.5 or 1.8V, and you can configure their drive strength using a single resistor, connected to the ZQ input, whose value is five times the desired output impedance. The chips also require a 2.5V supply for the memory core.
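The ZQ relationship is simple arithmetic, shown here as a minimal sketch. The factor of five comes from the text; the function names and the 50Ω example target are hypothetical, and a real design should take its allowed resistor range from the vendor’s data sheet.

```python
ZQ_RATIO = 5.0  # ZQ resistor is five times the desired output impedance


def zq_resistor_ohms(target_output_impedance_ohms):
    """Return the ZQ resistor value for a desired output impedance."""
    if target_output_impedance_ohms <= 0:
        raise ValueError("impedance must be positive")
    return ZQ_RATIO * target_output_impedance_ohms


def output_impedance_ohms(zq_resistor_value_ohms):
    """Return the output impedance that a given ZQ resistor programs."""
    if zq_resistor_value_ohms <= 0:
        raise ValueError("resistance must be positive")
    return zq_resistor_value_ohms / ZQ_RATIO


if __name__ == "__main__":
    # Hypothetical example: match a 50-ohm trace.
    print(zq_resistor_ohms(50.0), "ohm resistor on ZQ")
```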
Initial QDR team members Cypress Semiconductor, Integrated Device Technology, and Micron Technology unveiled their partnership in July 1999 and published the architecture specifications the following February, coincident with Cypress’ making its 9-Mbit device available for sampling. NEC joined the consortium in January 2001. In April 2001, Samsung (depending on whom you’re talking to) either defected from or broadened its participation beyond the SigmaRAM consortium. Hitachi joined the team in August 2001. Analysts estimate that these six vendors supply roughly two-thirds of the worldwide unit shipments of synchronous SRAMs.
SIGMARAM RAISES THE BAR
What about some of the other fast-SRAM suppliers, such as GSI Technology, IBM, Integrated Silicon Solution, Mitsubishi, Motorola, Sony, and Toshiba? These suppliers and, at least initially, Samsung were put off by the exclusionary nature of QDR-patent licensing and specification development and believed that QDR had some significant technical limitations, so they formed the SigmaRAM Consortium in July 1999. Although IBM and Motorola have subsequently scaled back their participation in the SRAM business, Alliance Semiconductor joined SigmaRAM last July. (QDR member IDT has also scaled back its participation in the SRAM business.) JEDEC approved SigmaRAM’s specifications for standardization in late 2000, although by the end of 2001, no vendors had announced that they were shipping dual-data-bus SigmaRAMs.
Whereas QDR SRAM standardized on a 165-bump BGA package, SigmaRAM chose a 209-bump BGA approach. Both consortiums’ preferred packages have a 1-mm bump-to-bump pitch, translating to easier signal routing and minimal via requirements. SigmaRAM also specifies a smaller-footprint, 221-bump BGA with 0.8-mm pitch for board-space-constrained applications. SigmaRAM’s higher pin-count packages provide the memories with additional power and ground inputs and with enhanced architecture flexibility. Specifically, SigmaRAM offers optional positive and negative echo clocks, synchronized to transition no longer than 100 ps ahead of valid output data.
Acting as “data-valid” strobes, echo clocks enable reliable high-speed operation. Conversely, echo clocks complicate the memory-controller design in a multimemory array, because the controller has to treat each memory in a unique fashion, depending on when its echo clock arrives. You also need to be aware that an incoming echo clock transition coincident with a memory-controller clock edge can cause metastability problems.
SigmaRAM supports both SDR (single-data-rate) and DDR data interfaces and runs at a 1.8V core voltage. Like QDR, it employs the HSTL protocol and offers programmable output impedance. SigmaRAM members also tout the standard’s upgradability to 144-Mbit densities. However, this hotly debated claim depends on several variables, and it’s unclear when or if such large discrete SRAMs will be in great demand (see sidebar “Other options”). One factor in a package’s density capability is whether it offers enough pins for the required number of address and other density-related inputs. Another factor is whether the package cavity has enough room to fit the required die size.
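As a rough illustration of the pin-count factor, the sketch below estimates how many address inputs a given density and data width imply. It assumes 1 Mbit equals 2^20 bits and that every word has its own address; burst-oriented parts may need fewer address inputs than this estimate, and the function name is hypothetical.

```python
def address_inputs(density_mbit, data_width_bits):
    """Return the number of address inputs needed to reach every word.

    Assumptions (illustrative only): 1 Mbit = 2**20 bits, every word has
    its own address, and fixed-length bursts save no address bits.
    """
    total_bits = density_mbit * 2**20
    if data_width_bits <= 0 or total_bits % data_width_bits != 0:
        raise ValueError("density must divide evenly into words")
    words = total_bits // data_width_bits
    # (words - 1).bit_length() equals ceil(log2(words)) in exact integer math.
    return (words - 1).bit_length()


if __name__ == "__main__":
    for density in (18, 144):
        pins = address_inputs(density, 36)
        print(f"{density}-Mbit x36: {pins} address inputs")
```

Under these assumptions, moving a 36-bit-wide part from 18 to 144 Mbits adds three address inputs, which the package must have spare pins to accommodate.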
A 14×22×2.2-mm, 209-bump SigmaRAM BGA is larger than a 13×15×1.2-mm QDR BGA. But the bigger-is-better argument holds water only if the QDR and SigmaRAM chips employ the same manufacturing lithography and have similar features, which leads to comparable die sizes. If the QDR chip comes from a manufacturing process that is one or two generations more advanced than its SigmaRAM competitor, all bets based on package cavity dimensions are off. SigmaRAM and QDR tout their “clamshell” mirror-image pinouts that enable easy signal routing for parts mounted on both sides of a pc board, as well as their multichip connections for depth and width expansion.
In June 2001, the QDR Co-development Team announced the concept of second-generation QDR and unveiled specifications in October 2001, coincident with 18-Mbit sample shipments from Micron. The so-called QDRII sticks with the 165-bump BGA package but adds SigmaRAM-like echo clocks (but not an SDR option). QDRII proponents also included an on-chip DLL (delay-locked loop) that increases the data-valid time to 65% of the access cycle. Although QDRII parts migrate from a two- to a two-and-one-half-cycle pipeline, the second-generation parts’ higher performance potential counterbalances this added initial latency. QDRII also migrates to a 1.8V core operating voltage.
DO A DOUBLE TAKE
The QDR and SigmaRAM groups aren’t focusing their efforts solely on dual-independent-bus memories. The QDR Co-development Team will outfit its DDRII parts, also in 165-bump BGA packages, with features similar to the team’s QDRII counterparts, including echo clocks, DLLs, and two-and-one-half-stage pipelines. As an intermediary step between single-bus and dual-independent-bus extremes, QDR advocates also plan to offer separate-bus DDR memories. Such memories will be conceptually similar to QDR devices and run at higher clock speeds than QDR, but they will be unable to accept simultaneous read and write operations.
The SigmaRAM Consortium also specifies common-data-bus parts in both SDR and DDR variants, and it specifies the SDR parts in both late-write and no-latency versions. Company schedules indicate that SigmaRAM supporters have focused their near-term energies on common-I/O memories. Both Samsung and Toshiba have shipped samples of SigmaRAM common-I/O devices; Toshiba unveiled the 18-Mbit parts last November. SigmaRAM SDR common-I/O memories’ data bus is 18 to 72 bits wide, which partially explains SigmaRAM’s higher number of power and ground inputs. The 18- to 72-bit SDR bus widths deliver comparable peak bandwidth to DDR alternatives with 9- to 36-bit data buses.
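The SDR-versus-DDR equivalence follows from a simple peak-bandwidth product, sketched below. The function name and the 200-MHz clock are hypothetical, and the result is a theoretical peak that ignores bus turnaround and every other system-level effect.

```python
def peak_bandwidth_mbps(clock_mhz, bus_width_bits, transfers_per_clock):
    """Return theoretical peak data bandwidth in Mbits per second.

    transfers_per_clock is 1 for SDR and 2 for DDR.
    """
    if transfers_per_clock not in (1, 2):
        raise ValueError("transfers_per_clock must be 1 (SDR) or 2 (DDR)")
    return clock_mhz * bus_width_bits * transfers_per_clock


if __name__ == "__main__":
    clock = 200  # hypothetical clock frequency in MHz
    sdr = peak_bandwidth_mbps(clock, 72, 1)
    ddr = peak_bandwidth_mbps(clock, 36, 2)
    print(f"72-bit SDR: {sdr} Mbps, 36-bit DDR: {ddr} Mbps")
```

At equal clock rates, an SDR bus needs twice the data pins of a DDR bus to reach the same peak figure.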
SRAM suppliers face an interesting balancing act in attempting to support all possible QDR- and SigmaRAM-product options. One school of thought says to put as much interface flexibility as possible into each chip design and to configure the chips to a specific feature set via custom metal masks at the end of the manufacturing line or through fuse programming during test. This approach obtains maximum return out of each chip-design team’s effort investment, and it also simplifies the fabrication process by reducing the number of mask sets. On the other hand, the extra options might excessively complicate the design, delaying time to market and degrading performance. They might also disproportionally increase the die size, especially with ultrawide-data-bus versions, and increase the test time, consequently adding to the cost of all device options.
Figure 3 Theoretical bandwidth calculations are of some benefit when evaluating memory alternatives but fail to factor in performance bottlenecks and boosters elsewhere in the system (courtesy QDR Co-development Team).
The alternative approach, multiple custom designs, offers an inverse set of strengths and shortcomings. Vendors will undoubtedly make different decisions about which parts to make on common versus separate silicon foundations. GSI Technology, for example, believes that it can make all six common-I/O SigmaRAMs based on just two die, with two more die dedicated to separate I/O SRAMs. Note too that all members of a consortium need not manufacture all product versions that the consortium’s specifications represent.
How do you decide which vendor’s architectures and parts meet your needs? Theoretical graphs are of some value, although they don’t account for bottlenecks elsewhere in the system that might leave untapped some of an expensive memory’s performance potential. Conversely, theoretical calculations also fail to factor in caches and other schemes that might boost an inexpensive memory’s perceived performance (Figure 3).
Equally or even more beneficial, if you have the facilities to use them, are the functional, timing, and electrical models available from the chip suppliers and from independent companies, such as Denali. For example, Micron provides BSDL, Denali, IBIS, Verilog, and VHDL models on its Web site, along with placeholders for HSpice. The QDR SRAM Web site contains links to models from Cypress and NEC.
Other options
Rarely, if ever, does a single memory architecture ideally address all memory needs within a given application, let alone work optimally across multiple applications. Look, for example, at networking equipment; the look-up-table, link-list, and packet-buffer memories have different read-versus-write access profiles, random-versus-sequential address patterns, and burst-access lengths (Figure A).
After listening to a QDR or a SigmaRAM marketing pitch, you might think that between the single- and dual-data-bus parts, your needs are covered. Before jumping to conclusions, though, you should consider a few other options, especially if cost per bit is as critical as sustained bandwidth per pin. RDRAMs (Rambus DRAMs), conventional DDR SDRAMs, and Kentron’s multichip Quad Band Memory exploit wide internal data buses and advanced I/O protocols to boost peak transfer speeds. But with their cost-optimized memory arrays, these PC main-memory-focused chips do little to improve and may worsen random-access speeds compared with commodity SDR SDRAMs (Reference A). Multiplexed addresses and other interface overhead are factors in nonsequential access times that exceed 50 ns, even with memories made on today’s advanced lithographies.
Figure A Different memory uses require different memory characteristics (courtesy QDR Co-development Team).
However, take a look at the diverse set of fast-random-access DRAMs now emerging as memory manufacturers strive to expand beyond the unpredictable PC market. Higher random-access speeds might come from fine-grained array partitioning, such as FCRAM from Fujitsu and Toshiba, RLDRAM from Infineon and Micron, and 1T-SRAM from MoSys; on-chip caches, such as Mitsubishi’s 3D-RAM, NEC’s Virtual Channel Memory, and Ramtron’s EDRAM; or a combination of these techniques. Assuming that all other factors are equal, reads from a DRAM’s passive capacitor will always be slower than those from an SRAM’s active transistor. On the other hand, even with the die overhead you need for fast-access circuitry, a one-transistor, one-capacitor cell DRAM, especially at high densities, has a significantly smaller die than a six-transistor cell SRAM.
Some of these fast DRAMs retain multiplexed address buses that reduce pin count but increase random-access overhead. Other DRAMs employ an SRAM-like demultiplexed address bus. In this case, closely examine the specifications to determine whether your memory controller is responsible for refreshing the memory, and, therefore, for ensuring that access requests don’t collide with in-progress refresh. Some SRAM-replacement DRAMs offer autorefresh and avoid collisions, or at least claim to, through internal caching of locations that are refreshing.
Competition and cooperation in the semiconductor business sometimes produce strange bedfellows. For example, take the case of Ramtron’s DDR and NoBL ESRAMs, which the company’s Enhanced Memories subsidiary markets. Cypress Semiconductor is Ramtron’s NoBL ESRAM-development and marketing partner. Infineon, another Ramtron investor, acts as the ESRAM foundry. Yet, Infineon has partnered with Micron to co-develop an ESRAM competitor, RLDRAM (reduced-latency DRAM). Further complicating the situation, Cypress and Micron are partners on QDR SRAM co-development.
If off-chip DRAM is too slow, you can include an embedded DRAM array alongside your logic circuits. Lower-than-anticipated technology investments from foundries and equipment suppliers and the unprofitably low prices of discrete memories in the past few years have slowed this option’s long-predicted transition from concept to widespread reality. Yet embedded memories’ inherent advantages remain valid. Some of these advantages include ultrawide data-transfer buses, flexible array sizes and orientations, and low power and reduced EMI due to the lack of chip-to-chip interconnect (Reference B). According to chip suppliers, such as STMicroelectronics, hard-disk-drive and ink-jet-printer controllers use embedded DRAM because they can no longer cost-effectively buy the small memories that these and similar applications need. Embedded DRAM is fast; you need only look at MoSys’ 1T-SRAM arrays inside the Nintendo GameCube’s Flipper chip, which deliver less-than-10-ns random-access speeds (Reference C).
Sometimes, though, only the fastest memories will do. In such cases, you need to turn to embedded SRAM. Unlike embedded DRAM, embedded SRAM is fully compatible with standard logic processes. As ASICs become increasingly dense at less-than-0.18-micron lithographies, embedded memories will become increasingly attractive as a means of filling the die area that the I/O-bond-pad ring defines. And if you’ve been following the programmable-logic market over the past few years, you probably can’t help but notice the burgeoning amount of on-chip SRAM in today’s FPGAs that companies such as Altera and Xilinx offer; Cypress also offers embedded SRAM in its CPLDs. Embedded RAM, currently used to construct special-purpose content-addressable memory, FIFO, and multiport-RAM structures, will undoubtedly encompass an increasing percentage of the total system memory in the future.
REFERENCES
A. Dipert, Brian, “SRAMs strive to specialize,” EDN, Nov 5, 1998, pg 62.
B. Dipert, Brian, “The slammin’, jammin’ DRAM scramble,” EDN, Jan 20, 2000, pg 68.
C. Dipert, Brian, “Embedded memory: the all-purpose core,” EDN, March 13, 1998, pg 34.
REFERENCES
1. Dipert, Brian, “Cutting-edge consoles target the television,” EDN, Dec 20, 2001, pg 47.
ACKNOWLEDGMENTS
Special thanks to GSI Technology, Integrated Silicon Solution, and Ramtron for their in-depth information on high-speed SRAM applications and to Xilinx for its perspective on discrete- and embedded-SRAM trends.

You can contact Technical Editor Brian Dipert at (1) 916-454-5242, fax (1) 916-454-5101, e-mail bdipert@pacbell.net.