Hardware design considerations for space-grade DDR4

Article by: Rajan Bedi

DDR4 will allow the satellite industry to offer higher-throughput on-board processing and increased acquisition times.

Previously I introduced DDR4 for space applications (see “Fast DDR4 SDRAM to enable the new space age”) offering 4 GB of volatile storage at a clock frequency up to 1.2 GHz and a data rate of 2.4 GT/s (bandwidth of 172.8 Gb/s). Compared to previous generations of SDRAM, DDR4 contains new architectural and hardware features that improve capacity, performance, scalability, system-level reliability, and power efficiency. In this post, I introduce these devices and discuss timing and signal-integrity considerations and the connectivity of this memory with FPGAs to ensure your avionics design is right first time.

An SDRAM architecture comprises memory cells organised in a two-dimensional array of rows and columns as illustrated in Figure 1. To select a particular bit, it is first necessary to address the required row and then the specific column. Once the desired row is open, it is possible to access multiple columns, and hence improve speed and reduce latency through successive read/write bursts.

To increase word size, the memory has multiple arrays, which means that when a read/write access is requested, the memory only requires one address to access one bit from each array.

To increase overall memory capacity, banks are added to the internal structure of SDRAM as shown below. Each bank can be addressed individually, and bank interleaving further increases performance.

Figure 1 SDRAM bit cells and the organisation of a DDR chip.

The core speed of SDRAM is slower than its I/O rate, and multiple words of data are accessed during every column command, which are then serialised to/from the interface. DDR4 is based on an 8n-prefetch architecture, which transfers two n-bit wide data words per clock cycle at the I/O. A read or write operation comprises a single 8n-bit-wide, four-cycle burst transfer at the internal DRAM core and eight corresponding n-bit, one-half clock cycle transfers at the I/O pins.
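The relationship between the I/O rate and the slower core can be sketched with a few lines of arithmetic. This is only an illustration of the 8n-prefetch description above; the function name and the returned keys are my own choices, not part of any vendor tool.

```python
def prefetch_rates(data_rate_gtps: float, prefetch: int = 8) -> dict:
    """Relate the I/O data rate to the clock and the internal core access rate."""
    # Double data rate: two n-bit transfers per clock cycle at the I/O.
    clock_ghz = data_rate_gtps / 2
    # One 8n-bit-wide core access feeds eight n-bit transfers at the pins.
    core_access_ghz = data_rate_gtps / prefetch
    # Eight half-clock transfers occupy four full clock cycles.
    burst_clock_cycles = prefetch / 2
    return {
        "clock_ghz": clock_ghz,
        "core_access_ghz": core_access_ghz,
        "burst_clock_cycles": burst_clock_cycles,
    }


rates = prefetch_rates(2.4)
print(rates)
```

At 2.4 GT/s the core only has to deliver one wide word every four clock cycles, which is why the internal array can run much slower than the interface.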

DDR4 extends the above SDRAM architecture by introducing bank groups, allowing a prefetch of eight to be executed in one group while a second is executed independently in another. Effectively, DDR4 time-division multiplexes its internal bank groups to hide the fact that the internal core requires more time than that required by a burst of eight words at the I/O interface. Compared to DDR3, DDR4 improves performance by offering more banks with significantly smaller row sizes, meaning devices can cycle through different banks at a higher rate.

The organisation of DDR4 memory is illustrated below (Figure 2). To support higher storage capacities without adding extra address pins, DDR4 uses a newly-defined ACT_n input to multiplex addresses on the command pins, RAS, CAS, and WE. If ACT_n is low, these inputs are used as address pins A16, A15, and A14 respectively. When ACT_n is high, they resume their normal functions as specified in the SDRAM command truth table.

Figure 2 DDR4 bank groups.
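The ACT_n multiplexing described above can be sketched as a small decoder. This is a simplified illustration, not a replacement for the command truth table in the datasheet: the mnemonic names are my own, only a subset of commands is shown, and other pins (for example A10 or CKE) refine several of them in a real device.

```python
# Simplified, illustrative subset of the command encoding when ACT_n is high.
# Keys are (RAS/A16, CAS/A15, WE/A14) pin levels; names are assumptions.
COMMANDS = {
    (0, 0, 0): "MODE_REGISTER_SET",
    (0, 0, 1): "REFRESH",
    (0, 1, 0): "PRECHARGE",
    (0, 1, 1): "RESERVED",
    (1, 0, 0): "WRITE",
    (1, 0, 1): "READ",
    (1, 1, 0): "ZQ_CALIBRATION",
    (1, 1, 1): "NO_OPERATION",
}


def decode(cs_n: int, act_n: int, ras_a16: int, cas_a15: int, we_a14: int):
    """Return (command, upper_row_bits) for one sampled command."""
    if cs_n == 1:
        # Chip not selected: the command pins are ignored.
        return "DESELECT", None
    if act_n == 0:
        # ACT_n low: the three pins carry row-address bits A16, A15, A14.
        upper_row_bits = (ras_a16 << 2) | (cas_a15 << 1) | we_a14
        return "ACTIVATE", upper_row_bits
    # ACT_n high: the pins resume their command function.
    return COMMANDS[(ras_a16, cas_a15, we_a14)], None


print(decode(0, 0, 1, 0, 1))  # same pin levels, interpreted as address bits
print(decode(0, 1, 1, 0, 1))  # same pin levels, interpreted as a command
```

The two calls use identical levels on the shared pins; only ACT_n decides whether they are read as address bits or as a command.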

Teledyne e2v’s 4 GB, rad-tolerant DDR4T04G72 is a multi-chip package (MCP) containing five die. Four of these each offer 1 GB (8 Gb) of storage, arranged as 512M x 16 bits and organised in two groups with four banks in each as shown above. To bolster reliability, a 72-bit data bus is created comprising 64 data bits and 8 bits for error detection and correction. This ECC function is realised within the fifth die. The device uses an internal 8n-prefetch buffer to maximise high-speed operation and offers programmable read, write, and additive latencies.
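As a quick sanity check, the capacity and bandwidth figures quoted for the device follow directly from the die count, bus width, and data rate. The sketch below uses only numbers stated in this article; the variable names are mine, and the 64-bit "payload" figure is simply my own derivation that excludes the ECC bits.

```python
DATA_DIE = 4            # die that store user data
DIE_DENSITY_GBIT = 8    # each data die is 512M x 16 bits = 8 Gb
DATA_BITS = 64          # user data bits on the bus
ECC_BITS = 8            # error detection and correction bits
DATA_RATE_GTPS = 2.4    # transfers per second, in giga-transfers

capacity_gbyte = DATA_DIE * DIE_DENSITY_GBIT / 8          # bits to bytes
bus_width = DATA_BITS + ECC_BITS
raw_bandwidth_gbps = DATA_RATE_GTPS * bus_width           # includes ECC bits
payload_bandwidth_gbps = DATA_RATE_GTPS * DATA_BITS       # user data only

print(f"capacity          : {capacity_gbyte:.0f} GB")
print(f"bus width         : {bus_width} bits")
print(f"raw bandwidth     : {raw_bandwidth_gbps:.1f} Gb/s")
print(f"payload bandwidth : {payload_bandwidth_gbps:.1f} Gb/s")
```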

DDR4 has introduced a number of hardware features to reduce power consumption. First, the I/O supply (VDDQ) has been reduced to 1.2 V from the 1.5 V rail used by standard DDR3 (1.35 V for DDR3L). A separate 2.5 V supply, Vpp, has been added to activate the internal word line, lowering power dissipation by 10%. The I/O electrical interface for the data bus has changed from push-pull, stub series-terminated logic (SSTL) to pseudo open drain (POD) signalling as illustrated below (Figure 3). By terminating to VDDQ instead of 1/2 of VDDQ, the amplitude and centre of the signal swing can be tailored to each design’s need. POD I/O reduces switching current when driving data since only 0s consume power. DDR4 also offers data bus inversion, which reduces the number of bits driven low and so dissipates less power. Reduced switching results in less noise and a cleaner data eye.

Figure 3 DDR3 push-pull I/O signaling (left) vs. DDR4 POD (right).
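Data bus inversion is easy to illustrate in software. The sketch below assumes the usual rule for a byte lane: if more than four of the eight data bits would be driven low, the byte is inverted and the DBI_n flag is driven low to tell the receiver. The function names are hypothetical and the code models the idea only, not the pin-level timing.

```python
def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (byte_on_bus, dbi_n) for one 8-bit lane."""
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        # Too many low bits: send the inverted byte and flag it with DBI_n = 0.
        return (~byte) & 0xFF, 0
    return byte & 0xFF, 1


def dbi_decode(byte_on_bus: int, dbi_n: int) -> int:
    """Recover the original byte at the receiver."""
    return (~byte_on_bus) & 0xFF if dbi_n == 0 else byte_on_bus


for value in (0x00, 0x01, 0x0F, 0xFF):
    bus, flag = dbi_encode(value)
    lows = 8 - bin(bus).count("1") + (1 - flag)   # low lines including DBI_n
    print(f"data=0x{value:02X} bus=0x{bus:02X} dbi_n={flag} low lines={lows}")
```

With this rule no more than four of the nine lines in a lane are ever low at once, which is where the POD power saving comes from.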

Collectively, the reduced VDDQ voltage, the use of an external Vpp supply to boost the word line, the change to POD signalling, and VDDQ termination, as well as the previously discussed smaller row size with lower activation currents, have reduced overall power consumption compared to DDR3 SDRAM. At similar data rates, a DDR4 device has a 30% advantage in power efficiency. This improvement can be used to operate the SDRAM device at higher speeds or lower dissipation for the same performance. A power-prediction spreadsheet and ICEPAK/ECXML thermal models are available for the DDR4T04G72.

At a system level, DDR4 offers improved reliability, availability, and serviceability (RAS). Real-time CRC error detection of the data bus during write operations as well as parity checking of the command and address buses are available as shown below (Figure 4). Unlike DDR3, DDR4 can be configured to block commands upon detection of a parity error.

Figure 4 System-level error detection offered by DDR4.
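Both checks can be sketched in a few lines. The parity function below assumes even parity across the command and address bits, and the CRC uses the 8-bit polynomial x^8 + x^2 + x + 1. Treat this as an illustration of the principle only: the exact set of bits covered and the bit ordering within a burst are defined by the standard and the device datasheet and are not reproduced here.

```python
def even_parity_bit(bits: list[int]) -> int:
    """Parity bit that makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(bits: list[int], parity: int) -> bool:
    """Receiver-side check of command/address parity."""
    return (sum(bits) + parity) % 2 == 0


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with polynomial x^8 + x^2 + x + 1 and zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


# Hypothetical command/address word and write burst, for illustration only.
ca_bits = [1, 0, 1, 1, 0, 0, 1, 0]
par = even_parity_bit(ca_bits)
burst = bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x23, 0x45, 0x67])
corrupted = bytes([burst[0] ^ 0x04]) + burst[1:]   # single-bit upset

print("parity check passes :", parity_ok(ca_bits, par))
print("CRC detects bit flip:", crc8(burst) != crc8(corrupted))
```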

For those errors that cannot be fixed using ECC during the lifetime of the memory, DDR4 offers a post-package repair function to correct rows that have become faulty. Not only does this increase the reliability and longevity of systems, but it also provides a further mechanism to protect against single-event upsets.

DDR4 also offers a connectivity test mode (CT) to check the continuity of the PCB traces between the memory and the controller without invoking the SDRAM’s initialisation sequence. Unlike conventional boundary-scan testing where test patterns are shifted in and out of devices serially during each clock, CT mode uses a faster, parallel interface.

The DDR4 I/O interface is a true source-synchronous design, where the data is captured twice per clock cycle using a bidirectional data strobe, DQS. During a READ operation, DQS is output by the memory, coincident with the data; for WRITEs, the strobe is provided by the controller, centred with respect to the data, providing a synchronous reference. To improve signal integrity as data transfer rates increase and amplitudes decrease, the clock and strobe signals are differential to cancel out common-mode noise. At a PCB level, DQS has identical loading to the data bus and should be routed similarly. The other address, command, control, and data signals still operate in single-ended mode, which makes them more susceptible to noise, crosstalk, and interference.

Prior to PCB layout, it is important to decide how much of the available timing budget to allocate to routing mismatch. This can be determined by thinking in terms of time or as a percentage of the overall period. For example, with a clock frequency of 1.2 GHz, the period is 833 ps. Typical flight time for FR4 is 6.6 ps/mm, so length matching traces to 1 mm consumes around 0.8% of the clock period, or around 1.6% of the 417 ps bit time at 2.4 GT/s. If your design does not push the performance limits, you can allocate a larger percentage of the overall timing budget to length mismatch to provide more routing flexibility and ease the layout effort.
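The length-matching arithmetic is simple enough to script when exploring different tolerances. The sketch below uses the 6.6 ps/mm and 1.2 GHz figures from the text; the function name is my own, and it reports the skew against both the clock period and the bit time, since data is transferred on both clock edges.

```python
def mismatch_budget(mismatch_mm: float, delay_ps_per_mm: float, clock_hz: float):
    """Return (skew_ps, percent_of_clock_period, percent_of_bit_time)."""
    period_ps = 1e12 / clock_hz
    bit_time_ps = period_ps / 2          # two transfers per clock cycle
    skew_ps = mismatch_mm * delay_ps_per_mm
    return skew_ps, 100 * skew_ps / period_ps, 100 * skew_ps / bit_time_ps


for tolerance_mm in (0.5, 1.0, 2.0, 5.0):
    skew, pct_period, pct_bit = mismatch_budget(tolerance_mm, 6.6, 1.2e9)
    print(f"{tolerance_mm:4.1f} mm -> {skew:5.1f} ps, "
          f"{pct_period:4.2f}% of period, {pct_bit:4.2f}% of bit time")
```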

When calculating PCB propagation delays, note that these vary for inner (stripline) and outer (microstrip) layers because their effective dielectric constants are different. Vias represent additional length in the Z direction, so matched lines should have the same number of vias with identical spans, allowing their impact on the overall timing budget to be ignored.
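The dependence of propagation delay on the effective dielectric constant can be sketched using the standard relation delay = sqrt(eps_eff) / c. The dielectric values below are illustrative assumptions, not figures for any particular laminate or stack-up; a field solver or the fabricator's data should supply the real numbers.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def delay_ps_per_mm(eps_eff: float) -> float:
    """Propagation delay per unit length for a given effective dielectric constant."""
    return math.sqrt(eps_eff) / C_MM_PER_PS


# Assumed example values: a stripline sees the full laminate dielectric,
# while a microstrip sees a lower effective value because part of the field is in air.
stripline = delay_ps_per_mm(3.9)
microstrip = delay_ps_per_mm(3.0)

print(f"stripline  : {stripline:.2f} ps/mm")
print(f"microstrip : {microstrip:.2f} ps/mm")
print(f"difference over 50 mm: {(stripline - microstrip) * 50:.1f} ps")
```

Two traces of equal physical length on different layer types can therefore differ by tens of picoseconds, which is why matching should be done in time rather than in millimetres when signals change layers.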

Before PCB fabrication, post-layout simulation is recommended to confirm timing margins and signal integrity. IBIS and SPICE models are available for the DDR4T04G72 to allow you to confirm electrical and timing compliance early in the design cycle. I use HyperLynx LineSim and BoardSim from Mentor Graphics (now Siemens) to verify pre- and post-layout signal integrity respectively, to optimise termination and drive strength, and to validate timing margins to allow sign-off before manufacture. An EBD model is currently being developed.

To verify the signal integrity between the controller and the memory using an internal routing layer, Figure 5 illustrates the eye diagram predicted by LineSim of a PolarFire rad-tolerant FPGA connected to a data line of a single DDR4T04G72. Multiple DDR4 devices can also be connected to a single FPGA each with its own IP controller.

Figure 5 Point-to-point connection between the PolarFire FPGA and the DDR4T04G72.

To increase overall storage capacity, the same soft IP can also command multiple DDR4 devices placed in either fly-by or clamshell topologies, i.e., with common clock, address, control, and data signals, and with each SDRAM having its own chip-select input as illustrated below (Figure 6). In this case, the transmission lines are longer and the capacitive load higher, so simulation is necessary to confirm the required driver strength. Each KU060 DDR4 controller has a maximum data bus width of 80 bits and can access up to four external memories (dependent on electrical loading), and the FPGA can instantiate two of these IPs. Nominally, the KU060 allows two external memory chips or one DIMM to be connected to each IP, with the latter containing four devices. To increase storage capacity beyond 8 GB (two DDR4 chips per IP), you could consider configuring the KU060 using LRDIMM mode (rank 4), but verify the electrical loading using signal-integrity simulations.

Figure 6 Connection of multiple DDR4 devices to a Xilinx KU060 FPGA.

Xilinx offers a video that demonstrates how to instantiate a DDR controller (see “Configuring the DDR Controller in a Zynq UltraScale+ MPSoC”) as well as resources to calculate the maximum rate and the number of external SDRAM devices that can be connected to their FPGAs. PolarFire’s DDR4 IP offers a data-bus width of 72 bits allowing the connection of four DDR4T04G72 devices as shown above.

The table below summarises the resulting storage capacities and bandwidths when the DDR4T04G72 is connected to Xilinx’s KU060 or Microchip’s PolarFire rad-tolerant FPGAs, assuming data rates of 1.33 and 1.86 GT/s, respectively. The maximum number of DDR4 IPs that can be instantiated within either FPGA depends on your specific I/O usage, so confirm your configuration using the Vivado Design Suite or Libero SoC. NanoXplore’s NG-Ultra will also support DDR4 SDRAM.

Table System storage capacity and bandwidths (*confirm loading).

DDR4’s data signals, DQ, DQS, and DM_n, have dynamic on-die termination (ODT) built into the FPGA controller and SDRAM. Ordinarily, external termination resistors would need to be placed at the far end of the address, command, control, and clock nets as shown in Figures 7 and 8. However, these are not required for the DDR4T04G72, which contains ODT for all its high-frequency interface signals.

Conventional tree-topology routing creates a stub whose length increases with the number of receivers, reducing the bandwidth of the transmission line. This attenuates the high-frequency components that form the rising and falling edges of the signal, shrinking the eye opening at the SDRAM. Fly-by routing reduces the number of stubs and their lengths.

Figure 7 Fly-by termination of DDR4 command, address and control signals.

Figure 8 Fly-by termination of DDR4 differential clock input.

DDR4 has on-die capacitance for the core as well as the I/O, and therefore it is not necessary to allocate external capacitors for every power-pin pair. However, a minimum amount of PCB decoupling is specified for the DDR4T04G72 to prevent the supply from drooping when the SDRAM core requires current for refresh, read, and write operations. Decoupling also provides current during reads for the output drivers. The core requirements are at lower frequencies and need larger capacitance values, whereas the drivers switch at higher rates and need low inductance and less capacitance.

You have completed your schematic design, layout, pre-fabrication timing, and signal-integrity checks, sub-contracted the assembly of the PCB, and verified that the new board powers up as expected. You are now ready to start using the memory. However, prior to operation, DDR4 has to be initialised so the SDRAM understands its operating frequency and delay parameters. DDR3 uses a voltage divider to create Vdd/2 as a reference to decide if the DQ signals are 0 or 1 as shown in Figure 3. DDR4 uses an internal voltage reference, VrefDQ, whose value must be set by the memory controller during the initialisation phase. Furthermore, SDRAM requires periodic calibration of the output driver impedance and ODT values to compensate for variations in voltage and temperature, a process known as ZQ calibration.

The final step before DDR4 can be used is known as memory training, which calculates the read/write delays between the SDRAM and its controller. As shown in Figure 6, for multiple DDR4 chips connected to an FPGA, each device may be physically located at a different distance from the controller, resulting in individual flight-time skews between the clock, strobe, and the data. Write leveling compensates for these differences. In a fly-by topology, each chip receives the command, address, and control signals at a different time, and read/write centering ensures data can be reliably read from or written to the SDRAM by always capturing in the middle of the data eye. Memory training initially calibrates the interfaces to ensure adequate margin prior to operation.
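To get a feel for the skews that write leveling has to remove, the flight-time difference between the clock and the strobe at each device can be estimated from trace lengths. This is a back-of-envelope sketch only: the trace lengths are hypothetical, the 6.6 ps/mm figure is the typical FR4 value quoted earlier, and a real controller finds these delays by training with feedback from the SDRAM rather than by calculation.

```python
DELAY_PS_PER_MM = 6.6  # typical FR4 flight time quoted earlier in this article


def write_leveling_delays(ck_lengths_mm, dqs_lengths_mm):
    """Estimate the strobe delay needed per device so DQS arrives aligned with CK."""
    delays_ps = []
    for ck_mm, dqs_mm in zip(ck_lengths_mm, dqs_lengths_mm):
        # The clock arrives later at devices further along the fly-by chain,
        # while each strobe is routed point-to-point to its own device.
        delays_ps.append((ck_mm - dqs_mm) * DELAY_PS_PER_MM)
    return delays_ps


# Hypothetical four-device fly-by chain: clock length grows along the chain.
ck_lengths = [40, 55, 70, 85]
dqs_lengths = [35, 35, 35, 35]

delays = write_leveling_delays(ck_lengths, dqs_lengths)
for index, delay in enumerate(delays):
    print(f"device {index}: delay DQS by about {delay:.0f} ps")
```

With these assumed lengths the last device needs its strobe delayed by roughly 40% of an 833 ps clock period, which shows why per-device training is unavoidable in a fly-by layout.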

To conclude, DDR4 will allow the satellite industry to offer higher-throughput on-board processing and increased acquisition times enabling new Earth-observation, space science, and telecommunication applications—e.g., ultra high-resolution imagery, live streaming video, and on-board AI. As discussed, DDR4 contains new architectural and hardware features that need to be considered to ensure your design is right first time. In addition to the device datasheet, a user guide is also available for the DDR4T04G72. To help your time-to-market needs, a schematic symbol, PCB footprint, and a 3D model can be freely downloaded from the UltraLibrarian PCB CAD library to your desired EDA tool.

For the first time, DDR4 will allow satellite and spacecraft manufacturers to avail themselves of the large memory bandwidths that have been exploited by our commercial cousins for the last six years. Compared to existing, qualified DDR3 SDRAM, the DDR4T04G72 can be used with the latest space-grade FPGAs and microprocessors such as the rad-tolerant Qormino (see “Qormino: A compact, multicore processing system solution”) providing:

  • A 62% increase in memory bandwidth (0.172 Tb/s with a data rate of 2.4 GT/s), doubling current transfer speeds
  • A 25% increase in storage capacity
  • A 76% reduction in physical size
  • A 30% reduction in power consumption

Until next month, the first person to tell me the voltage swing for POD will win a Courses for Rocket Scientists World Tour tee-shirt. Congratulations to Gabriel from Argentina, the first to answer the riddle from my previous post.

This article was originally published on EDN.

Dr. Rajan Bedi is the CEO and founder of Spacechips, which designs and builds a range of advanced, L to K-band, ultra-high-throughput on-board processors, transponders, and edge-based OBCs for telecommunication, Earth observation, navigation, internet and M2M/IoT satellites. The company also offers space electronics design consultancy, avionics testing, technical marketing, business intelligence, and training services. Rajan can also be contacted on Twitter.

Spacechips’ design consultancy services develop bespoke satellite and spacecraft sub-systems, as well as advising customers how to use and select the right components, how to design, test, assemble and manufacture space electronics. We teach semiconductor memories on our FPGAs for Space Applications Training Course.
