Since the dawn of civilization, the computing bottleneck has been the tawdry I/O relationship between disk drives and RAM (random access memory). The emergence of SSDs (solid-state drives) loosened but didn't remove this I/O bottleneck. Three advances are conspiring to push the bottleneck right onto the CPU (central processing unit).

First, access rates and networking rates are cresting 25 Gb/s, 58 Gb/s, and on pace to hit 100 Gb/s in a decade near you. Computers can move data onto their disks from remote systems as fast as they can access their own disks. Second, the I/O barrier between disk and RAM is ready to fall, and when it does, the barrier between RAM and cache comes down with it.
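To put those link rates in perspective, here is a back-of-envelope sketch of how long it takes to move a fixed payload across the wire. The 1 TB payload is an assumption chosen for illustration, the 400 Gb/s figure is the one that comes up later in this article, and the calculation ignores protocol overhead entirely, so treat the numbers as best-case bounds rather than measurements.

```python
# Best-case transfer times at nominal link rates.
# Assumptions (not from the article): a 1 TB payload, decimal units,
# and a link that runs at its full nominal rate with zero protocol overhead.

PAYLOAD_BYTES = 1e12  # 1 TB, hypothetical payload size


def transfer_seconds(payload_bytes: float, rate_gbps: float) -> float:
    """Seconds needed to move payload_bytes over a link of rate_gbps Gb/s."""
    bits = payload_bytes * 8
    return bits / (rate_gbps * 1e9)


if __name__ == "__main__":
    for rate in (25, 100, 400):
        seconds = transfer_seconds(PAYLOAD_BYTES, rate)
        print(f"{rate:>3} Gb/s: {seconds:6.1f} s per TB")
```

Under these assumptions a terabyte needs 320 s at 25 Gb/s, 80 s at 100 Gb/s, and 20 s at 400 Gb/s.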

Imagine having everything on your disk in RAM, and not merely in RAM but right there at L1 cache. You can get a tidy PC with 64 GB of RAM for under $2000, but imagine exabytes of cache. That's alottabytes: it takes only 5 exabytes to hold all the words ever spoken by humans. Consider the graph in Figure 1.
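For a sense of scale, this small sketch works out how many of those 64 GB PCs it would take to hold 5 exabytes in RAM. It assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and uses the $2000 price as a ceiling, so the cost is a rough upper bound, not a quote.

```python
# How many 64 GB PCs does it take to hold 5 exabytes in RAM?
# Assumption: decimal units (1 GB = 10**9 bytes, 1 EB = 10**18 bytes).

RAM_PER_PC_BYTES = 64 * 10**9      # 64 GB of RAM per PC (from the article)
PRICE_PER_PC_USD = 2000            # "under $2000", used here as a ceiling
ALL_SPEECH_BYTES = 5 * 10**18      # 5 exabytes (from the article)


def pcs_needed(total_bytes: int, per_pc_bytes: int) -> int:
    """Number of PCs required, rounded up to a whole machine."""
    return -(-total_bytes // per_pc_bytes)  # ceiling division on integers


if __name__ == "__main__":
    count = pcs_needed(ALL_SPEECH_BYTES, RAM_PER_PC_BYTES)
    print(f"PCs needed: {count:,}")
    print(f"Cost ceiling: ${count * PRICE_PER_PC_USD:,}")
```

That comes to 78,125,000 PCs, or a bit over $156 billion at the ceiling price.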

Figure 1. Access times of HDDs (hard disk drives), SSDs, new superfast storage, RDMA over Converged Ethernet (RoCE), L2 cache, and L1 cache. Graphic courtesy of the InfiniBand Trade Association.

Disk access times for hard and solid-state drives are on the left, and RAM and cache access times are on the right. Micron's 3D XPoint NVM (non-volatile memory) technology makes the trend obvious: new data storage technology is working its way toward RAM access times.

The green RoCE (pronounced "Rocky") column is the second piece of the puzzle: connecting many NVM "disks" on a network directly to RAM. (I put "disks" in quotes because a solid-state drive is no more like a spinning disk drive than the keypad on your phone is a "dial.") To be sure, RoCE isn't the only technology that accomplishes this; it's just the one I know about.

(Disclaimer: I wrote a white paper about RoCE for the InfiniBand Trade Association. There are other technologies that claim to achieve the same miracle. They include InfiniBand itself and iWARP.)

RoCE is a compound acronym: RDMA over Converged Ethernet, where RDMA stands for remote direct memory access. DMA (direct memory access) has always been built into PCs. It allows internal peripheral components (disk drive controllers, sound, graphics, network cards, etc.) to read from and write to system memory without bothering the processor. RDMA generalizes DMA to network adapters so that data can be transferred between applications on different servers without passing through the CPU or the main memory path of TCP/IP (Transmission Control Protocol/Internet Protocol). That is, RDMA allows the network interface controller (NIC) to access RAM directly, bypassing the operating system and completely removing TCP/IP overhead.

The other key piece to the puzzle is new NVM technology like 3D XPoint, a phase-change-based solid-state NVM developed through a collaboration between Intel and Micron, that its developers say will be up to 1,000 times faster than flash. The idea is to create random access technology in a three-dimensional design with perpendicular wires connecting submicroscopic columns that are 10 times denser than conventional memory. The XPoint (crosspoint) die (Figure 2) has two layers and a crossbar design. Where NAND data is addressed by multi-kilobyte blocks, 3D XPoint NVM can be addressed byte by byte with latency of 7 µs or less. Since XPoint chips can be mounted on DIMMs, right on the memory bus, they can eliminate the distinction between "disk" and RAM.
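The difference between block addressing and byte addressing is easy to see in a toy model. The sketch below is not a NAND or 3D XPoint API. The `BlockStore` and `ByteStore` classes, the 4096-byte block size, and the `bytes_moved` counter are all hypothetical, invented only to show that changing one byte on a block-addressed device means reading and rewriting a whole block, while a byte-addressable device touches just that byte.

```python
# Toy comparison of block-addressed and byte-addressed updates.
# Everything here is a hypothetical model: the class names, the 4096-byte
# block size, and the bytes_moved counter are illustrative assumptions.


class BlockStore:
    """Block-addressed store: every update is a read-modify-write of a block."""

    def __init__(self, size: int, block_size: int = 4096):
        self.block_size = block_size
        self.data = bytearray(size)
        self.bytes_moved = 0  # bytes read plus bytes written back

    def write_byte(self, offset: int, value: int) -> None:
        start = (offset // self.block_size) * self.block_size
        block = bytearray(self.data[start:start + self.block_size])  # read
        block[offset - start] = value                                 # modify
        self.data[start:start + self.block_size] = block              # write back
        self.bytes_moved += 2 * len(block)


class ByteStore:
    """Byte-addressed store: an update touches only the addressed byte."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.bytes_moved = 0

    def write_byte(self, offset: int, value: int) -> None:
        self.data[offset] = value
        self.bytes_moved += 1


if __name__ == "__main__":
    block_store, byte_store = BlockStore(16384), ByteStore(16384)
    for offset in (0, 5000, 16383):
        block_store.write_byte(offset, 0xFF)
        byte_store.write_byte(offset, 0xFF)
    print("block-addressed bytes moved:", block_store.bytes_moved)
    print("byte-addressed bytes moved: ", byte_store.bytes_moved)
```

Both stores end up with identical contents, but in this model the block-addressed one moves 8192 bytes per single-byte update while the byte-addressed one moves 1.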

Figure 2. The 3D XPoint design uses stacked dice to increase density.

When 3D XPoint and other new persistent memory technologies (like 3D Super-NOR) merge RAM with what we think of as disk storage, everything changes. With RoCE running on the 400 Gb/s networks that will emerge at the same time that 3D XPoint (and the like) fulfills its promise, the distinction between disk and RAM will evaporate.

Instead of the supply chain of remote disk to local disk to RAM to cache to data processing, we'll have "disk" to data processing. Not only will we render "disk" even more obsolete than "dial," but with the whole world's data effectively in RAM, data processing will be unbound by motherboards and physical location. The distinction between local and cloud computing will disappear, the speed of light will take its rightful place as arbiter of what gets done when, the CPU itself will be the processing bottleneck, and Moore's Law will continue its reign.
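To see how quickly the speed of light becomes the arbiter, this sketch converts distance into one-way propagation delay and asks how far a signal gets in the 7 µs latency quoted above for 3D XPoint. The vacuum speed of light is exact by definition. The two-thirds factor for optical fiber is a common rule of thumb and is an assumption here, as are the sample distances.

```python
# Propagation delay versus distance.
# Assumption: light in optical fiber travels at roughly 2/3 of its
# vacuum speed (a rule of thumb, not a figure from the article).

C_VACUUM_M_PER_S = 299_792_458  # exact, by SI definition
FIBER_FRACTION = 2 / 3          # assumed velocity factor for fiber


def one_way_delay_us(distance_m: float, fraction_of_c: float = 1.0) -> float:
    """One-way propagation delay in microseconds over distance_m meters."""
    return distance_m / (C_VACUUM_M_PER_S * fraction_of_c) * 1e6


def distance_for_delay_m(delay_us: float, fraction_of_c: float = 1.0) -> float:
    """Distance in meters a signal covers in delay_us microseconds."""
    return delay_us * 1e-6 * C_VACUUM_M_PER_S * fraction_of_c


if __name__ == "__main__":
    for km in (1, 100, 4000):  # hypothetical sample distances
        delay = one_way_delay_us(km * 1000, FIBER_FRACTION)
        print(f"{km:>5} km of fiber: {delay:10.1f} us one way")
    reach = distance_for_delay_m(7.0, FIBER_FRACTION)
    print(f"Fiber covered in 7 us: {reach:.0f} m")
```

Under these assumptions, a signal in fiber covers about 1.4 km in 7 µs, so storage much farther away than that costs more in propagation delay than in device latency.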

Ransom Stephens is a technologist, science writer, novelist, and Raiders fan, even when his team loses.
