Henri Kuiper (@henrikuiper) has an interesting, insightful observation illustrating one of the many reasons IBM z Systems are essential to secure, reliable enterprise computing.
On March 9, Google’s “Project Zero” published proof that a wide variety of computers are vulnerable to DRAM “rowhammer” attacks. As chip density increases, it’s becoming increasingly difficult to confine and control electrons within such tiny dimensions. DRAM (Dynamic Random Access Memory), an IBM invention, is an extremely important component in practically every computer, including in IBM z Systems. DRAM chips form the computer’s main memory, and if a computer’s memory does not function correctly, results are unpredictable or worse.
That’s exactly what Google’s security research team discovered, building on earlier research by Yoongu Kim and others. Using a specially crafted but easily written piece of code, they were able to change bit values in DRAM that the code shouldn’t have had access to and that should have been protected. The reason, in everyday language, is that electrons “hopped,” “leaked,” and/or “flipped” (take your pick). The individual DRAM chips are sufficiently unreliable now (and to some extent also in the past) that you cannot depend on their faithfully storing and retrieving binary zeroes and ones. You can “hammer” DRAM in certain ways that disturb other memory locations. In fact, as the Google team demonstrated, it’s possible to exploit this particular hardware vulnerability to gain complete control of an entire system without authorization. At present they don’t know how to protect fully against this vulnerability because the problem is a fundamental, undesirable characteristic of the hardware.
That’s alarming! It’s not totally surprising, at least to those who’ve been paying attention, but the published proof of concept and test results are quite disturbing. The IT security community has a lot of work to do, and so do hardware manufacturers. Hardware manufacturers will likely need to implement at least what the IBM PC had way back in 1981: DRAM parity checking. (Though that’s likely not enough protection even in lower security contexts, especially as DRAM circuits continue to shrink.)
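To see why simple parity checking helps but probably isn’t enough, here is a minimal sketch in Python. It is only an illustration, not how any particular memory controller is built: the function names and the sample byte are made up for the example. One parity bit per byte catches any single flipped bit, but two flips in the same byte cancel out and go unnoticed.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


# Hypothetical sample value, chosen only for illustration.
stored = 0b10110010
stored_parity = parity_bit(stored)

one_flip = stored ^ 0b00000100   # a single disturbed bit
two_flips = stored ^ 0b00000110  # two disturbed bits in the same byte

print(parity_ok(stored, stored_parity))     # intact data passes the check
print(parity_ok(one_flip, stored_parity))   # single flip is detected
print(parity_ok(two_flips, stored_parity))  # double flip slips through
```

Note that parity only detects an error. It cannot say which bit flipped, so it cannot correct anything.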
OK, so what about IBM z Systems? Now the “secret” is revealed. Many years ago IBM’s engineers predicted this class of problems, and they wanted to prevent IBM mainframes from even approaching these problems. In the previous decade, the engineers got to work redesigning and improving the memory subsystems in IBM mainframes in anticipation of the DRAM density-related problems they would face in the component supply chain, even when selecting the absolute best components. In 2010, IBM introduced the z196 mainframe with a brand new memory subsystem incorporating a breakthrough innovation: RAIM (Redundant Array of Independent Memory) design. RAIM is analogous to RAID (Redundant Array of Independent Disks) for magnetic and solid state disks. All data are cross-checked for both reads and writes, and the failure (or misbehavior) of any single component does not threaten data integrity. In fact, IBM’s RAIM design can tolerate up to triple component failures and still continue running, with no application interruption. (It’s also possible and common to configure an IBM z System such that, in the unlikely event there is a memory hardware failure requiring eventual service, memory repairs also occur without interrupting applications.) You can “rowhammer” IBM z RAIM as much as you like, but you’re absolutely not going to flip bits that aren’t your bits in your authorized storage area.
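The RAID analogy can be sketched in a few lines of Python. This is a toy model of RAID-style XOR parity only, not IBM’s actual RAIM implementation: the choice of four data channels plus one parity channel, and the names `write_stripe` and `rebuild`, are assumptions made for the illustration. The idea is that data are spread across several independent channels along with a parity block, so the contents of any one lost channel can be recomputed from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (one block per channel)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block on a failed channel from all the surviving channels."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Hypothetical layout: four data channels and one parity channel.
stripe = write_stripe([b"ABCD", b"EFGH", b"IJKL", b"MNOP"])

# Pretend channel 2 has failed entirely, then recover its contents.
recovered = rebuild(stripe, 2)
print(recovered == stripe[2])
```

In this sketch, losing any single channel, data or parity, is survivable. A real design layers this kind of redundancy on top of per-channel error correction rather than relying on XOR parity alone.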
The latest IBM z13 mainframe, now shipping, incorporates the third generation RAIM subsystem, and every mainframe IBM sells includes at least second generation RAIM. You cannot disable RAIM protections, and you cannot configure an IBM mainframe without RAIM. RAIM is standard, not optional. When you buy or lease an IBM mainframe, the amount of memory you acquire — to pick a random example, 480GiB — is customer usable. Physically there are many more of the highest quality DRAM components inside the machine to support RAIM. But when you order a machine with 480GiB usable, you get 480GiB usable, after RAIM overhead. You can now order an IBM z13 mainframe with as much as 10TiB of usable, RAIM-protected memory.
IBM z Systems are the only servers in the world featuring this extreme, innovative level of memory protection, and that’s been true for nearly half a decade and counting. IBM stands alone here. As we’ve now discovered (or been reminded), RAIM is not only critical to ensuring the continuous operation of your applications, it’s also critical to ensuring the utmost security.
OK, but what about ECC memory? IBM invented Chipkill advanced ECC memory, too, and all IBM servers feature at least Chipkill memory. It’s an important, essential technology, and it works well. (Google’s security researchers were not able to demonstrate “rowhammer” vulnerabilities in the ECC memory systems they tested. My prediction is that ECC will quickly emerge as the minimum requirement to secure DRAM from rowhammer and similar attacks, even in client devices.) But ECC isn’t RAIM. RAIM is a big step beyond ECC memory, providing greater assurance in maintaining application availability even as DRAM densities continue to increase. That makes sense, of course. One of the core principles of IBM z Systems design is to maximize the safety margins as much as possible. Mission critical means mission critical, quite simply. So if you need the utmost in memory reliability — and often you do — then quite simply you need an IBM z System and its unique RAIM design.
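For readers curious about what ECC adds over plain parity, here is a minimal sketch using the textbook Hamming(7,4) code in Python. This is not Chipkill and not the code used in any particular server, which rely on wider and stronger codes. It only shows the basic principle: a few extra check bits let the hardware locate and correct a single flipped bit instead of merely detecting it. The function names are made up for the example.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error was found
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
word = encode(data)
word[5] ^= 1  # simulate one disturbed bit in memory
print(decode(word) == data)
```

This small code corrects any single flipped bit per codeword. Two flips in the same codeword would be miscorrected, which is one reason real memory systems use stronger codes and, in RAIM’s case, an additional layer of redundancy.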