Chipkill is IBM's trademark for a form of advanced error checking and correcting (ECC) computer memory technology that protects computer memory systems from any single memory chip failure as well as multi-bit errors from any portion of a single memory chip. One simple scheme to perform this function scatters the bits of a Hamming code ECC word across multiple memory chips, such that the failure of any single memory chip will affect only one ECC bit per word. This allows memory contents to be reconstructed despite the complete failure of one chip. Typical implementations use more advanced codes, such as a BCH code, that can correct multiple bits with less overhead.
Chipkill is frequently combined with dynamic bit-steering, so that if a chip fails (or has exceeded a threshold of bit errors), another, spare, memory chip is used to replace the failed chip. The concept is similar to that of RAID, which protects against disk failure, except that now the concept is applied to individual memory chips. The technology was developed by the IBM Corporation in the early and middle 1990s. An important RAS feature, Chipkill technology is deployed primarily on SSDs, mainframes and midrange servers.
An equivalent system from Sun Microsystems is called Extended ECC, while equivalent systems from HP are called Advanced ECC and Chipspare. A similar system from Intel, called Lockstep memory, provides double-device data correction (DDDC) functionality. Similar systems from Micron, called redundant array of independent NAND (RAIN), and from SandForce, called RAISE level 2, protect data stored on SSDs from any single NAND flash chip going bad.