CDC 7600

The CDC 7600 was the Seymour Cray-designed successor to the CDC 6600, extending Control Data's dominance of the supercomputer field into the 1970s. The 7600 ran at 36.4 MHz (27.5 ns clock cycle) and had a 65 Kword primary memory using magnetic core and variable-size (up to 512 Kword) secondary memory (depending on site). It was generally about ten times as fast as the CDC 6600, and could deliver about 10 MFLOPS on hand-compiled code, with a peak of 36 MFLOPS. In addition, in benchmark tests in early 1970 it was shown to be slightly faster than its IBM rival, the IBM System/360, Model 195. When the system was released in 1969, it sold for around $5 million in base configurations, and considerably more as options and features were added.

After the 6600 started to near production quality, Cray lost interest in it and turned to designing its replacement. Making a machine "somewhat" faster would not be too difficult in the late 1960s; the introduction of integrated circuits allowed for denser packing of components, and in turn a higher clock speed. Transistors in general were also getting somewhat faster as the production processes and quality improved. These sorts of improvements might be expected to make a machine twice as fast, perhaps as much as five times. However, as with the 6600 design, Cray set himself the goal of producing a machine with ten times the performance.

One of the reasons the 6600 was so much faster than its contemporaries is that it had multiple functional units that could operate in parallel. For instance, the machine could perform an addition of two numbers while simultaneously multiplying two others. However, any given instruction had to complete its trip through the unit before the next could be fed it, which caused a bottleneck when the scheduler system ran out of instructions. Adding more functional units would not improve performance unless the scheduler was also greatly improved, especially in terms of allowing it to have more memory so it could look through more instructions for ones that could be fed to the units. That appeared to be a major problem.

In order to solve this problem, Cray turned to the concept of an instruction pipeline. Each functional unit consisted of several sections that operated in turn, for instance, an addition unit might have circuitry dedicated to retrieving the operands from memory, then the actual math unit, and finally another to send the results back to memory. At any given instance only one part of the unit was active while the rest waited their turn. A pipeline improves on this by feeding in the next instruction before the first has completed, using up that idle time. For instance, while one instruction is being added together, the next add instruction can be fed in so the operand fetch circuits can begin their work. That way as soon as the current instruction completes and moves to the output circuitry, the next addition is ready to be fed in. In this way each functional unit works in "parallel", as well as the machine as a whole. The improvement in performance generally depends on the number of steps the unit takes to complete, for instance, the 6600's multiply unit took 10 cycles to complete an instruction, so by pipelining the units it could be expected to gain about 10 times the speed.

...
Wikipedia