The STAR-100 is a vector supercomputer designed, manufactured, and marketed by Control Data Corporation (CDC). It was one of the first machines to use a vector processor to improve performance on appropriate scientific applications.
The name STAR was constructed from the words STrings and ARrays. The 100 came from 100 million floating-point operations per second (MFLOPS), the speed at which the machine was designed to operate. The computer was announced in the early 1970s and was expected to be several times faster than the CDC 7600, which was then the world's fastest supercomputer, with a peak performance of 36 MFLOPS. On August 17, 1971, CDC announced that General Motors had placed the first commercial order for a STAR-100.
A number of basic design features of the machine meant that its real-world performance was much lower than expected when it was first used commercially in 1974. This shortfall was one of the primary reasons CDC was pushed from its former dominance in the supercomputer market when the Cray-1 was announced in 1975. Only two STAR-100 systems were delivered.
In general organization, the STAR was similar to CDC's earlier supercomputers, in which a simple CPU was supported by a number of peripheral processors that offloaded housekeeping tasks and allowed the CPU to crunch numbers as quickly as possible. In the STAR, both the CPU and the peripheral processors were deliberately simplified further to lower the cost and complexity of implementation. The STAR also differed from the earlier designs in being based on a 64-bit architecture instead of a 60-bit one, a side effect of the increasing use of 8-bit ASCII processing. Also unlike previous machines, the STAR made heavy use of microcode and supported virtual memory.
The main innovation in the STAR was the inclusion of instructions for vector processing. These new and more complex instructions approximated what was available to users of the APL programming language and operated on very long vectors stored in consecutive locations in main memory. The CPU was designed to use these instructions to set up additional hardware that fed in data from main memory as quickly as possible. For instance, a program could use a single instruction with a few parameters to add all the elements of two vectors, each of which could be as long as 65,535 elements. The CPU only had to decode a single instruction, set up the memory hardware, and start feeding the data into the math units. As with instruction pipelines in general, the time needed to complete any one operation was no better than before, but because the CPU was working on a number of operations at once (in this case, on successive data points rather than successive instructions), overall performance improved dramatically due to the assembly-line nature of the task.
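The assembly-line effect described above can be illustrated with a rough sketch in Python. This is a toy model, not a description of the STAR-100's actual instruction set or timing: the function names, the cycle-count parameters, and the numbers used in the example are all hypothetical and chosen only for illustration. The only figure taken from the text is the 65,535-element limit.

```python
MAX_VECTOR_LENGTH = 65_535  # upper bound on vector length given in the text


def vector_add(a, b):
    """Model a single vector-add instruction: one call handles every element."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if len(a) > MAX_VECTOR_LENGTH:
        raise ValueError("vector exceeds the maximum length")
    return [x + y for x, y in zip(a, b)]


def scalar_cycles(n, decode_cycles, latency_cycles):
    """Unpipelined scalar loop: each element pays decode plus full latency."""
    return n * (decode_cycles + latency_cycles)


def vector_cycles(n, setup_cycles, latency_cycles):
    """Pipelined vector operation: setup is paid once, the first result
    appears after the full latency, and each later result follows one
    cycle after the previous one."""
    if n == 0:
        return 0
    return setup_cycles + latency_cycles + (n - 1)


if __name__ == "__main__":
    # Hypothetical cycle counts, not measured STAR-100 figures.
    decode, latency, setup = 1, 4, 50
    for n in (10, 1_000, MAX_VECTOR_LENGTH):
        print(n, scalar_cycles(n, decode, latency), vector_cycles(n, setup, latency))
```

In this model the latency of an individual addition is the same in both cases; the gain comes entirely from overlapping successive elements. With the illustrative numbers above, the one-time setup cost outweighs the benefit for a 10-element vector, while a 1,000-element vector finishes in roughly a fifth of the scalar time.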