TRIPS (Tera-op, Reliable, Intelligently adaptive Processing System) is a microprocessor architecture being designed by a team at the University of Texas at Austin in conjunction with IBM, Intel, and Sun Microsystems. TRIPS uses an instruction set architecture designed to be easily broken down into large groups of instructions (dataflow graphs) that can be run on independent processing elements. The design collects related data into these graphs, attempting to avoid expensive data reads and writes and to keep the data in high-speed memory close to the processing elements. The prototype TRIPS processor contains 16 such elements, but this is expected to scale rapidly to 128 in "real world" processors in the near future. Combined with a number of other architectural changes, the TRIPS design aims to reach 1 TFLOPS on a single processor by 2012.
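The execution model can be sketched in miniature in software. In the C program below, each instruction in a block names the instruction that consumes its result, and an instruction fires as soon as all of its operands have arrived, rather than in program order. The encoding, opcodes, and routing fields here are invented for illustration and are not the actual TRIPS instruction set.

```c
#include <stdio.h>

/* Hypothetical dataflow-style block execution, loosely in the spirit of
   TRIPS: instructions name their consumers, and each one fires as soon
   as all of its operands have arrived. Not the real TRIPS encoding. */

enum op { OP_CONST, OP_ADD, OP_MUL };

struct insn {
    enum op op;
    int value;          /* immediate, used by OP_CONST */
    int operands[2];    /* operand slots, filled in by producers */
    int received;       /* operands that have arrived so far */
    int needed;         /* operands required before firing */
    int target;         /* consumer instruction index (-1 = block output) */
    int target_slot;    /* which operand slot of the consumer to fill */
    int fired;
};

static void deliver(struct insn *block, int idx, int slot, int v);

static void fire(struct insn *block, int idx) {
    struct insn *i = &block[idx];
    int result = 0;
    switch (i->op) {
    case OP_CONST: result = i->value; break;
    case OP_ADD:   result = i->operands[0] + i->operands[1]; break;
    case OP_MUL:   result = i->operands[0] * i->operands[1]; break;
    }
    i->fired = 1;
    printf("insn %d fired, result %d\n", idx, result);
    if (i->target >= 0)
        deliver(block, i->target, i->target_slot, result);
}

static void deliver(struct insn *block, int idx, int slot, int v) {
    block[idx].operands[slot] = v;
    if (++block[idx].received == block[idx].needed && !block[idx].fired)
        fire(block, idx);   /* last operand arrived: instruction fires */
}

int main(void) {
    /* One block computing (2 + 3) * 4 as a five-instruction dataflow graph. */
    struct insn block[5] = {
        { OP_CONST, 2, {0, 0}, 0, 0,  2, 0, 0 },  /* feeds insn 2, slot 0 */
        { OP_CONST, 3, {0, 0}, 0, 0,  2, 1, 0 },  /* feeds insn 2, slot 1 */
        { OP_ADD,   0, {0, 0}, 0, 2,  4, 0, 0 },  /* feeds insn 4, slot 0 */
        { OP_CONST, 4, {0, 0}, 0, 0,  4, 1, 0 },  /* feeds insn 4, slot 1 */
        { OP_MUL,   0, {0, 0}, 0, 2, -1, 0, 0 },  /* block output */
    };
    /* Constants need no operands, so they fire immediately; everything
       else fires when its data arrives, in dependence order. */
    for (int i = 0; i < 5; i++)
        if (block[i].needed == 0 && !block[i].fired)
            fire(block, i);
    return 0;
}
```

In the hardware, TRIPS maps this kind of graph onto its grid of processing elements, routing results directly from producer to consumer instruction rather than through a shared register file.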
Traditional processors fetch and execute instructions from a list stored in memory, known as a program. Because main memory runs much more slowly than the CPU, modern CPU designs include a number of high-speed memory elements that hold values temporarily, avoiding the delays inherent in talking to main memory. One of the key advances of the RISC concept was to reduce the complexity of the instructions and spend the circuitry saved on more processor registers, the fastest form of memory. This technique quickly reached its limits, and since the 1990s CPUs have added growing amounts of CPU cache to expand local storage, although cache is slower than registers.
Since the late 1990s, performance gains have mostly come from the use of additional "functional units", which allow some instructions to run in parallel. For instance, two addition instructions working on different data can run at the same time, effectively doubling the speed of that portion of the program. Modern CPUs generally have dozens of such units: some for integer math and logic, some for floating-point math, some for long data words, and others for memory access and housekeeping chores. However, most programs do not work on independent data; instead, the output of one calculation becomes the input of another. This limits the number of instructions that can run in parallel to however many independent instructions the processor can find while examining the instruction stream on the fly. The level of instruction-level parallelism achieved this way plateaued by the mid-2000s.
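As a minimal illustration (the variable names are arbitrary), the C fragment below contrasts independent additions, which a superscalar processor can issue in the same cycle, with a dependency chain it must execute serially no matter how many functional units it has.

```c
#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, e = 4;

    /* Independent: neither sum uses the other's result, so a CPU with
       two integer units can issue both additions in the same cycle. */
    int s1 = a + b;
    int s2 = c + e;

    /* Dependent: each addition consumes the previous result, so the
       chain must execute one step at a time regardless of how many
       functional units are available. */
    int t1 = a + b;
    int t2 = t1 + c;
    int t3 = t2 + e;

    printf("%d %d %d %d %d\n", s1, s2, t1, t2, t3);
    return 0;
}
```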