The FPS AP-120B was a 38-bit, pipeline-oriented array processor manufactured by Floating Point Systems. It was designed to be attached to a host computer such as a DEC PDP-11 as a fast number-cruncher. Data transfer was accomplished using direct memory access.
Processor cycle time was 167 nanoseconds, corresponding to a clock rate of about 6 MHz. Because it could deliver two floating-point results per cycle, one from the adder and one from the multiplier, a peak rate of 12 megaflops was claimed for the processor.
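The arithmetic behind these figures can be checked with a short sketch. The function and constant names below are illustrative only and do not come from any FPS documentation; the only inputs are the 167 ns cycle time and the two results per cycle quoted above.

```python
CYCLE_TIME_NS = 167       # processor cycle time quoted in the text
RESULTS_PER_CYCLE = 2     # one result from the adder, one from the multiplier


def clock_rate_mhz(cycle_time_ns: float) -> float:
    """Convert a cycle time in nanoseconds to a clock rate in MHz."""
    # 1 / (t * 1e-9 s) Hz  ==  1000 / t MHz
    return 1_000.0 / cycle_time_ns


def peak_mflops(cycle_time_ns: float, results_per_cycle: int) -> float:
    """Peak floating-point results per second, in millions."""
    return clock_rate_mhz(cycle_time_ns) * results_per_cycle


if __name__ == "__main__":
    print(f"clock rate: {clock_rate_mhz(CYCLE_TIME_NS):.2f} MHz")
    print(f"peak rate:  {peak_mflops(CYCLE_TIME_NS, RESULTS_PER_CYCLE):.2f} MFLOPS")
```

The exact values come out slightly under 6 MHz and 12 megaflops, so the quoted figures are rounded, and the 12 megaflops rate is a peak that assumes both arithmetic units deliver a result on every cycle.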
The processor was designed around the concept of multiple parallel processing units operating in synchronization. A single 64-bit instruction word was divided into fields, each of which instructed a particular module under the control of the CPU. The modules were as follows:
The processor had access to dual-interleaved core memory, in which odd-numbered addresses were stored in one physical bank and even-numbered addresses in the other. This arrangement took advantage of the typically sequential pattern of memory fetches. Fetching sequentially from a single physical bank would incur a latency of two instruction cycles before each word was loaded into its destination data pad. Interleaving allowed the next sequential access to begin immediately after the previous one. Each access still took two cycles to complete, but the overlap and the dual destination pads maximized use of the data channel.
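The effect of interleaving on sequential fetches can be illustrated with a simplified timing model. This is only a sketch: it assumes that a bank stays busy for the full two cycles of an access and that at most one access is issued per cycle, and the function name and parameters are hypothetical rather than taken from the hardware documentation.

```python
ACCESS_CYCLES = 2  # cycles from issuing a fetch to data arriving (from the text)


def fetch_completion_cycles(addresses, num_banks):
    """Return the cycle at which each fetch completes.

    Assumptions of this model (not from the source): a bank is busy for
    ACCESS_CYCLES after an access is issued to it, and at most one access
    can be issued per cycle. Addresses map to banks by address modulo
    num_banks, so with two banks odd and even addresses alternate.
    """
    bank_free_at = [0] * num_banks
    next_issue = 0
    completed = []
    for addr in addresses:
        bank = addr % num_banks
        # Wait until both the issue slot and the target bank are available.
        issue = max(next_issue, bank_free_at[bank])
        bank_free_at[bank] = issue + ACCESS_CYCLES
        completed.append(issue + ACCESS_CYCLES)
        next_issue = issue + 1
    return completed


if __name__ == "__main__":
    sequential = [0, 1, 2, 3]
    print("single bank:", fetch_completion_cycles(sequential, num_banks=1))
    print("two banks:  ", fetch_completion_cycles(sequential, num_banks=2))
```

Under these assumptions, four sequential fetches from a single bank complete at cycles 2, 4, 6 and 8, while the two-bank arrangement completes them at cycles 2, 3, 4 and 5: each access still takes two cycles, but one word arrives per cycle once the overlap is established.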
The floating-point arithmetic modules were both multi-stage processors driven by explicit instructions. In the two-stage adder, an assembler instruction such as FADD DX,DY would load values from data pads DX and DY into stage one of the adder. A subsequent FADD instruction was required to present the result at the adder's output. This second FADD could be a dummy with no arguments, or it could be the next calculation in a sequence. In this fashion a stream of FADD operations could be performed in a pipeline, with a new result emerging in every instruction cycle even though each addition required two cycles.
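The push-through behaviour of the adder can be mimicked with a small software model. This is a minimal sketch of the behaviour described above, not a reproduction of the actual hardware or its assembler: the class name `TwoStageAdder` and the method `fadd` are hypothetical, and a call with no arguments stands in for the dummy FADD.

```python
class TwoStageAdder:
    """Toy model of a two-stage adder advanced by explicit FADD instructions."""

    def __init__(self):
        self.stage_one = None  # operands latched by the most recent FADD
        self.output = None     # result currently presented at the adder output

    def fadd(self, x=None, y=None):
        """Issue one FADD.

        The operands already in stage one are summed and presented at the
        output; the new operands (if any) are latched into stage one.
        Calling fadd() with no arguments models a dummy FADD.
        """
        if self.stage_one is None:
            self.output = None
        else:
            self.output = self.stage_one[0] + self.stage_one[1]
        if x is None or y is None:
            self.stage_one = None
        else:
            self.stage_one = (x, y)
        return self.output


if __name__ == "__main__":
    adder = TwoStageAdder()
    print(adder.fadd(1.0, 2.0))  # nothing at the output yet
    print(adder.fadd(3.0, 4.0))  # result of the first addition appears
    print(adder.fadd())          # dummy FADD flushes the second result
```

In this model the first call returns nothing, the second returns the sum from the first call, and the final dummy call flushes the last sum, so a continuous stream of calls yields one result per call after the initial one-instruction delay.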
Similarly, the multiplier, a three-stage unit, required one FMUL DX,DY to begin a multiplication, followed by two more FMUL instructions to produce the result. Careful programming of the pipeline allowed one result to be produced per cycle, even though each individual calculation took three cycles.
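The benefit of keeping the multiplier pipeline full can be estimated with a simple instruction count. The sketch below assumes an idealized pipeline with no stalls and operands always available; the function names are illustrative and not part of any FPS tool.

```python
MULTIPLIER_STAGES = 3  # the multiplier is described as a three-stage unit


def pipelined_instructions(num_ops: int, stages: int) -> int:
    """FMUL instructions needed when each one starts a new multiplication.

    The first result appears after `stages` instructions; each later
    instruction delivers one more result, so the final stages - 1
    instructions only flush the pipeline.
    """
    if num_ops <= 0:
        return 0
    return num_ops + stages - 1


def unpipelined_instructions(num_ops: int, stages: int) -> int:
    """FMUL instructions needed when each multiplication is pushed through
    completely before the next one starts."""
    return max(num_ops, 0) * stages


if __name__ == "__main__":
    n = 100
    print("pipelined:  ", pipelined_instructions(n, MULTIPLIER_STAGES))
    print("unpipelined:", unpipelined_instructions(n, MULTIPLIER_STAGES))
```

For 100 multiplications this idealized count gives 102 instructions when the pipeline is kept full, against 300 when each multiplication is completed before the next begins, which is why long streams approach one result per cycle.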
For maximum efficiency all calculations were programmed using the assembler language supplied with the hardware. A high-level language resembling Fortran was provided for coordinating tasks and controlling data transfers to and from the host computer.