Very long instruction word

Very long instruction word (VLIW) refers to processor architectures designed to exploit instruction-level parallelism (ILP). Whereas conventional central processing units (CPUs) mostly allow programs to specify instructions to execute only in sequence, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

The traditional means to improve performance in processor architectures include dividing instructions into substeps so that successive instructions can partly overlap in time (termed pipelining), dispatching individual instructions to be executed independently in different parts of the processor (superscalar architectures), and even executing instructions in an order different from that of the program (out-of-order execution). These methods all complicate hardware (larger circuits, higher cost and energy use) because the processor must make all of the decisions internally for these methods to work. In contrast, the VLIW method depends on the programs providing all the decisions regarding which instructions to execute simultaneously and how to resolve conflicts. As a practical matter, this means that the compiler (software used to create the final programs) becomes far more complex, but the hardware is simpler than in many other approaches to parallelism.
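The compiler's role can be illustrated with a minimal sketch, written under simplifying assumptions: a toy packer that groups a straight-line sequence of register operations into fixed-width bundles, starting a new bundle whenever an operation depends on one already in the current bundle. The names Op, conflicts, and pack_bundles are hypothetical, the dependence check is deliberately conservative, and real VLIW compilers use far more elaborate scheduling (reordering, latencies, functional-unit constraints) that this sketch ignores.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    """A toy register operation (hypothetical format, for illustration only)."""
    name: str
    dest: str
    srcs: Tuple[str, ...]


def conflicts(earlier: Op, later: Op) -> bool:
    """Return True if `later` must not share a bundle with `earlier`."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    waw = earlier.dest == later.dest   # both write the same register
    war = later.dest in earlier.srcs   # later overwrites an input of earlier
    return raw or waw or war           # conservative: any dependence splits


def pack_bundles(ops: List[Op], width: int) -> List[List[Op]]:
    """Greedily pack ops, in program order, into bundles of at most `width`."""
    if width < 1:
        raise ValueError("width must be at least 1")
    bundles: List[List[Op]] = []
    current: List[Op] = []
    for op in ops:
        # Close the bundle if it is full or the op depends on one of its members.
        if len(current) == width or any(conflicts(e, op) for e in current):
            bundles.append(current)
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    return bundles


ops = [
    Op("add", "r1", ("r2", "r3")),
    Op("mul", "r4", ("r5", "r6")),
    Op("sub", "r7", ("r1", "r4")),  # needs the results of add and mul
    Op("ld", "r8", ("r9",)),
]

for bundle in pack_bundles(ops, width=2):
    print(" | ".join(op.name for op in bundle))
```

With a bundle width of 2, the independent add and mul share the first bundle, while sub, which reads both of their results, is placed in the second bundle together with the unrelated ld. The hardware then issues each bundle as a unit without checking dependences itself.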

The acronym VLIW can also refer to variable-length instruction word, a criterion in instruction set design that allows a more flexible layout of the instruction set and higher code density (depending on the instructions used). For example, this method makes it possible to load an immediate value of the size of a machine word into a processor register, which is not feasible if each instruction is limited to the size of a machine word, because the opcode and register fields must also fit in the instruction. This flexibility makes instruction decoding more demanding.
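As a rough illustration of the variable-length case, the sketch below decodes a made-up byte encoding in which a short instruction occupies 2 bytes and a load-immediate instruction occupies 6 bytes, so that a full 32-bit immediate fits after the opcode and register fields. The opcodes, mnemonics, and layout are assumptions invented for this example and do not correspond to any real instruction set.

```python
import struct

# Hypothetical toy encoding (an assumption, not a real ISA):
#   0x01 rd              -> 2 bytes: "inc rd"
#   0x02 rd imm32 (LE)   -> 6 bytes: "li rd, imm32"
LENGTHS = {0x01: 2, 0x02: 6}


def decode(code: bytes):
    """Decode a byte string into a list of instruction tuples."""
    pc = 0
    out = []
    while pc < len(code):
        opcode = code[pc]
        length = LENGTHS[opcode]      # length is known only after reading the opcode
        rd = code[pc + 1]
        if opcode == 0x02:
            # The full 32-bit immediate follows the opcode and register bytes.
            (imm,) = struct.unpack_from("<I", code, pc + 2)
            out.append(("li", rd, imm))
        else:
            out.append(("inc", rd))
        pc += length                  # the next instruction starts at a variable offset
    return out


program = bytes([0x02, 3]) + struct.pack("<I", 0xDEADBEEF) + bytes([0x01, 3])
print(decode(program))
```

The decoder cannot locate the next instruction until it has examined the current opcode, which is the extra decoding work that a fixed-length encoding avoids.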

A processor that executes every instruction one after the other (i.e., a non-pipelined scalar architecture) may use processor resources inefficiently, potentially yielding poor performance. The performance can be improved by executing different substeps of sequential instructions simultaneously (termed pipelining), or even executing multiple instructions entirely simultaneously as in superscalar architectures. Further improvement can be achieved by executing instructions in an order different from that in which they occur in a program, termed out-of-order execution.
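The potential gain can be estimated with a back-of-the-envelope model. The sketch below assumes an idealized machine in which every instruction passes through the same number of one-cycle stages and no hazards or stalls occur; the function names are hypothetical, and real processors fall short of these figures.

```python
import math


def cycles_sequential(n: int, stages: int) -> int:
    """Non-pipelined: each instruction finishes all stages before the next starts."""
    return n * stages


def cycles_pipelined(n: int, stages: int) -> int:
    """Ideal pipeline: fill once, then complete one instruction per cycle."""
    return stages + n - 1 if n > 0 else 0


def cycles_superscalar(n: int, stages: int, width: int) -> int:
    """Ideal pipeline that issues up to `width` instructions per cycle."""
    return stages + math.ceil(n / width) - 1 if n > 0 else 0


n, stages = 100, 5
print(cycles_sequential(n, stages))       # 500
print(cycles_pipelined(n, stages))        # 104
print(cycles_superscalar(n, stages, 2))   # 54
```

Under these assumptions, 100 instructions on a 5-stage machine take 500 cycles when executed strictly one after another, 104 cycles when pipelined, and 54 cycles when two instructions can also be issued per cycle.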

...
Wikipedia