In the field of 3D computer graphics, the Unified Shader Model (known in Direct3D 10 as "Shader Model 4.0") refers to a form of shader hardware in a graphics processing unit (GPU) where all of the shader stages in the rendering pipeline (geometry, vertex, pixel, etc.) have the same capabilities. They can all read textures and buffers, and they use instruction sets that are almost identical.
Earlier GPUs generally included two types of shader hardware, with the vertex shaders having considerably more instructions than the simpler pixel shaders. This lowered the cost of implementation of the GPU as a whole, and allowed more shaders in total on a single unit. This was at the cost of making the system less flexible, and sometimes leaving one set of shaders idle if the workload used one more than the other. As improvements in fabrication continued, this distinction became less useful. ATI Technologies introduced a unified architecture on the hardware they developed for the Xbox 360, and then introduced this in card form in the TeraScale line. Nvidia quickly followed with their Tesla design. The concept has been universal since then.
Early shader abstractions (such as Shader Model 1.x) used very different instruction sets for vertex and pixel shaders, with vertex shaders having a much more flexible instruction set. Later shader models (such as Shader Model 2.x and 3.0) reduced the differences, approaching the Unified Shader Model. Even in the unified model, the instruction set may not be completely the same across shader types; individual shader stages may retain a few distinctions. For example, fragment/pixel shaders can compute implicit texture coordinate gradients, while geometry shaders can emit rendering primitives.
Unified Shading Architecture is a hardware design in which all shader processing units of a piece of graphics hardware are capable of handling any type of shading task. Most often, Unified Shading Architecture hardware is composed of an array of computing units and some form of dynamic scheduling/load balancing system that keeps all of the computational units working as often as possible.
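The load-balancing benefit can be illustrated with a toy model. The sketch below is not a description of any real GPU scheduler: it assumes every shading task takes exactly one cycle on one unit, ignores memory and dependency stalls, and uses hypothetical function names and unit counts chosen only for illustration. It compares a fixed split between vertex and pixel units with a single unified pool of the same total size.

```python
import math


def cycles_split(vertex_tasks, pixel_tasks, vertex_units, pixel_units):
    """Cycles needed when vertex and pixel work run on separate, fixed unit pools.

    Assumption: each task occupies one unit for one cycle, and the two pools
    run in parallel, so the slower pool determines the total time.
    """
    return max(math.ceil(vertex_tasks / vertex_units),
               math.ceil(pixel_tasks / pixel_units))


def cycles_unified(vertex_tasks, pixel_tasks, total_units):
    """Cycles needed when any unit can run any task (idealized scheduler)."""
    return math.ceil((vertex_tasks + pixel_tasks) / total_units)


def utilization(total_tasks, total_units, cycles):
    """Fraction of available unit-cycles that did useful work."""
    return total_tasks / (total_units * cycles)


if __name__ == "__main__":
    # Hypothetical pixel-heavy workload on 8 vertex units + 24 pixel units.
    v_tasks, p_tasks = 100, 2300
    v_units, p_units = 8, 24
    total_units = v_units + p_units

    split = cycles_split(v_tasks, p_tasks, v_units, p_units)
    unified = cycles_unified(v_tasks, p_tasks, total_units)

    print("split cycles:  ", split,
          "utilization:", utilization(v_tasks + p_tasks, total_units, split))
    print("unified cycles:", unified,
          "utilization:", utilization(v_tasks + p_tasks, total_units, unified))
```

In this model the fixed split leaves the vertex units idle for most of the frame whenever the workload is pixel-heavy, while the unified pool finishes in fewer cycles with the same number of units. Real hardware falls short of this ideal because scheduling itself has a cost and tasks are not uniform.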