*** Welcome to piglix ***

Compute kernel


In computing, a compute kernel is a routine compiled for high throughput accelerators (such as GPUs), DSPs or FPGAs, separate from (but used by) a main program. They are sometimes called compute shaders, sharing execution units with vertex shaders and pixel shaders on GPUs, but are not limited to execution on one class of device, or graphics APIs.

Compute kernels roughly correspond to inner loops when implementing algorithms in traditional languages (except there is no implied sequential operation), or to code passed to internal iterators.

They may be specified by a separate programming language such as "OpenCL C" (managed by the OpenCL API), as "compute shaders" (managed by a graphics API such as OpenGL), or embedded directly in application code written in a high level language, as in the case of C++AMP.

This programming paradigm maps well to vector processors: there is an assumption that each invocation of a kernel within a batch is independent, allowing for data parallel execution. However, atomic operations may sometimes be used for synchronisation between elements (for interdependent work), in some scenarios. Individual invocations are given indices (in 1 or more dimensions) from which arbitrary addressing of buffer data may be performed (including scatter gather operations), so long as the non-overlapping assumption is respected.


...
Wikipedia

...