Data parallelism is a form of parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism.
A data parallel job on an array of 'n' elements can be divided equally among all the processors. Let us assume we want to sum all the elements of the given array and the time for a single addition operation is Ta time units. In the case of sequential execution, the time taken by the process will be n*Ta time units as it sums up all the elements of an array. On the other hand, if we execute this job as a data parallel job on 4 processors the time taken would reduce to (n/4)*Ta + Merging overhead time units. Parallel execution results in a speedup of 4 over sequential execution. One important thing to note is that the locality of data references plays an important part in evaluating the performance of a data parallel programming model. Locality of data depends on the memory accesses performed by the program as well as the size of the cache.
Exploitation of the concept of Data Parallelism started in 1960s with the development of Solomon machine. Solomon machine, also called a vector processor wanted to expedite the math performance by working on a large data array(operating on multiple data in consecutive time steps). Concurrency of data was also exploited by operating on multiple data points at the same time using a single instruction. These generation of processors were called Array Processors. Today, data parallelism is best exemplified in graphics processing units(GPUs) which use both the techniques of operating on multiple data points in space and time using a single instruction.
In a multiprocessor system executing a single set of instructions (SIMD), data parallelism is achieved when each processor performs the same task on different pieces of distributed data. In some situations, a single execution thread controls operations on all pieces of data. In others, different threads control the operation, but they execute the same code.
For instance, consider matrix multiplication and addition in a sequential manner as discussed in the example.