Cache coherence


In computer architecture, cache coherence is the uniformity of shared resource data that is stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data. This is particularly the case with CPUs in a multiprocessing system.

In the illustration on the right, suppose both clients have a cached copy of a particular memory block from a previous read. If the client on the bottom updates that memory block, the client on the top could be left with a stale cached copy without any notification of the change. Cache coherence is intended to manage such conflicts by maintaining a coherent view of the data values in multiple caches.
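The conflict described above can be reproduced with a toy model. The following is a minimal sketch, not a model of any real hardware: the `Client` class, the dictionary-based memory, and the address `0x40` are all hypothetical names chosen for illustration, and no coherence mechanism is implemented on purpose.

```python
# Toy model: two clients with private caches over one shared memory.
# No coherence mechanism is implemented, so stale reads are possible.

class Client:
    def __init__(self, memory):
        self.memory = memory   # shared backing store (dict: address -> value)
        self.cache = {}        # private cache (dict: address -> value)

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, return the cached copy.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Update the local copy and memory, but notify nobody else.
        self.cache[addr] = value
        self.memory[addr] = value


memory = {0x40: 1}
top, bottom = Client(memory), Client(memory)

top.read(0x40)                 # both clients cache the block
bottom.read(0x40)
bottom.write(0x40, 2)          # the bottom client updates the block

stale_value = top.read(0x40)   # the top client still hits its old copy
fresh_value = memory[0x40]
print(stale_value, fresh_value)   # prints "1 2"
```

The top client keeps returning the old value because nothing invalidates or updates its cached copy, which is exactly the situation a coherence mechanism is meant to prevent.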

In a shared-memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data: one copy in the main memory and one in the local cache of each processor that requested it. When one of the copies is changed, the other copies must reflect that change. Cache coherence is the discipline that ensures that changes in the values of shared operands (data) are propagated throughout the system in a timely fashion.
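One way to make the other copies reflect a change is to broadcast every write so that peers holding the block update it in place, an approach commonly called write-update. The text above does not prescribe this approach; it is used here only as an illustration, and the `Bus` and `Cache` classes are hypothetical.

```python
# Sketch of write propagation by broadcasting updates (write-update).
# Bus and Cache are illustrative names, not a real hardware interface.

class Bus:
    def __init__(self, memory):
        self.memory = memory   # main memory (dict: address -> value)
        self.caches = []       # every cache attached to this bus

    def broadcast_write(self, source, addr, value):
        # Update main memory and every peer cache that holds the block.
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not source and addr in cache.lines:
                cache.lines[addr] = value


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}        # locally cached blocks
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)


bus = Bus({0x40: 1})
c1, c2, c3 = Cache(bus), Cache(bus), Cache(bus)
c1.read(0x40)
c2.read(0x40)                  # c3 has not cached the block yet
c1.write(0x40, 7)

values = [c2.read(0x40), c3.read(0x40), bus.memory[0x40]]
print(values)                  # prints "[7, 7, 7]"
```

Here c2 sees the new value through the broadcast update, while c3 sees it because its first read misses and fetches from the already updated memory.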

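The following are the requirements for cache coherence:

Write propagation: changes to the data in any cache must be propagated to the other copies of that cache block in the peer caches.

Transaction serialization: reads and writes to a single memory location must be seen by all processors in the same order.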

Theoretically, coherence can be performed at the load/store granularity. However, in practice it is generally performed at the granularity of cache blocks.
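To illustrate block granularity, the sketch below maps byte addresses to the block that contains them. The 64-byte block size is an assumption chosen for the example, and the helper names `block_address` and `same_block` are hypothetical.

```python
BLOCK_SIZE = 64  # bytes per cache block; an assumed value for this example


def block_address(byte_addr, block_size=BLOCK_SIZE):
    # Round the byte address down to the start of its containing block.
    return byte_addr - (byte_addr % block_size)


def same_block(addr_a, addr_b, block_size=BLOCK_SIZE):
    # Two addresses are tracked together if they fall in the same block.
    return block_address(addr_a, block_size) == block_address(addr_b, block_size)


print(hex(block_address(0x1004)))   # prints "0x1000"
print(same_block(0x1004, 0x103F))   # prints "True"
print(same_block(0x103F, 0x1040))   # prints "False"
```

Under this assumption, a write to address 0x1004 affects the coherence state of the whole block starting at 0x1000, including bytes the writer never touched.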

Coherence defines the behavior of reads and writes to a single address location.

In a multiprocessor system, consider that more than one processor has cached a copy of the memory location X. The following conditions are necessary to achieve cache coherence:

The above conditions satisfy the write propagation criterion required for cache coherence. However, they are not sufficient, as they do not satisfy the transaction serialization condition. To illustrate this, consider the following example:

A multiprocessor system consists of four processors, P1, P2, P3 and P4, all holding cached copies of a shared variable S whose initial value is 0. Processor P1 changes the value of S (in its cached copy) to 10, after which processor P2 changes the value of S in its own cached copy to 20. If we ensure only write propagation, then P3 and P4 will certainly see the changes made to S by P1 and P2. However, P3 may see the change made by P2 before seeing the change made by P1 and hence return 10 on a read of S. P4, on the other hand, may see the changes made by P1 and P2 in the order in which they were made and hence return 20 on a read of S. The processors P3 and P4 now have an incoherent view of the memory.
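The example can be replayed as a small simulation. This is a sketch under simple assumptions: each observer is modelled as applying the propagated writes in whatever order they arrive, and the helper `final_value` is a hypothetical name.

```python
def final_value(initial, writes_in_observed_order):
    # Apply each propagated write in the order this observer sees it.
    value = initial
    for _writer, new_value in writes_in_observed_order:
        value = new_value
    return value


write_p1 = ("P1", 10)
write_p2 = ("P2", 20)

# Write propagation only: each observer may see the writes in its own order.
p3_view = final_value(0, [write_p2, write_p1])   # P2's write arrives first
p4_view = final_value(0, [write_p1, write_p2])   # writes arrive in issue order
print(p3_view, p4_view)                          # prints "10 20"

# With serialization, every observer applies the same single order.
global_order = [write_p1, write_p2]
p3_serialized = final_value(0, global_order)
p4_serialized = final_value(0, global_order)
print(p3_serialized, p4_serialized)              # prints "20 20"
```

Both writes reach both observers in either case; only when all observers agree on one order do P3 and P4 end up with the same value of S.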


...
Wikipedia

...