Cache hierarchy

Cache hierarchy, or multi-level caching, refers to a memory model designed to hold the data that processors are most likely to request. Its purpose is to speed up the execution of memory-related instructions and thereby improve the overall performance of the system.

This model arose because CPU cores were running at ever faster clock rates and needed to hide the latency of main memory accesses. Today, multi-level caches are the standard way to provide fast access to data residing in main memory. Main memory access time, which acts as a bottleneck for CPU core performance, can be mitigated by a hierarchical cache structure that reduces the average access latency, so that the core can actually benefit from its faster clock.
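The latency benefit of adding a cache level can be illustrated with the commonly used average memory access time (AMAT) estimate. The following is a minimal sketch, not a model of any real processor: the function name `amat`, the hit latencies, the miss rates and the 200-cycle main memory latency are all illustrative assumptions.

```python
def amat(levels, memory_latency):
    """Estimate the average memory access time, in cycles.

    levels: list of (hit_latency, local_miss_rate) pairs, ordered from the
            cache closest to the core outward (hypothetical values).
    memory_latency: cycles needed to reach main memory after the last
            cache level misses (hypothetical value).
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit_latency, miss_rate in levels:
        total += reach * hit_latency  # every access reaching this level pays its latency
        reach *= miss_rate            # only the misses continue to the next level
    return total + reach * memory_latency


# Assumed numbers: a 4-cycle L1 with a 10% miss rate, optionally backed by
# a 12-cycle L2 that misses on 25% of the accesses it receives.
single_level = amat([(4, 0.10)], memory_latency=200)
two_level = amat([(4, 0.10), (12, 0.25)], memory_latency=200)

print(f"single-level: {single_level:.1f} cycles")  # single-level: 24.0 cycles
print(f"two-level:    {two_level:.1f} cycles")     # two-level:    10.2 cycles
```

Under these assumed numbers the second level cuts the average access time by more than half, because most first-level misses are served by the second-level cache instead of main memory.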

In the history of computer and chip development, there was a period in which CPUs became steadily faster while memory access speeds improved far less. The growing gap between processor and memory speeds created a need for better memory access times. Faster CPUs could execute more instructions in a given time than before, but the time needed to fetch data from memory prevented programs from benefiting from this capability. This motivated the search for memory models with faster access that could keep pace with processors, and it led to the concept of cache memory. The concept was first proposed in 1965 by Maurice Wilkes, a British computer scientist at the University of Cambridge, who at the time called such memories "slave memory".

Roughly between 1970 and 1990, many researchers, including Anant Agarwal, Alan Jay Smith, Mark D. Hill and Thomas R. Puzak, published papers and articles on the analysis and improvement of cache memory designs. The first cache memory models were implemented during this period, but even as researchers investigated and proposed better designs, the need for faster memory models remained. Although these caches reduced data access latency, their storage capacity was small compared to main memory, so a large amount of data still had to be accessed the old way, with high latency. From about 1990 onward, therefore, proposals gradually emerged to add a second cache level to back up the first-level cache. Researchers such as Jean-Loup Baer, Wen-Hann Wang and Andrew W. Wilson studied this model.

Once several simulations and implementations had demonstrated that two-level cache models give faster access to data, the concept of multi-level caches took shape as a new and generally better model than the earlier single-level cache. Since 2000, multi-level cache models have received widespread attention and are now widely implemented in many systems; Intel Core i7 products, for example, use three levels of cache.

...
Wikipedia