Cache Review
Outline:
- Describe and explain the purpose of caches
- Briefly explain cache miss and cache hit
- Explain cache miss and the time penalty. What are the two components that cache miss penalty is dependent on?
- Explain the effects of cache on performance (performance equation)
- Explain and describe cache organization
- Identify and describe the different cache optimization techniques
- Describe coherence protocols
Cache Functionality and Purpose
Cache memory is a small but fast type of memory situated close to the processor, typically on the same die. Caches optimize execution time by reducing the time it takes to fulfill a memory request. A request that finds its data in the cache is a cache hit; a request that does not, and must instead fetch the data from a lower level of the memory hierarchy, is a cache miss.
Cache Miss Types - 3 C’s
There are three types of cache misses: compulsory, conflict, and capacity.
Compulsory Miss
A compulsory miss is a miss that occurs when trying to access data for the first time. It can be thought of as being part of the startup time for some algorithm or process.
Conflict Miss
A conflict miss occurs when a block that previously resided in the cache is needed again, but was evicted in favor of another block that mapped to the same slot.
Capacity Miss
A capacity miss occurs when the cache has no capacity left.
Note: the difference between a conflict miss and a capacity miss is that a conflict miss may occur while the cache still has capacity to retain more data. It is the fault of the selected method of organization, not of limited capacity.
Cache Misses
The time required to service a cache miss (the miss penalty) depends on two components of the memory system: latency and bandwidth.
- Latency is the time it takes to bring the first piece of data from main memory. It is dominated by the physical distance between RAM and the cache and by the sheer amount of time RAM takes to retrieve data from its banks.
- Bandwidth is the amount of data that can be transferred per unit time. Bandwidth is dictated by the width of the bus.
While a miss is being serviced, execution of dependent instructions must stall until the data is made available.
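As a rough sketch of how these two components combine (the constants below are assumed example values, not figures for any particular machine), the miss penalty can be modeled as the latency to the first word plus the time to transfer the rest of the block over the bus:

```c
#include <stdio.h>

/* Hypothetical memory-system parameters (assumptions, not measured values). */
#define LATENCY_NS           60.0  /* time until the first word arrives */
#define BANDWIDTH_B_PER_NS    8.0  /* bytes transferred per nanosecond  */
#define BLOCK_SIZE_B         64.0  /* cache line size in bytes          */

/* Miss penalty = latency to the first word + time to stream in the block. */
static double miss_penalty_ns(void) {
    return LATENCY_NS + BLOCK_SIZE_B / BANDWIDTH_B_PER_NS;
}

int main(void) {
    printf("miss penalty: %.1f ns\n", miss_penalty_ns()); /* 60 + 8 = 68.0 ns */
    return 0;
}
```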
Average Memory Access Time (AMAT) Equation
Let h be the hit rate, m the miss rate, t_m the time it takes to service a miss, and t_h the time it takes to service a hit.
Intuitively, the average cost of moving data into the registers is the fraction of accesses that hit multiplied by the time it takes to retrieve data from the cache, plus the fraction of accesses that miss multiplied by the time it takes to service the miss plus the time it took to check whether it was a hit.
Formulaically, this is:
AMAT = h * t_h + m * (t_m + t_h)
But we know that h = 1 - m. So:
AMAT = (1 - m) * t_h + m * (t_m + t_h)
AMAT = t_h - m * t_h + m * t_h + m * t_m
AMAT = t_h + m * t_m
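To make the formula concrete, here is a small sketch that evaluates it with assumed example values (a 1-cycle hit time, a 100-cycle miss penalty, and a 5% miss rate):

```c
#include <stdio.h>

/* AMAT = t_h + m * t_m, evaluated with hypothetical example values. */
int main(void) {
    double t_h = 1.0;   /* hit time in cycles (assumed)     */
    double t_m = 100.0; /* miss penalty in cycles (assumed) */
    double m   = 0.05;  /* miss rate: 5% of accesses miss   */

    double amat = t_h + m * t_m;
    printf("AMAT = %.1f cycles\n", amat); /* 1 + 0.05 * 100 = 6.0 cycles */
    return 0;
}
```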
Cache Organization
Where a Block is Found
When indexing into the cache for a single byte, the address identifies where in the cache to look for it.
- Cache is organized into fixed-size segments of bytes called cache lines or blocks.
- To retrieve a byte from cache, we index into a cache line using the block offset.
- The block offset is identified by the lower bits of the address.
- Each cache line is identified via a tag, which is formed from the upper bits of the address.
- Cache lines are arranged into sets. The set is selected by the middle bits of the address, as shown in the sketch after this list.
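A minimal sketch of this address decomposition, assuming hypothetical field widths (64-byte lines giving 6 offset bits, and 128 sets giving 7 index bits):

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 6  /* assumed: 64-byte cache lines */
#define INDEX_BITS  7  /* assumed: 128 sets            */

int main(void) {
    uint64_t addr = 0x7ffd12345678ULL;  /* an arbitrary example address */

    /* Lower bits: where the byte sits within the cache line. */
    uint64_t offset = addr & ((1ULL << OFFSET_BITS) - 1);
    /* Middle bits: which set the line maps to. */
    uint64_t set = (addr >> OFFSET_BITS) & ((1ULL << INDEX_BITS) - 1);
    /* Upper bits: the tag that identifies the line within its set. */
    uint64_t tag = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("offset=%llu set=%llu tag=0x%llx\n",
           (unsigned long long)offset,
           (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}
```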
Varying the number of bits used to identify the set leaves us with three possible cases for cache organization: direct mapped, fully associative, and N-way set associative.
Direct Mapped
In direct mapped caches, each set holds exactly one cache line: the middle bits of the address map a block to a single fixed position in the cache, and the upper bits form the tag used to verify a hit.
Pros:
- Lookup is simple and fast, since each block can reside in exactly one position in the cache.
Cons:
- The limited associativity exhibited by direct mapped caches can lead to conflict misses.
Fully Associative
In fully associative caches, there are no set bits; all of the upper and middle bits are used for the tag. The result of this is that a cache line can be placed anywhere in the cache.
Cons:
- Increased lookup cost, since every tag in the cache must be searched (sequentially, or in parallel with comparator hardware at every entry) to find the desired byte.
N-Way Set Associative
N-way set associative caches use a combination of direct mapping and fully associative mapping. The middle bits of the address determine the set, and the upper bits determine the tag. Each set can hold up to N cache lines, allowing for more flexibility in placement compared to direct mapped caches.
How a Block is Found
To find a block in the cache, we first index into the correct set and then search the set for a tag match.
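The following sketch illustrates that lookup for a hypothetical 4-way set associative cache, using the same assumed field widths as the earlier decomposition example (the structure and names are illustrative, not from any real design):

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 6                  /* assumed: 64-byte cache lines   */
#define INDEX_BITS  7                  /* assumed: 128 sets              */
#define NUM_SETS    (1 << INDEX_BITS)
#define NUM_WAYS    4                  /* assumed: 4-way set associative */

typedef struct {
    bool     valid;
    uint64_t tag;
    uint8_t  data[1 << OFFSET_BITS];
} CacheLine;

static CacheLine cache[NUM_SETS][NUM_WAYS];

/* Index into the correct set, then search its N ways for a tag match. */
bool lookup(uint64_t addr, uint8_t *out) {
    uint64_t set = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint64_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
    uint64_t off = addr & ((1ULL << OFFSET_BITS) - 1);

    for (int way = 0; way < NUM_WAYS; way++) {
        CacheLine *line = &cache[set][way];
        if (line->valid && line->tag == tag) {  /* tag match: a hit */
            *out = line->data[off];
            return true;
        }
    }
    return false;  /* miss: the block must be fetched from the next level */
}
```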