Large applications, such as simulators and database servers, require cost-effective computation power beyond that of a single microprocessor. Shared-memory multiprocessor servers have emerged as a popular solution, because the system looks like a multi-tasking uniprocessor to many applications. Most shared-memory multiprocessors use per-processor cache hierarchies that are kept transparent to software by a coherence algorithm.
The two classic classes of coherence algorithms are snooping and directories. Snooping keeps caches coherent using a totally ordered network to broadcast coherence transactions directly to all processors and memory. In contrast, directory protocols transmit a coherence transaction over an arbitrary point-to-point network to a directory entry (usually at memory), which, in turn, re-directs the transaction to a superset of processors caching the block.
This talk discusses a new coherence method called MULTICAST SNOOPING that dynamically adapts between broadcast snooping and a directory protocol. Multicast snooping is unique because processors predict which caches should snoop each coherence transaction by specifying a multicast "mask." Transactions are delivered with an ordered multicast network, such as an Isotach network. Processors handle transactions as they would with a snooping protocol, while a simplified directory operates in parallel to check masks and gracefully handle incorrect ones (e.g., previous owner missing).
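The mask check performed by the simplified directory can be illustrated with a minimal sketch. It is not the protocol's actual implementation: the names DirectoryEntry, mask_is_sufficient, and missing_nodes are hypothetical, and the sufficiency rule (a shared request must reach the current owner, an exclusive request must also reach every sharer) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DirectoryEntry:
    """Hypothetical per-block directory state (assumed for illustration)."""
    owner: Optional[int] = None          # None: memory currently owns the block
    sharers: Set[int] = field(default_factory=set)


def required_nodes(entry: DirectoryEntry, requester: int, exclusive: bool) -> Set[int]:
    """Nodes that must snoop the transaction under the assumed rule."""
    required = {requester}
    if entry.owner is not None:
        required.add(entry.owner)        # the owner must see every request
    if exclusive:
        required |= entry.sharers        # sharers must be invalidated
    return required


def mask_is_sufficient(entry: DirectoryEntry, requester: int,
                       mask: Set[int], exclusive: bool) -> bool:
    """True if the predicted multicast mask covers every required node."""
    return required_nodes(entry, requester, exclusive) <= mask


def missing_nodes(entry: DirectoryEntry, requester: int,
                  mask: Set[int], exclusive: bool) -> Set[int]:
    """Nodes the mask omitted; a directory could use these to trigger a retry."""
    return required_nodes(entry, requester, exclusive) - mask
```

For example, with owner 3 and sharers {5, 7}, a shared request from processor 0 with mask {0, 5} is insufficient because the previous owner is missing, and missing_nodes reports {3}.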
Preliminary performance numbers suggest that multicast snooping can obtain data directly (like broadcast snooping) while scaling to larger systems (like directories). For SPLASH-2 benchmarks running on 32 processors, we can limit multicasts to an average of 2-6 destinations (<< 32) and deliver 2-5 multicasts per network cycle (>> broadcast snooping's 1 per cycle). A paper based on this work appears in the 1999 International Symposium on Computer Architecture.
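To put the destination counts in perspective, the short calculation below expresses the reported averages as a fraction of the 32 processors that a broadcast would reach. The function name snoop_fraction is hypothetical; the only inputs are the figures quoted above.

```python
PROCESSORS = 32  # system size quoted in the abstract


def snoop_fraction(avg_destinations: float, processors: int = PROCESSORS) -> float:
    """Fraction of processors that snoop an average multicast."""
    return avg_destinations / processors


low = snoop_fraction(2)    # lower end of the reported 2-6 range
high = snoop_fraction(6)   # upper end of the reported 2-6 range
print(f"{low:.2%} to {high:.2%} of processors snoop each transaction")
```

Under these figures, an average multicast reaches between one sixteenth and just under one fifth of the processors, whereas a broadcast reaches all of them.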
Last modified: February 11, 1999