Scalability: Communication-Induced Checkpointing
Communication-induced checkpointing (CIC) allows processes in a distributed
computation to take checkpoints independently while still avoiding the
domino effect. To do so, a CIC protocol piggybacks control information on
application messages and, when that information requires it, makes the
receiver take an additional, forced checkpoint before processing the message.
Our analysis of CIC protocols is based on a prototype implementation and on
validated simulations. The results suggest that much of the conventional
wisdom about these protocols is questionable.
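To make the forced-checkpoint idea concrete, here is a minimal Python sketch of one simple index-based CIC rule. It assumes reliable message delivery and an initial checkpoint of index 0 at every process, and it is an illustration only, not the protocol implemented in the prototype. The names `Process`, `take_checkpoint`, `recovery_line`, `is_consistent` and `deliver` are hypothetical. Each process stamps outgoing messages with the index of its latest checkpoint. A receiver that sees a larger index first takes a forced checkpoint with that index and only then delivers the message. Choosing, for every process, its first checkpoint with index at least n then gives a consistent global checkpoint (no orphan messages), so rollback never cascades.

```python
# Hypothetical sketch of an index-based CIC rule (not the prototype's protocol).
# Assumes reliable delivery and that every process starts with checkpoint 0.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Message:
    sender: int
    send_event: int  # position of the send event in the sender's history
    sn: int          # sender's latest checkpoint index, piggybacked


@dataclass
class Process:
    pid: int
    sn: int = 0      # index of the most recent local checkpoint
    events: int = 0  # send/receive events executed so far
    # (checkpoint index, number of events that precede the checkpoint)
    checkpoints: List[Tuple[int, int]] = field(default_factory=lambda: [(0, 0)])

    def take_checkpoint(self) -> None:
        """Basic checkpoint, taken whenever the process chooses."""
        self.sn += 1
        self.checkpoints.append((self.sn, self.events))

    def send(self) -> Message:
        self.events += 1
        return Message(self.pid, self.events, self.sn)

    def receive(self, msg: Message) -> int:
        """Deliver msg and return the position of the receive event."""
        if msg.sn > self.sn:
            # Forced checkpoint *before* delivery: catch up with the sender's index.
            self.sn = msg.sn
            self.checkpoints.append((self.sn, self.events))
        self.events += 1
        return self.events

    def cut_for(self, n: int) -> Optional[int]:
        """Events preceding the first checkpoint with index >= n, if one exists."""
        for index, position in self.checkpoints:
            if index >= n:
                return position
        return None


def recovery_line(procs: List[Process], n: int) -> Optional[Dict[int, int]]:
    """For each process, its first checkpoint with index >= n (None if incomplete)."""
    cut = {p.pid: p.cut_for(n) for p in procs}
    return None if None in cut.values() else cut


def is_consistent(cut: Dict[int, int], log: List[Tuple[int, int, int, int]]) -> bool:
    """True if no message is sent after the sender's cut but received before the receiver's."""
    return not any(s_ev > cut[s] and r_ev <= cut[r] for s, s_ev, r, r_ev in log)


# Usage: three processes; log entries are (sender, send_event, receiver, recv_event).
procs = [Process(0), Process(1), Process(2)]
log: List[Tuple[int, int, int, int]] = []


def deliver(src: int, dst: int) -> None:
    msg = procs[src].send()
    log.append((src, msg.send_event, dst, procs[dst].receive(msg)))


procs[0].take_checkpoint()  # P0 reaches index 1 on its own
deliver(0, 1)               # P1 is forced to checkpoint index 1 first
procs[1].take_checkpoint()  # P1 reaches index 2 on its own
deliver(1, 2)               # P2 jumps straight to index 2 (forced)
deliver(2, 0)               # P0 is forced to checkpoint index 2

for n in (1, 2):
    cut = recovery_line(procs, n)
    print(n, cut, is_consistent(cut, log))
# prints: 1 {0: 0, 1: 0, 2: 0} True
#         2 {0: 1, 1: 1, 2: 0} True
```

Mixing indices is what the rule protects against: the cut `{0: 0, 1: 1, 2: 0}`, which pairs P0's index-1 checkpoint with P1's index-2 checkpoint, turns P0's first message into an orphan, and `is_consistent` reports `False` for it.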
Representative Publication:
- Lorenzo Alvisi, Elmootazbellah Elnozahy, Sriram S. Rao, Syed A. Husain,
  and Asanka de Mel, "An Analysis of Communication-Induced Checkpointing,"
  in Proceedings of the IEEE International Conference on Fault-Tolerant
  Computing (FTCS), pp. 242-249, June 1999.