Low-Overhead Protocol for Fault-Tolerant File Sharing
Conventional rollback-recovery protocols incur substantial overhead
when used for applications in which agents communicate through file
sharing. We have demonstrated that this overhead is significant and
is likely to grow as the scale of applications and the disparity
between processor and disk speeds continue to increase. We have
developed a protocol that virtually
eliminates this overhead. The central idea of our solution is to
track causal dependencies resulting from file-sharing and to record
them using determinants---tuples that identify file I/O and message
passing operations and the order of their occurrence with respect to
other events in an agent execution. We show that if determinants are
available during recovery, then interactions with the file server
can be reproduced, and file data lost in a failure can be
regenerated. To ensure determinants' availability, we use an
efficient replication scheme that stores determinants in agents'
volatile memory. Using this protocol, we have introduced a novel
concept---the implementation of stable storage for files in volatile
memory. This has led to a novel design for a fault-tolerant
server-less file system.
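
The idea of recording determinants and replicating them in agents' volatile memory can be illustrated with a minimal sketch. It is not the protocol from the paper: the tuple fields (`agent`, `seq`, `op`, `path`, `version`), the `Agent` class, and the `record` and `recover` helpers are all hypothetical names chosen for illustration, and the choice of which peers hold copies is an assumption. The sketch only shows that, if copies of a failed agent's determinants survive in other agents' memory, its file operations can be put back in their original order for replay.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    # Hypothetical tuple layout; the paper's actual format may differ.
    agent: str    # agent that executed the operation
    seq: int      # position of the operation in that agent's execution
    op: str       # "read" or "write" (illustrative operation kinds)
    path: str     # file the operation touched
    version: int  # file version that was read or produced


@dataclass
class Agent:
    name: str
    seq: int = 0
    # Determinants held in volatile memory, including copies logged by peers.
    store: set = field(default_factory=set)

    def record(self, op, path, version, replicas):
        """Log a determinant locally and copy it to the given peers."""
        det = Determinant(self.name, self.seq, op, path, version)
        self.seq += 1
        self.store.add(det)
        for peer in replicas:
            peer.store.add(det)
        return det


def recover(failed_name, survivors):
    """Collect the failed agent's determinants from surviving peers,
    ordered by their position in the failed agent's execution."""
    found = set()
    for peer in survivors:
        found |= {d for d in peer.store if d.agent == failed_name}
    return sorted(found, key=lambda d: d.seq)


a, b, c = Agent("a"), Agent("b"), Agent("c")
a.record("write", "/shared/x", 1, replicas=[b])
a.record("read", "/shared/y", 4, replicas=[c])

# Agent "a" fails and loses its volatile memory; b and c survive.
replay = recover("a", [b, c])
for det in replay:
    print(det.seq, det.op, det.path, det.version)
```

In this sketch each determinant is copied to a single peer, so it tolerates only the failure of the agent that created it; a real scheme would choose the number of copies according to how many simultaneous failures must be tolerated.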
Representative Publication:
- L. Alvisi, S. Rao, and H.M. Vin, Low-Overhead Protocols for
Fault-Tolerant File Sharing, In Proceedings of the International
Conference on Distributed Computing Systems (ICDCS), Amsterdam,
The Netherlands, pages 452--461, May 1998.