Egida
Low-Overhead Protocol for Fault-Tolerant File Sharing

Conventional rollback-recovery protocols incur substantial overhead when applied to applications in which agents communicate through file sharing. We have demonstrated that this overhead is significant and is likely to grow as applications scale up and the gap between processor and disk speeds widens. We have developed a protocol that virtually eliminates this overhead. The central idea is to track the causal dependencies that result from file sharing and to record them as determinants: tuples that identify file I/O and message-passing operations and the order in which they occur relative to other events in an agent's execution. We show that if determinants are available during recovery, interactions with the file server can be reproduced and file data lost in a failure can be regenerated. To ensure that determinants remain available, we use an efficient replication scheme that stores them in the agents' volatile memory. This protocol introduces a novel concept: stable storage for files implemented in volatile memory. It has led to a novel design for a fault-tolerant server-less file system.
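
The idea of recording determinants and replicating them in volatile memory can be illustrated with a small sketch. It is not the protocol from the paper: the `Determinant` fields, the `Agent` class, the `replicate` and `recover` helpers, and the choice of two copies per determinant are all assumptions made for illustration. The sketch only shows that a failed agent's ordered event history can be rebuilt from copies held in the memory of surviving agents.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Determinant:
    """Hypothetical determinant: identifies one event and its order."""
    agent: str    # agent that executed the event
    seq: int      # position of the event in that agent's execution
    op: str       # assumed operation kinds: "read", "write", "recv"
    target: str   # file name, or sender id for a message
    version: int  # file version seen or produced, or message number


class Agent:
    """Agent whose determinant store lives only in volatile memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.store: Set[Determinant] = set()

    def crash(self) -> None:
        # A crash wipes volatile memory, including stored determinants.
        self.alive = False
        self.store.clear()


def replicate(det: Determinant, agents: Dict[str, Agent], copies: int) -> None:
    """Store det in the memory of `copies` live agents (assumed policy)."""
    holders = [a for a in agents.values() if a.alive][:copies]
    if len(holders) < copies:
        raise RuntimeError("not enough live agents to replicate")
    for holder in holders:
        holder.store.add(det)


def recover(failed: str, agents: Dict[str, Agent]) -> List[Determinant]:
    """Collect the failed agent's determinants from survivors, in order."""
    found: Set[Determinant] = set()
    for a in agents.values():
        if a.alive:
            found |= {d for d in a.store if d.agent == failed}
    return sorted(found, key=lambda d: d.seq)


agents = {name: Agent(name) for name in ("p", "q", "r")}
events = [
    Determinant("p", 0, "read", "data.txt", 3),
    Determinant("p", 1, "recv", "q", 0),
    Determinant("p", 2, "write", "data.txt", 4),
]
for det in events:
    replicate(det, agents, copies=2)  # two copies tolerate one crash

agents["p"].crash()
replay = recover("p", agents)  # ordered history to re-execute during recovery
```

In this sketch, a recovering agent would re-execute its file reads, writes, and message receipts in the order given by `replay`, which is how lost file data could be regenerated without writing each update synchronously to disk.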

Representative Publication:

  1. L. Alvisi, S. Rao, and H.M. Vin, Low-Overhead Protocols for Fault-Tolerant File Sharing, in Proceedings of the International Conference on Distributed Computing Systems (ICDCS), Amsterdam, The Netherlands, pages 452-461, May 1998.
