Byzantine Fault-Tolerance for
Distributed Storage Systems

Data replication is a well-known means of protecting against data unavailability or corruption in the face of data server failures. When servers can suffer Byzantine (i.e., arbitrary) failures, the foremost approach for protecting data is via state machine replication, in which every correct server receives and processes every request in the same order, thereby producing the same output for each request. If the client then accepts a value returned by at least t+1 servers, then up to t arbitrary server failures can be masked. Numerous systems have been built to support this approach (e.g. Mike Reiter's Rampart).

To improve the efficiency and availability of data access while still protecting the integrity of replicated data, the use of quorum systems has been proposed. Quorum systems are a family of protocols that allow reads and updates of replicated data to be performed at only a subset (quorum) of the servers. In a t-masking quorum system, the quorums of servers are defined such that any two quorums intersect in at least 2t+1 servers. In a system with a maximum of t faulty servers, if each read and write operation is performed at a quorum, then the quorum used in a read operation will intersect the quorum used in the last preceding write operation in at least t+1 correct servers. With appropriate read and write protocols, this intersection condition ensures that the client is able to identify the correct, up-to-date data.

Unfortunately, however, there are obstacles to the use of masking quorum systems as an economical alternative for Byzantine fault tolerance. For one thing, they remain expensive in that tolerating multiple faults requires large quorums. Furthermore, the existing read and write protocols for these systems do not provide atomicity of operations in the general case.
The purpose of this project is to address these obstacles.

Current Focus

Past Highlights

Sponsors


Project Members

Collaborators

Alumni

Relevant Publications

  • Minimal Byzantine Storage (with J.P. Martin and M. Dahlin). In Proceedings of the 16th International Symposium on Distributed Computing (DISC 2002), Toulouse, France, October 2002, pp. 311-326.
  • Small Byzantine Quorum Systems (with J.P. Martin and M. Dahlin). In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2002 and FTCS 32), DCC Symposium, Washington, DC, June 2002, pp. 374-383. See also the extended technical report.
  • A Framework for Semantic Reasoning about Byzantine Quorum Systems. (with E. Pierce) In Brief Announcements, Proceedings of Symposium on Principles of Distributed Computing, August 2001.
  • Dynamic Byzantyne Quorum Systems. (with D. Mahlki, L. Pierce, M. Reiter, and R. Wright) In Proceedings of the International Conference on Dependable Systems and Networks (FTCS 30 and DCCA 8), New York, NY, June 2000, pp. 283-292.
  • Probabilistic Techniques for Fault Detection in Byzantine Quorum Systems. (with D. Mahlki, L. Pierce, and M. Reiter). Proceedings of the Seventh IFIP International Working Conference on Dependable Computing for Critical Applications (DCCA-7) San Jose, CA, January 1999, pp. 379-394.