Byzantine Fault Tolerance and Beyond

This project seeks to improve system security and robustness by building distributed services that tolerate buggy, selfish, or malicious nodes. We use replication and Byzantine fault tolerant (BFT) protocols to mask incorrect behaviors.

Our main results to date have fallen into two broad categories.

Reducing the cost/increasing the performance of BFT systems

Fast Byzantine Consensus
Dynamic Quorums
Separating Agreement from Execution
Minimal Byzantine Storage
High Throughput Byzantine State Machine Replication

Broadening the applicability of BFT protocols

BAR (Byzantine Altruistic Rational) Fault Tolerance
Privacy Firewall

People

Faculty

Students

Alumni

Jean-Philippe Martin (Microsoft Research, Cambridge)
Arun Venkataramani (University of Massachusettes, Amherst)
Jian Yin (IBM T.J. Watson Research)

Technical achievements

BAR (Byzantine Altruistic Rational) Fault Tolerance

Large fault-tolerant systems typically cross administrative boundaries, especially in the case of peer-to-peer or cooperative applications. In this case, the traditional Byzantine model is no longer appropriate and a new model is needed to take into account the fact that the participants may deviate from the protocol when acting in their self-interest. The problem is not that Byzantine failures are not general enough, but rather that they are too general: we know that no protocol can tolerate a majority of Byzantine nodes, yet in a multi-administrative-domain deployment it is in fact possible that a majority of the nodes act for their self-interest and therefore deviate from the protocol. The question we answer is: how can we get these self-interested nodes to cooperate?

We propose a principled approach to this problem. First, we introduce a model, BAR, that specifies for both Byzantine and rational failures. We use the word "rational" to mean the nodes that may deviate from the protocol if it is in their self-interest. We believe that protocols that are designed for deployment over multiple administrative domains should strive to be BAR-tolerant.

Second, we show how BAR-tolerant protocols can be built. We provide BAR-tolerant primitives, with which we show how to build a BAR-tolerant replicated state machine. This replicated state machine then greatly simplifies the design of BAR-tolerant protocols. As an illustration, we show a BAR-tolerant cooperative backup application, BAR-B.

We hope that this paper will encourage more researchers to explore the BAR model. Our community knows a lot about Byzantine fault-tolerant protocols, and a lot of real systems are designed to withstand Byzantine failures. In contrast, all we know of BAR-tolerant protocols is that there exists at least one (the one we show in the paper). We hope that real systems of the future will be designed to withstand not only a minority of Byzantine failures, but also a majority of rationally-behaving nodes.

BAR Fault Tolerance for Cooperative Services. A. Aiyer, L. Alvisi, A. Clement, M. Dahlin, J-P. Martin, and C. Porth. 20th Symposium on Operating Systems Principles (SOSP 2005), October 2005. pdf bibtex
BAR Gossip, H. Li, A. Clement, E. Wong, J. Napper, I. Roy, L. Alvisi, M. Dahlin, In review, Apr 2006. pdf bibtex.
"Winning BAR Games," A. Clement, J. Napper, L. Alvisi, M. Dahlin, In review., May 2006 .

Fast Byzantine Consensus

What is the minimum possible latency for consensus and the replicated state machine? In this paper we show that the answer is 2 and 3 communication steps, respectively. While we are not the first to look at this problem, we are the first (as far as we know) to provide an asynchronous consensus protocol for unreliable links and Byzantine failures that can complete in two communication steps (the protocol is safe in that model, but naturally only live under nicer conditions).

We show that in order to complete in two communication steps despite failures (a property we call "2-step"), there must be a minimum of 5f+1 processes to tolerate f Byzantine failures. The cost is fairly high, so we also show a variant (Parameterized FaB) that can complete in two communication steps with as few as 3f+1 processes (the optimal number of processes). The catch then is that 2-step operation is only guaranteed if no failure actually occurs. In order to tolerate f failures and complete in two communication steps despite t actual failures, Parameterized FaB requires 3f+2t+1 processors.

Fast Byzantine Consensus. J-P. Martin and L. Alvisi. 2005 International Conference on Dependable Systems and Networks (DSN 2005), June 2005. pdf bibtex

Dynamic Quorums

When you have a distributed system and a machine fails, you naturally need to put a new one in its place. What if you want to add machines to your total, or maybe use a whole new set of servers for your task?

Few systems allow you to change server membership, even though it is an important concern in practice. If you are using the replicated state machine approach then it is easy to change server membership. However, in some cases the state machine approach is too heavyweight. Almost all lightweight Byzantine fault-tolerant protocols use the quorums approach.

My advisor and I have determined that it is possible for a quorum-based protocol to change its servers in a lightweight manner. Previous attempts required additional servers to enable dynamism, but we have determined how to do this without adding any server on top of the minimal number that is necessary to provide Byzantine fault-tolerant storage in the first place. We have shown how to do this for a storage system, but we expect that the same technique can be used on any quorum protocol.

A Framework for Dynamic Byzantine Storage. J-P. Martin and L. Alvisi. 2004 International Conference on Dependable Systems and Networks (DSN 2004), June 2004. ps pdf bibtex

Separating Agreement from Execution

In this work, we revisit the traditional approach to replicated state machines (replicated state machines are used to build fault-tolerant systems). We show that the RSM should be split into two parts: agreement and execution. The first part generates a unique sequence from the requests sent to the RSM, and the second part executes the requests.

This separation allows for a better understanding of RSM, but they also have the direct practical impact of showing how one can build a RSM using only 2f+1 execution replicas (to tolerate f Byzantine faults) instead of the 3f+1 required by previous approaches, thus saving cost.

The separation also makes it easier to get more out of replicated state machines. The privacy firewall described below follows directly from the separation of agreement and execution.

Separating Agreement from Execution for Byzantine Fault Tolerant Services. Jian Yin, Jean-Philippe Martin, Arun Venkataramani, Lorenzo Alvisi, Mike Dahlin. Proceedings of the 19th Symposium on Operating Systems Principles (SOSP 2003), October 2003. ps pdf bibtex SOSP slides code

Privacy Firewall

Traditionally, redundancy is used to improve either the performance or the availability of services. When the servers contain confidential information, the additional servers introduced by redundancy make it more likely that an attacker would be able to find a server with a vulnerability he knows how to exploit.

We are exploring how to use redundancy in a novel way to build "privacy firewalls" that protect the confidentiality of replicated data. In particular, we are experimenting with an intrusion-tolerant nfs that enforces the access rights to files even if some machines have been compromised.

Byzantine Fault-Tolerant Confidentiality (updated) J. Yin, J-P. Martin, A. Venkataramani, L. Alvisi, M. Dahlin. Future Directions in Distributed Computing Workshop, June 2002. ps pdf bibtex slides

Minimal Byzantine Storage

Hortum is a new and efficient distributed storage protocol that tolerates Byzantine failures. In other words, Hortum makes it possible to store data "on the network" in such a way that it can be recovered (automatically) even if several servers on the network are not reachable or have been hacked into (i.e. are actively trying to defeat the storage system).

In the course of this research we have submitted two papers to conferences. The corresponding publications and technical reports are available:

Minimal Byzantine Storage J-P. Martin, L. Alvisi, M. Dahlin. Proceedings of the 16th International Symposium on Distributed Computing (DISC 2002). Toulouse, France. LNCS 2508 pp 311-325. ISBN 3-540-00073-9. ps pdf bibtex slides. See also the extended technical report.
Small Byzantine Quorum Systems J.-P. Martin, L. Alvisi, M. Dahlin. Proceedings of the 2002 International Conference on Dependable Systems & Networks (DSN 2002). ps pdf bibtex. See also the extended technical report (ps.Z ps).

High-throughput Byzantine State Machine Replication

We argue for a simple change to Byzantine Fault Tolerant state machine replication libraries in order to provide high throughput. Traditional state machine replication based Byzantine fault tolerant (BFT) techniques provide high availability and security but fail to provide high throughput. This limitation stems from the fundamental assumption of generalized state machine replication techniques that all replicas execute requests sequentially in the same total order to ensure consistency across replicas. We propose a high throughput Byzantine fault tolerant architecture that uses application-specific information to identify and concurrently execute independent requests. Our architecture thus provides a general way to exploit application parallelism in order to provide high throughput without compromising correctness. Although this approach is extremely simple, it yields dramatic practical benefits. When sufficient application concurrency and hardware resources exist, CBASE, our system prototype, provides orders of magnitude improvements in throughput over BASE, a traditional BFT architecture. CBASE-FS, a Byzantine fault tolerant file system that uses CBASE, achieves twice the throughput of BASE-FS for the IOZone micro-benchmarks even in a configuration with modest available hardware parallelism.

High Throughput Byzantine Fault Tolerance

pdf

Talk(ppt)

Mike Dahlin's other research projects