Byzantine Fault Tolerance and Beyond |
This project seeks to improve system security and robustness by building distributed services that tolerate buggy, selfish, or malicious nodes. We use replication and Byzantine fault tolerant (BFT) protocols to mask incorrect behaviors.
Our main results to date have fallen into two broad categories.
Large fault-tolerant systems typically cross administrative
boundaries, especially in the case of peer-to-peer or cooperative applications.
In this case, the traditional Byzantine model is no longer appropriate and a
new model is needed to take into account the fact that the participants may
deviate from the protocol when acting in their self-interest. The problem is
not that Byzantine failures are not general enough, but rather that they are too
general: we know that no protocol can tolerate a majority of Byzantine nodes,
yet in a multi-administrative-domain deployment it is in fact possible that a
majority of the nodes act for their self-interest and therefore deviate from
the protocol. The question we answer is: how can we get these self-interested
nodes to cooperate?
We propose a principled approach to this problem. First, we introduce a
model, BAR, that specifies for both Byzantine and rational failures. We use the
word "rational" to mean the nodes that may deviate from the protocol
if it is in their self-interest. We believe that protocols that are designed
for deployment over multiple administrative domains should strive to be
BAR-tolerant.
Second, we show how BAR-tolerant protocols can be built. We provide BAR-tolerant
primitives, with which we show how to build a BAR-tolerant replicated state
machine. This replicated state machine then greatly simplifies the design of
BAR-tolerant protocols. As an illustration, we show a BAR-tolerant cooperative
backup application, BAR-B.
We hope that this paper will encourage more researchers to explore the BAR
model. Our community knows a lot about Byzantine fault-tolerant protocols, and
a lot of real systems are designed to withstand Byzantine failures. In
contrast, all we know of BAR-tolerant protocols is that there exists at least
one (the one we show in the paper). We hope that real systems of the future
will be designed to withstand not only a minority of Byzantine failures, but
also a majority of rationally-behaving nodes.
What is the minimum possible latency for consensus and the
replicated state machine? In this paper we show that the answer is 2 and 3
communication steps, respectively. While we are not the first to look at this
problem, we are the first (as far as we know) to provide an asynchronous
consensus protocol for unreliable links and Byzantine failures that can
complete in two communication steps (the protocol is safe in that model, but
naturally only live under nicer conditions).
We show that in order to complete in two communication steps despite
failures (a property we call "2-step"), there must be a minimum of 5f+1
processes to tolerate f Byzantine failures. The cost is fairly high,
so we also show a variant (Parameterized FaB) that can complete in two
communication steps with as few as 3f+1 processes (the optimal number
of processes). The catch then is that 2-step operation is only guaranteed if no
failure actually occurs. In order to tolerate f failures and complete
in two communication steps despite t actual failures, Parameterized
FaB requires 3f+2t+1 processors.
When you have a distributed system and a machine fails, you
naturally need to put a new one in its place. What if you want to add machines
to your total, or maybe use a whole new set of servers for your task?
Few systems allow you to change server membership, even though it is an
important concern in practice. If you are using the replicated state machine
approach then it is easy to change server membership. However, in some cases
the state machine approach is too heavyweight. Almost all lightweight Byzantine
fault-tolerant protocols use the quorums approach.
My advisor and I have determined that it is possible for a quorum-based
protocol to change its servers in a lightweight manner. Previous attempts
required additional servers to enable dynamism, but we have determined how to
do this without adding any server on top of the minimal number that is
necessary to provide Byzantine fault-tolerant storage in the first place. We
have shown how to do this for a storage system, but we expect that the same
technique can be used on any quorum protocol.
In this work, we revisit the traditional approach to replicated state
machines (replicated state machines are used to build fault-tolerant systems).
We show that the RSM should be split into two parts: agreement and execution.
The first part generates a unique sequence from the requests sent to the RSM,
and the second part executes the requests.
This separation allows for a better understanding of RSM, but they also have
the direct practical impact of showing how one can build a RSM using only 2f+1
execution replicas (to tolerate f Byzantine faults) instead of the 3f+1
required by previous approaches, thus saving cost.
The separation also makes it easier to get more out of replicated state
machines. The privacy firewall described below follows directly from the
separation of agreement and execution.
Traditionally, redundancy is used to improve either the performance or the
availability of services. When the servers contain confidential information,
the additional servers introduced by redundancy make it more likely that an
attacker would be able to find a server with a vulnerability he knows how to
exploit.
We are exploring how to use redundancy in a novel way to build "privacy
firewalls" that protect the confidentiality of replicated data. In
particular, we are experimenting with an intrusion-tolerant nfs that enforces
the access rights to files even if some machines have been compromised.
Hortum is a new and efficient distributed storage protocol that tolerates
Byzantine failures. In other words, Hortum makes it possible to store data
"on the network" in such a way that it can be recovered
(automatically) even if several servers on the network are not reachable or
have been hacked into (i.e. are actively trying to defeat the storage system).
In the course of this research we have submitted two papers to conferences.
The corresponding publications and technical reports are available:
We argue for a simple change to Byzantine Fault Tolerant state machine replication libraries in order to provide high throughput. Traditional state machine replication based Byzantine fault tolerant (BFT) techniques provide high availability and security but fail to provide high throughput. This limitation stems from the fundamental assumption of generalized state machine replication techniques that all replicas execute requests sequentially in the same total order to ensure consistency across replicas. We propose a high throughput Byzantine fault tolerant architecture that uses application-specific information to identify and concurrently execute independent requests. Our architecture thus provides a general way to exploit application parallelism in order to provide high throughput without compromising correctness. Although this approach is extremely simple, it yields dramatic practical benefits. When sufficient application concurrency and hardware resources exist, CBASE, our system prototype, provides orders of magnitude improvements in throughput over BASE, a traditional BFT architecture. CBASE-FS, a Byzantine fault tolerant file system that uses CBASE, achieves twice the throughput of BASE-FS for the IOZone micro-benchmarks even in a configuration with modest available hardware parallelism.
Mike Dahlin's other research projects |