Director: Simon S. Lam (more publications)
Silk: A
Resilient Routing Fabric for Peer-to-Peer Networks
Simon S. Lam and Huaiyu Liu
Technical Report TR-03-13, May 16, 2003 (revised, October 12,
2003).
Several proposed peer-to-peer networks use hypercube routing for scalability. In
a previous paper, we showed that consistency of neighbor tables in hypercube
routing guarantees the existence of a path from any source node to any
destination node. Consistency, however, can be broken by the failure of one
node. To improve the robustness of hypercube routing, we generalize the concept
of consistency to K-consistency, for K ≥ 1.
We then show that a K-consistent hypercube routing network provides at
least K disjoint paths from any source node to any destination node with
a probability close to 1. The first objective of this report is the design and
specification of a new join protocol together with a proof that it generates
K-consistent neighbor tables for an arbitrary number of concurrent joins
(under the assumption that there is no concurrent leave or failure). To do so,
we construct a more general definition of C-set tree than our previous
one as the conceptual foundation
for protocol design and reasoning about K-consistency. Both the new
protocol and proof require major extensions to the ones in our previous paper to
generalize them from 1-consistency to K-consistency.
The second objective of this report is the design and evaluation of a failure
recovery protocol for K-consistent networks. From simulation experiments
in which up to 50% of the nodes in a K-consistent network failed (when a
node fails, it becomes silent), we found that, for K ≥ 2, K-consistency
was recovered in every experiment. The third objective of this report is to
extend our join and failure recovery protocols such that they
construct and maintain K-consistent neighbor tables for networks whose
nodes join and fail concurrently and frequently. In particular, our join
protocol is extended with rules to handle failures of not only existing nodes
but also other joining nodes. These extended protocols, being implemented in our
prototype system named Silk, will be referred to as Silk protocols. From
simulation experiments in which the number of concurrent joins and failures was
up to 50% of the initial network size, we found that, for K ≥ 2, Silk
generated and maintained K-consistent neighbor tables after the
concurrent joins and failures in every experiment. We also present an analysis
of the communication and storage overheads of Silk protocols and show that Silk
is scalable to a large number of network nodes