The current euphoria over the World Wide Web does not do full justice
to the potential of the Internet; with the many-fold increase in CPU
processing power and network bandwidth, it is inevitable that the
Internet will support distributed applications of great
complexity. While information retrieval applications dominate the
Internet today, we can anticipate applications that process massive
amounts of data for visualization and support real-time interactivity
in a future generation of the Internet. For instance, a digital
library of satellite imagery might be processed by programs for
feature extraction or visualization, and a virtual environment for
training firefighters might involve distributed simulations and
real-time user interaction. All of these applications involve storing,
transporting, and processing multiple types of information -- textual
and numeric data, images, audio, video, animation sequences,
etc. -- which we collectively refer to as multimedia. Because of
inherent differences in data type characteristics (e.g., size,
format, tolerance to errors, and real-time requirements), these
applications impose widely varying requirements on the underlying
network and operating systems.
Our primary research objective is to conduct the basic and
experimental research necessary to address the problems that lie at
the core of these emerging applications. Over the past few years, we
have investigated techniques for designing: (1) a multimedia file
system for storage and retrieval of multi-resolution multimedia
objects, (2) network algorithms and protocols for transmission of
multimedia objects over integrated services networks, and (3)
operating system mechanisms, such as processor scheduling, as well as
transport and higher-layer protocols for efficient processing and
streaming of data at end-stations.
Integrated File Systems
Techniques for efficiently managing
storage and retrieval of heterogeneous objects depend upon their
characteristics, access patterns, and performance
requirements. Consequently, a file system for heterogeneous data
objects should export multiple service classes (e.g., best-effort,
real-time), as well as enable the coexistence of multiple data type
specific policies for placement, retrieval, fault tolerance, and
caching. We have developed Symphony, a multimedia file system
that meets these requirements. Symphony implements novel techniques
for managing audio, video, imagery, and scalable dynamic arrays of
scientific data, along with conventional techniques for textual
data. It enables the coexistence of these diverse techniques by
carefully separating data-type-independent mechanisms from
data-type-specific policies and instantiating them in a layered
architecture, as sketched below.
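To make this separation concrete, the following minimal sketch (in
Python) shows how a data-type-independent block I/O layer can delegate
placement decisions to pluggable, per-type policies. The names
(PlacementPolicy, FileSystemCore, etc.) are hypothetical illustrations
of the design principle, not Symphony's actual interfaces.

```python
# A minimal sketch of mechanism/policy separation in a layered file
# system. All names here are hypothetical, not Symphony's real code.

from abc import ABC, abstractmethod

class PlacementPolicy(ABC):
    """Data-type-specific policy: decides where an object's blocks go."""
    @abstractmethod
    def place(self, object_id: str, block_no: int, num_disks: int) -> int:
        ...

class ContiguousPlacement(PlacementPolicy):
    """Suits textual data: keep all of an object's blocks on one disk."""
    def place(self, object_id, block_no, num_disks):
        return hash(object_id) % num_disks

class StripedPlacement(PlacementPolicy):
    """Suits video: stripe successive blocks across disks so real-time
    retrieval can proceed from several disks in parallel."""
    def place(self, object_id, block_no, num_disks):
        return (hash(object_id) + block_no) % num_disks

class FileSystemCore:
    """Data-type-independent mechanism layer: performs block I/O and
    delegates placement decisions to the registered per-type policy."""
    def __init__(self, num_disks):
        self.num_disks = num_disks
        self.policies = {}

    def register_type(self, data_type, policy):
        self.policies[data_type] = policy

    def write_block(self, data_type, object_id, block_no, data):
        disk = self.policies[data_type].place(object_id, block_no,
                                              self.num_disks)
        print(f"write {object_id}[{block_no}] ({data_type}) -> disk {disk}")

fs = FileSystemCore(num_disks=4)
fs.register_type("text", ContiguousPlacement())
fs.register_type("video", StripedPlacement())
fs.write_block("video", "lecture1.mpg", 0, b"...")
fs.write_block("video", "lecture1.mpg", 1, b"...")
```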
Network Algorithms and Protocols for Integrated Services Networks
To meet the wide range of application requirements (defined
in terms of end-to-end delay, bandwidth, packet loss, etc.),
integrated services networks employ packet scheduling algorithms. A
suitable packet scheduling algorithm for an integrated services
network should meet application requirements and support hierarchical link
bandwidth allocation to enable the coexistence of multiple services
and protocols in a network. We have designed a class of
algorithms---Start-time Fair Queuing (SFQ) and Fair Airport
(FA)---that meet these requirements. We have analyzed single server
delay and fairness properties of these algorithms. A network of
servers is more difficult to analyze since each node along the path
may employ a different scheduling algorithm, packets may be fragmented
and reassembled, etc. Yet, we have been able to develop compositional
techniques that allow a network to be analyzed as a single server. Our
technique decouples the service guarantee provided by a network from
source traffic characterization, and derives tight performance bounds
for heterogeneous networks.
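To illustrate the core of SFQ, here is a minimal Python sketch of its
tag computation and service order (an illustration of the published
idea, not our router implementation): each arriving packet receives a
start tag equal to the maximum of the server's virtual time and its
flow's previous finish tag, packets are served in increasing order of
start tags, and virtual time advances to the start tag of the packet
entering service.

```python
# A minimal sketch of Start-time Fair Queuing tag computation,
# assuming per-flow weights and packet lengths in bytes.

import heapq
import itertools

class SFQServer:
    def __init__(self):
        self.vtime = 0.0              # virtual time = start tag in service
        self.finish = {}              # last finish tag of each flow
        self.queue = []               # heap ordered by (start tag, seqno)
        self.seq = itertools.count()  # tie-breaker for equal start tags

    def enqueue(self, flow, length, weight):
        # Start tag: max of virtual time and the flow's last finish tag.
        start = max(self.vtime, self.finish.get(flow, 0.0))
        # Finish tag: start tag plus the packet's weighted service time.
        self.finish[flow] = start + length / weight
        heapq.heappush(self.queue, (start, next(self.seq), flow, length))

    def dequeue(self):
        # Serve in increasing start-tag order; virtual time advances to
        # the start tag of the packet entering service.
        start, _, flow, length = heapq.heappop(self.queue)
        self.vtime = start
        return flow, length

s = SFQServer()
s.enqueue("audio", length=200, weight=2.0)   # small packets, higher weight
s.enqueue("video", length=1500, weight=1.0)
s.enqueue("audio", length=200, weight=2.0)
while s.queue:
    print(s.dequeue())
```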
Operating System Support at End-Stations
Most conventional CPU scheduling algorithms have been designed for specific application classes (e.g., earliest
deadline first and rate monotonic algorithms for hard real-time applications, time-sharing for best-effort
applications). A general-purpose computing environment, however, will need to support applications with different
service requirements. We have developed a hierarchical CPU scheduling framework that enables different schedulers to
be employed for different application classes, while protecting application classes from one another. In this
framework, the hierarchical partitioning is specified by a tree. Each thread in the system belongs to exactly one
leaf node, and each node in the tree represents either an application class or an aggregation of application classes.
Whereas the requirements of application classes determine the leaf-node schedulers, intermediate nodes are scheduled
by SFQ. This approach provides throughput guarantees to each application class without imposing higher overhead
than conventional time-sharing schedulers.
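The following sketch illustrates this framework under a simplified
one-quantum service model; the class names and leaf policies are
hypothetical illustrations, not our kernel implementation. An
intermediate node allocates CPU quanta among its children in SFQ
order, while each leaf applies its own class-specific policy.

```python
# A minimal sketch of hierarchical CPU scheduling with SFQ at
# intermediate nodes; dispatching and runtime accounting are omitted.

class LeafClass:
    """A leaf node: holds threads and a class-specific pick policy."""
    def __init__(self, name, weight, pick):
        self.name, self.weight, self.finish = name, weight, 0.0
        self.threads, self.pick = [], pick

    def schedule(self):
        return self.pick(self.threads)   # choose the next thread to run

class SFQNode:
    """An intermediate node: grants quanta to children in SFQ order."""
    def __init__(self, children):
        self.children = children
        self.vtime = 0.0

    def schedule(self):
        # The child with the smallest start tag (its last finish tag,
        # floored by virtual time) receives the next quantum.
        child = min(self.children, key=lambda c: max(c.finish, self.vtime))
        start = max(child.finish, self.vtime)
        self.vtime = start
        child.finish = start + 1.0 / child.weight  # one quantum of service
        return child.schedule()

# Example hierarchy: real-time threads picked by earliest deadline;
# best-effort threads picked in arrival order.
rt = LeafClass("real-time", weight=3.0,
               pick=lambda ts: min(ts, key=lambda t: t["deadline"]) if ts else None)
be = LeafClass("best-effort", weight=1.0,
               pick=lambda ts: ts[0] if ts else None)
rt.threads = [{"id": "sensor", "deadline": 5}, {"id": "display", "deadline": 2}]
be.threads = [{"id": "indexer"}, {"id": "mailer"}]

root = SFQNode([rt, be])
for _ in range(4):
    print(root.schedule())   # quanta go to classes roughly 3:1
```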
We are utilizing the distributed computing infrastructure resulting
from our research to design InfoWeave, a visual environment for
creation and dissemination of digital educational material. InfoWeave
will greatly enhance the delivery of educational material by using
video as the primary medium and the associated textual material as an
index for locating specific video segments. In InfoWeave, the educational material will
be organized in terms of modules, each encapsulating information on a
specific topic at a certain level of detail. To help a user navigate
through this repository, InfoWeave will utilize links to specify
relationships between data items within modules. Links will be
established either by publication-time compilation of the links into
the application's hypertext format, or via browse-time user queries on keywords,
phrases, sections, etc. By supporting various types of links,
InfoWeave will support both reader-driven and author-driven
navigation. Moreover, it will allow users to control their own view of
the information. Using this navigation facility, InfoWeave will
support mechanisms to create lectures, courses, or entire curricula
(possibly interdisciplinary) by composing sets of modules.
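As an illustration of this module-and-link organization, the sketch
below (hypothetical Python; the field names, link types, and query
function are assumptions, not InfoWeave's actual schema) shows modules
carrying video plus searchable text, typed links between modules, and
a browse-time keyword query.

```python
# A minimal sketch of modules, typed links, and a browse-time query.
# All field names and link types here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Module:
    topic: str
    level: int                      # level of detail
    video: str                      # primary video for the module
    transcript: str                 # textual material used for search
    links: list["Link"] = field(default_factory=list)

@dataclass
class Link:
    relation: str                   # e.g., "prerequisite", "elaborates"
    target: Module

def find_segments(modules, keyword):
    """Browse-time query: locate modules whose text mentions a keyword."""
    return [m for m in modules if keyword.lower() in m.transcript.lower()]

intro = Module("TCP basics", level=1, video="tcp_intro.mpg",
               transcript="three-way handshake, sequence numbers")
deep = Module("TCP congestion control", level=2, video="tcp_cc.mpg",
              transcript="slow start, congestion avoidance")
deep.links.append(Link("prerequisite", intro))

course = [intro, deep]              # a course composed from modules
print([m.topic for m in find_segments(course, "slow start")])
```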
In summary, we envision that the next generation Internet will support
complex distributed applications that will involve storing,
transmitting, processing, and visualizing large volumes of
heterogeneous data. At the core of these applications lies fertile
ground for research and development, ranging from the design of
algorithms to network and operating system design, and from tools and
methodologies for application development to human-computer
interfaces. Our research is driven by two primary objectives: (1)
development of efficient techniques for resource
management -- including computing, communication, and storage
resources -- that satisfy the requirements of these applications; and
(2) evaluation of these techniques in realistic applications. Over the
past few years, we have addressed some basic resource management
problems, but many more remain in designing an end-to-end architecture
for the next generation of distributed applications. We plan to
address these problems and then apply our architecture to develop next
generation data acquisition, analysis, and visualization applications
in the medical and scientific domains.