Objectives

Main Objectives

Our main objective is to conduct the basic and experimental research necessary to address the problems that lie at the core of these emerging applications. Since this involves a large number of projects--each with its own requirements, timeline, and risk--we must carefully organize and manage the projects to realize and exploit their inherent synergy while avoiding deadlocks. In this proposal, we present our plans for organizing, scheduling, and managing our projects to accomplish this objective, and we identify existing as well as potential synergies among them.

We propose projects at three levels (see Figure 1): (1) core projects that will create an infrastructure for storing, accessing, and transporting large, heterogeneous information objects; (2) exploratory projects that will develop a variety of application-specific and general-purpose languages and tools for distributed applications; and (3) realistic and ambitious applications from each of the three classes described above.

Figure 1

The strengths of our research include unity among the core projects, coherence among the exploratory projects, and scale of the applications. In what follows, we first describe the requirements of the applications that drive the overall research, the common theme that unites the core projects, and the extent of support technologies provided by our exploratory projects.

Driving Applications

As a testbed for our research, we are already developing six major applications (shown in the top tier of Figure 1) that span the three classes we described above. These include a Class I NASA-funded project to develop a multiresolution storage and transport system for the massive multispectral image databases generated by the Earth Observing System (EOSDIS); a Class II NSF-sponsored project to develop a distributed system supporting the storage and analysis of massive datasets from high-energy physics (NILE); another Class II project in the Texas Institute of Computational Applied Mathematics (TICAM) to develop a distributed system for solving very large systems of partial differential equations; and three Class III projects with significant real-time constraints. The latter group includes an ARPA-sponsored project to construct a large-scale distributed Synthetic Theater of War (STOW) being undertaken at the UT Applied Research Laboratory; a distributed system for hypothesis testing in active-vision systems using qualitative reasoning (MIMIC); and an ONR-sponsored project to develop a proactive real-time computing environment (SARTOR). Due to space limitations, we provide a further description of only two of these projects: EOSDIS and STOW. For more information on the others, please see our original proposal.

The goal of the UT project in support of EOSDIS is to develop a multiresolution storage and transport system for data acquired from NASA's Earth Observing System. This system must support efficient storage, retrieval, and dissemination of multi-dimensional satellite imagery, along with ancillary textual and numeric metadata. The primary data is expected to reach volumes of two terabytes per day when EOSDIS becomes fully operational. Queries often involve on-line access to several terabytes of data stored across many distributed information sources. Typical queries may be difficult to express in a traditional relational query language such as SQL. In many cases, multiresolution analysis techniques will be performed on the data obtained through these queries. Thus, this application provides an ideal opportunity to develop and evaluate multiresolution databases, along with efficient and reliable storage management strategies and network protocols.
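To make the idea of multiresolution retrieval concrete, the following sketch (our illustration, not the actual EOSDIS system design; the pyramid construction and the cell-budget query are assumptions) shows how a client can fetch only the coarsest image level that satisfies its resolution requirement, avoiding transfer of full-resolution data:

```python
# Illustrative multiresolution pyramid: each level halves resolution
# by averaging 2x2 blocks of the level below (level 0 = original grid).
def build_pyramid(image, levels):
    pyramid = [image]
    for _ in range(levels):
        prev = pyramid[-1]
        n = len(prev) // 2
        coarser = [[(prev[2*i][2*j] + prev[2*i][2*j+1] +
                     prev[2*i+1][2*j] + prev[2*i+1][2*j+1]) / 4.0
                    for j in range(n)] for i in range(n)]
        pyramid.append(coarser)
    return pyramid

def query(pyramid, max_cells):
    """Return the finest level whose size fits the client's budget."""
    for level in pyramid:              # finest level first
        if len(level) ** 2 <= max_cells:
            return level
    return pyramid[-1]                 # fall back to the coarsest level
```

For an 8x8 grid with two coarser levels, a client with a 16-cell budget receives the 4x4 level rather than the full grid.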

The goal of the STOW project is to develop a real-time distributed battlefield simulation comprising 10,000 combatants. Fewer than 10% of the combatants will be humans; the rest will be simulation models. The network supporting the simulation will span North America, and it must meet a stringent requirement: every object affected by a simulated action must be notified within 100 milliseconds. STOW's environmental databases include 3-D shapes and the images to be mapped onto each surface. Since graphics hardware restricts the amount of data that can be displayed in real-time, the data sent to each observer will vary in resolution. For example, an airborne observer has a distant horizon but requires little detail for the objects on the ground, while a ground-based observer has a close horizon, but requires considerable detail for objects within the horizon. The database system must support both observers with consistent data at different resolutions. Clearly, STOW's strict real-time requirement puts a premium on efficient retrieval and transmission of multiresolution objects. Moreover, the construction of a wide variety of simulated objects that meet the real-time constraints provides a significant challenge for developing large-scale software.
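The observer-dependent resolution described above amounts to per-object level-of-detail selection. The sketch below is our illustration of that policy (the distance thresholds are invented, not part of STOW): objects beyond an observer's horizon are not sent at all, nearby objects get the finest level, and distant ones get coarser levels, which is why the airborne observer naturally receives little detail for objects on the ground:

```python
# Hypothetical level-of-detail selector: map an object's distance from
# the observer to a resolution level (3 = finest, 0 = coarsest).
# Thresholds (in meters) are illustrative assumptions.
def select_lod(distance_m, horizon_m, thresholds=(500, 2000, 10000)):
    if distance_m > horizon_m:
        return None                       # beyond horizon: not transmitted
    for level, cutoff in enumerate(thresholds):
        if distance_m <= cutoff:
            return len(thresholds) - level  # nearer objects -> finer data
    return 0                              # within horizon but very far
```

Under this policy a ground observer (close horizon) sees a tank 400 m away at the finest level, while an airborne observer several kilometers above the same tank receives only a coarse model.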

Unity Among Core Projects

All of these applications share a common requirement for storing, retrieving, and transporting information in multiple formats (e.g., textual and numeric data, images, audio, video, and animation sequences--which we collectively refer to as multimedia). Multimedia objects differ significantly in characteristics (e.g., size, format, tolerance to errors, real-time requirements) from textual and numeric data. For example, each satellite image in our EOSDIS application requires about 60 MBytes of storage space, and typical video playback requires access, transmission, and display of 30 frames/second. Even for applications that primarily access textual and numeric data, the sheer volume of data has increased significantly (e.g., terabytes of numeric data in the NILE project). Therefore, these applications require substantially different data management techniques.
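A back-of-the-envelope calculation makes these rates concrete (the 60 MB image size and two-terabyte daily volume come from the text above; the per-frame video size is an illustrative assumption):

```python
# Sustained data rates implied by the figures quoted in the text.
MB = 10**6

frame_bytes = 1 * MB                   # assumed size of one video frame
fps = 30
video_rate = frame_bytes * fps         # bytes/second for smooth playback

daily_volume = 2 * 10**12              # two terabytes/day (EOSDIS, full ops)
ingest_rate = daily_volume / 86400     # required sustained ingest, bytes/s

print(video_rate)                      # 30 MB/s just for one video stream
print(round(ingest_rate))              # ~23 MB/s around the clock for EOSDIS
```

Even one video stream under these assumptions demands 30 MB/s, and EOSDIS ingest alone requires roughly 23 MB/s sustained around the clock, well beyond what text-oriented data managers were built for.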

Our core projects, shown in the bottom layer of Figure 1, develop an integrated architecture that supports: (1) techniques for efficiently storing multimedia data on disk arrays (multimedia storage server project), (2) mechanisms for locating and accessing information objects from distributed databases (multiresolution DBMS project), and (3) protocols for providing performance guarantees to each object transmitted over the network (network architecture and protocols project). Developing an architecture that achieves the above objective will involve close interaction among the three projects on a variety of issues.

Coherence among Exploratory Projects

The application projects and the core projects are, by necessity, large, complex systems. To help build such systems, we are exploring a variety of new languages, runtime environments, and methodologies, as shown in the middle tier of Figure 1. Although these projects have individual merit, they would be substantially strengthened through synergy with applications, core projects, and other system-development technologies. We illustrate this synergy with five examples.

First, Batory's P2 system compiles lightweight database systems (LDBs). An LDB differs from a general-purpose database system in two ways: it supports only one schema with a small set of predefined retrieval and update operations, and it offers substantially better performance. LDBs are ubiquitous. Examples from our core projects include the routing table for the network manager and the lock manager for the database system, and an example from our application projects is the synthetic-environments database used in STOW. We are currently using P2 to develop a prototype of STOW's terrain database. Moreover, we plan to use it later to build the core multiresolution database system that will support STOW, and other applications, at full production levels.
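To illustrate what "one schema with a small set of predefined operations" means in practice, here is a toy lightweight database in the spirit of the routing-table example (our sketch, not P2 output; the class and its operations are invented for illustration):

```python
# A toy "lightweight database": one fixed schema and a handful of
# predefined operations, trading generality for speed and simplicity.
class RoutingTable:
    """Fixed schema: destination -> (next_hop, cost). Only the three
    operations below exist; there is no ad hoc query language."""

    def __init__(self):
        self._routes = {}

    def update(self, dest, next_hop, cost):
        # Keep only the cheapest known route to each destination.
        best = self._routes.get(dest)
        if best is None or cost < best[1]:
            self._routes[dest] = (next_hop, cost)

    def lookup(self, dest):
        entry = self._routes.get(dest)
        return entry[0] if entry else None

    def delete(self, dest):
        self._routes.pop(dest, None)
```

Because the schema and operations are fixed at build time, a compiler like P2 can specialize the storage layout and operation code, which is where the performance advantage over a general-purpose DBMS comes from.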

Second, Wilson's TEXAS system is a persistent-storage memory manager for C++ programs. We are currently evaluating TEXAS (as an alternative to an extensible DBMS, such as POSTGRES) for prototyping our core system for storing multiresolution data. The TEXAS system falls short of our needs in one regard: it does not support distributed programs that share memory. However, Wilson's current research--which is aimed at supporting the STOW application--is addressing this problem. Wilson plans to develop the distributed persistent store using P2.

Third, Miranker's VENUS compiler for rule-based programs--which has achieved speed-ups of 2 to 3 orders of magnitude over traditional approaches for sequential rule processing--is being used to develop simulation models for STOW. Previous models for event-driven battlefield simulations were coded in imperative languages (e.g., C++, Ada). As an initial test of VENUS, Miranker reimplemented a component of a battlefield simulator used in a distributed field-artillery command and control system, and achieved adequate performance with an order of magnitude decrease in code size compared with the imperative code. Meeting the real-time requirements of Class III applications, such as STOW, drives Miranker's current rule-technology research.
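The code-size advantage of the rule-based style comes from stating behaviors declaratively and letting an engine do the matching. The following minimal forward-chaining engine is our illustration of that style (it is not VENUS, and the battlefield rule is an invented example): rules fire on matching facts until no rule produces a new fact:

```python
# Minimal forward-chaining rule engine (illustrative only).
def run_rules(facts, rules):
    """facts: set of tuples; rules: callables mapping the current fact
    set to an iterable of new facts to assert."""
    facts = set(facts)
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for rule in rules:
            for new_fact in rule(facts):
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

# Example rule: a unit that is under fire and mobile should relocate.
def relocate_rule(facts):
    return [("relocate", unit)
            for (kind, unit) in facts
            if kind == "under_fire" and ("mobile", unit) in facts]
```

The imperative equivalent would thread this condition through event-handling code; here the behavior is one self-contained rule, which is the source of the compactness reported above.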

Fourth, Misra's Seuss project is developing a language and methodology for multiprogramming. Two major goals of this project are: (1) to design and understand the modules (e.g., processes or data objects) of a program in isolation, without consideration of interference by the other modules, and (2) to implement the modules on separate processors with a fine grain of interleaving so that no processor is ever locked out of accessing common data for long periods of time. We are identifying components of the core systems to implement with Seuss and with conventional methods. These points of comparison will give the Seuss project important data on development time and performance.

Finally, the research by Porter and Mooney aims to improve query processing to help database users find the information they seek. Currently, query interpreters are quite rigid; even small mismatches between the terms of a query and the relations of the database can cause queries to fail. Moreover, queries must be expressed in formal languages (e.g., SQL). This research will develop methods for efficiently building the knowledge bases and natural-language parsers required for more flexible processing of database queries. It requires large samples of realistic database queries, some of which fail due to rigidity in query processing. The applications (primarily EOSDIS) will provide this data, along with a testbed for the advanced query interpreter.
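A small example of the rigidity problem and one flexible remedy (our sketch; the schema vocabulary is invented, and this approximate string matching is far simpler than the knowledge-based methods proposed): instead of failing on an exact-match miss, map a query term to the closest known attribute name within a similarity tolerance:

```python
# Resolve a user's query term against a known schema vocabulary,
# tolerating misspellings that a rigid interpreter would reject.
import difflib

SCHEMA_TERMS = ["latitude", "longitude", "cloud_cover", "acquisition_date"]

def resolve_term(term, vocabulary=SCHEMA_TERMS, cutoff=0.6):
    """Return the best-matching schema term, or None if nothing in the
    vocabulary is similar enough to the user's term."""
    matches = difflib.get_close_matches(term.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here a query mentioning "lattitude" resolves to the schema attribute "latitude" instead of failing outright, while an unrecognizable term is still rejected rather than silently misinterpreted.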

Exploiting the synergy inherent among the projects requires a management plan that focuses effort on common objectives and a shared (thus, large) computational infrastructure.
