Overview of Research

It is not difficult to predict the overall shape of the computing landscape ahead: millions of powerful, inexpensive machines, as ubiquitous as the telephones and televisions of today, will be used by lay people to connect with each other and with information servers to communicate text, audio and video in real time. Applications of computing will become far more ambitious as the power of the machines and the bandwidth of the communication links increase manyfold.

We foresee the emergence of three classes of applications, classified by their resource needs. The first class (Class I) comprises information-retrieval applications, such as digital libraries, which support browsing of large-scale, distributed databases of multimedia information. The second class (Class II) adds the requirement that the multimedia information be processed, not simply browsed; for example, a digital library of satellite imagery might be processed by programs for feature extraction or visualization. The third class (Class III) adds the requirement of real-time interactivity; for example, a virtual reality simulation of fire fighting might require distributed simulation and user interactivity, subject to real-time constraints.

As computer scientists, our challenge is to conduct the basic and experimental research necessary to address the problems that lie at the core of these emerging applications. Specifically, we propose to develop: (1) a multiresolution information archival and management system, (2) network protocols and architectures for efficient transport of multimedia information over networks, and (3) software tools for efficient development of applications, including support for developing parallel and distributed systems.

Multiresolution information archival and management system

The traditional techniques of data storage and retrieval are designed for text and numbers. The applications that we envision will additionally deal with images, audio and video. These objects require new techniques for disk utilization because they differ substantially from traditional data objects in size, format, and compressibility. In addition, some multimedia data objects (such as video) require continuous, real-time transfer. Traditional databases are structured around records or tuples, any of which could potentially be accessed at any step in a computation; a video database, by contrast, has a more constrained access pattern, which opens the possibility of a more efficient organization on the storage medium. New techniques are also needed for servicing multiple clients under real-time requirements.

A promising new technique for handling the enormous volume of data in, for instance, video, is multiresolution. Video images may be stored at multiple levels of resolution, and data may be accessed at varying levels of detail depending on the application; only the amount of data appropriate to the desired level of fidelity need be transmitted. The methods for extracting low-resolution data from the high-resolution data will be a key issue in this work. We plan to study the problems of organizing such a database and retrieving from it effectively.
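The idea of extracting low-resolution data from high-resolution data can be sketched as a simple averaging pyramid. This is only an illustration under assumed names; a real system would likely use a wavelet or subband decomposition rather than plain block averaging:

```python
def downsample(image):
    """Halve each dimension by averaging 2x2 pixel blocks."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1] +
             image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w // 2)
        ]
        for r in range(h // 2)
    ]

def build_pyramid(image, levels):
    """Store the image at several resolutions, finest first."""
    pyramid = [image]
    for _ in range(levels - 1):
        image = downsample(image)
        pyramid.append(image)
    return pyramid

# An 8x8 test frame whose pixel value equals its row index.
frame = [[float(r) for _ in range(8)] for r in range(8)]
pyramid = build_pyramid(frame, 3)
# Levels are 8x8, 4x4, and 2x2; a client requests the coarsest
# level that still meets its desired fidelity.
print([len(level) for level in pyramid])  # [8, 4, 2]
```

Storing the pyramid coarsest-levels-first also matches the access pattern noted above: a browsing client touches only the small coarse levels, while a processing client reads the full-resolution data sequentially.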

Finally, the proposed applications will require basic advances in query processing to help users find the information they seek. Current query interpreters are quite rigid; even small mismatches between the terms of a query and the relations of the database can cause queries to fail. Although content-based access can use knowledge bases of domain terms to reformulate queries, building these knowledge bases efficiently is problematic. Similarly, query interpreters would be less rigid if they processed natural language; however, although natural-language parsers have been built for portions of English, current methods are extremely inefficient and error-prone. Our research will be directed toward developing methods for efficiently building the knowledge bases and parsers required for more flexible query processing.
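The knowledge-base-driven reformulation described above can be illustrated with a toy sketch; the table of domain terms and the function name here are entirely hypothetical:

```python
# A toy domain knowledge base mapping user vocabulary to the
# terms actually used in the database schema (entries hypothetical).
DOMAIN_TERMS = {
    "picture": "image",
    "photo": "image",
    "clip": "video",
    "movie": "video",
}

def reformulate(query_terms, knowledge_base):
    """Rewrite query terms that miss the schema onto known ones,
    leaving terms the knowledge base does not cover unchanged."""
    return [knowledge_base.get(term, term) for term in query_terms]

print(reformulate(["satellite", "picture"], DOMAIN_TERMS))
# ['satellite', 'image']
```

The hard research problem is not the lookup itself but constructing such knowledge bases efficiently and at scale, which is where our effort will be directed.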

Architectures and protocols for multimedia communication

The work on protocol design has traditionally been driven by the networking aspects of the problem rather than by the applications that use the transmitted information. The multimedia objects that will be used in our proposed applications are vast in size and have stringent real-time requirements. Therefore, protocols will have to exploit knowledge of the application itself. Our major effort will be in developing protocols for transmitting high-resolution multimedia objects over high-speed networks (different protocols may be appropriate for wireless networks and low-resolution displays). We plan to exploit multiresolution storage of data (see the previous subsection) in this work. Additionally, we will develop formal methods for specification and verification of these protocols.
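One way a transport protocol could exploit multiresolution storage is to transmit layers coarsest-first and stop once the connection's budget is exhausted, so a slow link still delivers a usable, if coarser, object. The following is only a sketch under assumed names, not the protocol we will ultimately design:

```python
def transmit(layers, budget_bytes):
    """Send multiresolution layers coarsest-first until the byte
    budget for this connection is exhausted. The receiver can
    reconstruct a usable (if coarser) object from whatever arrives."""
    sent = []
    used = 0
    for layer in layers:  # layers ordered base -> finest detail
        if used + len(layer) > budget_bytes:
            break
        sent.append(layer)
        used += len(layer)
    return sent

# Hypothetical encoded layers: 16, 48, and 256 bytes respectively.
layers = [b"base" * 4, b"mid!" * 12, b"fine" * 64]
delivered = transmit(layers, budget_bytes=100)
print(len(delivered))  # 2: base and mid layers fit, fine layer dropped
```

A wireless network or a low-resolution display would simply be given a smaller budget; the same mechanism then degrades fidelity gracefully instead of failing.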

We will also empirically evaluate the scalability of our transport protocols and of the ATM switching hardware. The former task will help us refine the design and implementation of the transport protocols; the latter will inform the development of a switching architecture in which the electrical switch could be replaced with an optical one to achieve much higher communication rates.

Tools for distributed application development

Twenty-five years of experience with parallel and distributed programming has not produced a programming model for concurrent computation as successful as the random access machine for sequential computation. It is generally agreed that no single extant model suffices for all forms of parallel computation or for all types of parallel machines. Abstractions such as a global address space simplify parallel programming but typically exact a performance penalty by hiding data locality and the mechanisms for concurrency control. Models with explicit message passing, which expose these issues to programmers, allow performance to be tuned but are difficult to program. Thus, our options reduce to the following: (1) produce a new general programming model that eases programming without loss of efficiency, (2) augment existing models with tools that help the programmer produce correct and efficient code, or (3) adopt different concurrent programming models for different applications.
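The burden that explicit message passing places on the programmer is visible even in a minimal sketch. Here, purely for illustration, channels are modeled as Python queues; note that the programmer, not the model, is responsible for wiring up every channel and ordering every send and receive:

```python
import threading
import queue

def worker(inbox, outbox):
    """Receive one message, transform it, and send the result back."""
    msg = inbox.get()    # explicit receive: blocks until a message arrives
    outbox.put(msg * 2)  # explicit send

# The programmer must set up one channel per direction by hand.
a_to_b = queue.Queue()
b_to_a = queue.Queue()
t = threading.Thread(target=worker, args=(a_to_b, b_to_a))
t.start()
a_to_b.put(21)
result = b_to_a.get()
t.join()
print(result)  # 42
```

A global-address-space abstraction would hide this plumbing behind ordinary reads and writes, which is exactly the convenience, and the performance opacity, discussed above.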

We are addressing all of these options. In particular, we are exploring novel approaches to general-purpose distributed computing in a cluster environment, with new languages and tools and with support for system-level services that simplify programming. We are also investigating application-specific approaches that have already proven to be powerful in a variety of domains. The proposed facility will provide an ideal testbed for these varied efforts; indeed, pursuing a number of approaches on a common testbed will permit a comparative assessment of the strengths and weaknesses of each.

Testbeds and Applications

To maintain our focus on the important and difficult issues in designing large-scale systems, we propose to drive our research with four realistic and ambitious applications, each of which is independently funded and ongoing. First, we are starting to collaborate with researchers from the UT Austin Center for Space Research to design and implement a multiresolution archival and analysis system for NASA's Earth Observing System Data and Information System (EOSDIS). Second, we plan to work with researchers in TICAM, a recently established interdisciplinary program in computational and applied mathematics at UT Austin, to develop a parallel and distributed environment for solving and visualizing partial differential equations. Third, we propose to collaborate with researchers in the CLEO high-energy physics project to evaluate the scalability of our system architecture; CLEO is a national challenge project being developed jointly at Univ. of Florida, Cornell, Univ. of Texas and Univ. of California at San Diego. Finally, we expect to work with researchers at the UT Austin Applied Research Laboratory to design and implement real-time, proactive systems within our infrastructure. In the classification scheme introduced earlier, the information management system for EOSDIS is a Class I application, the TICAM and CLEO applications belong to Class II, and the real-time, proactive systems are Class III applications.

Questions or comments to <cise@cs.utexas.edu>