Applications

As computer scientists, our challenge is to conduct the basic and experimental research necessary to address the problems that lie at the core of emerging applications of computing. The previous sections described our research, and this section describes four realistic and ambitious applications, each of which is independently funded and ongoing. By working closely with these applications, our research will maintain focus on the important and difficult issues in designing large-scale systems.

Class I: Multiresolution Database for EOSDIS (Vin, Fussell, Silberschatz)

The explosive growth in data acquisition technology, particularly satellite-borne sensors, has yielded an unprecedented volume of data, such as the imagery collected by NASA's Earth Observing System (EOS). Consequently, unlike conventional database systems, the information management systems required for this data must provide on-line access to several terabytes of data stored across a large number of distributed information sources. Furthermore, in addition to supporting conventional one-dimensional data, new systems must support: (1) two-, three-, and four-dimensional information, often stored as imagery; (2) data stored, displayed, and possibly analyzed as multimedia; and (3) textual and numeric data in disparate formats.

The NASA Earth Observing System Data and Information System (EOSDIS) is an archetypal example of such a large, distributed, heterogeneous information management system. EOSDIS must integrate new data into databases, process it to produce various levels of results, and disseminate data to users and archival sites worldwide within short periods of time. This process will continue for an extended period while missions are launched, new models are developed, and changes are incorporated to reflect updated calibration information and possible modifications in processing. Furthermore, throughout its operation, EOSDIS will be accessed by a wide variety of users, ranging from researchers familiar with the data to individuals who are neither domain experts nor experienced at obtaining information from distributed sources over a network. Since the ultimate effectiveness of EOS depends as much on the analysis of acquired data as on the successful launch of individual missions, designing and implementing an information management system that efficiently archives, retrieves, and processes the information from each mission is crucial to its success.

The Center for Space Research (CSR) at UT Austin is involved with several remote sensing research programs funded by NSF, NASA, DoD, and the State of Texas. Consequently, we have access to a wide variety of databases with which to experiment. We have selected three of these databases as testbeds for our research. One of these is an existing data set for the dune fields of the Great Victoria Desert of Australia. This database is unique in that it represents a multiresolution, multi-sensor data set acquired over an extended time history. Imagery from all the sensors on commercial satellite platforms is represented in the archive, as well as radar altimetry, aerial photography, and survey data. This data set is ideal for the initial stage of the project because (a) extensive imagery is available for developing the multiresolution database, (b) the features are fairly smooth, and (c) the ground cover varies spectrally and texturally over a relatively small area, thereby providing an interesting set of image processing research issues. These data are also representative of an ecological system where mapping of the surface features at different resolutions is extremely important for studying the local ecology. Approaches that are successful for 3-D mapping of this area could be applied to a wide range of other problems.
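
To make the notion of multiresolution access concrete, the following is a minimal sketch of a quadtree-style tile pyramid of the sort such a database might use: a query asks for imagery at a given level of detail and falls back to a coarser tile when fine coverage is missing. The key layout, class, and function names are our own illustrative assumptions, not the organization of the CSR archive or EOSDIS.

  # Minimal sketch (Python) of a quadtree-style multiresolution tile index.
  # Level 0 is the coarsest level; each tile at level k covers the same
  # ground area as a 2x2 block of tiles at level k+1. The key layout and
  # interface are illustrative assumptions, not the actual archive design.

  class TilePyramid:
      def __init__(self):
          self.tiles = {}  # (level, row, col) -> raster block

      def put(self, level, row, col, raster):
          self.tiles[(level, row, col)] = raster

      def get(self, level, row, col):
          # Return the requested tile, or fall back to the enclosing tile
          # at successively coarser levels if the fine one is missing.
          while level >= 0:
              tile = self.tiles.get((level, row, col))
              if tile is not None:
                  return level, tile
              level, row, col = level - 1, row // 2, col // 2
          return None

  # A browse client asks for a fine tile and accepts coarser coverage.
  pyramid = TilePyramid()
  pyramid.put(0, 0, 0, "coarse regional mosaic")
  print(pyramid.get(3, 5, 7))   # -> (0, 'coarse regional mosaic')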

Class II: Parallel Solution of PDEs (Browne, van de Geijn)

Virtually all of physics, chemistry, and engineering builds upon the solution of different families of partial differential equations (PDEs). A relatively small number of methods (finite difference, finite element, finite volume, etc.) apply to PDEs, and the algorithms for these methods fall into a small number of classes. Ultimately, all of these methods reduce to array computations. Recently, the realization that locally adaptive solutions can converge much more rapidly than solutions based on uniform approximations has been driving applications toward adaptive methods and algorithms. The major obstacle to progress in this direction is the extreme difficulty of developing efficient solutions using the parallel hardware and software available today.
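
As a reminder of why these methods reduce to array computations, here is a toy, self-contained example (ours, not one of the project codes): a uniform-grid finite-difference solution of a one-dimensional Poisson problem by Jacobi iteration, which is nothing more than repeated sweeps over an array. Adaptive methods refine such arrays only where the solution demands it, which is precisely what makes their data structures dynamic and irregular.

  # Toy finite-difference example (Python): solve u''(x) = 1 on (0,1)
  # with u(0) = u(1) = 0 on a uniform grid, by Jacobi iteration.
  # The whole computation is repeated sweeps over a dense array.

  n = 64                      # number of interior grid points
  h = 1.0 / (n + 1)           # grid spacing
  f = [1.0] * n               # right-hand side
  u = [0.0] * n               # initial guess

  for sweep in range(5000):
      u_new = u[:]
      for i in range(n):
          left = u[i - 1] if i > 0 else 0.0
          right = u[i + 1] if i < n - 1 else 0.0
          # 3-point stencil: (left - 2*u[i] + right) / h**2 = f[i]
          u_new[i] = 0.5 * (left + right - h * h * f[i])
      u = u_new

  # The exact solution is u(x) = x*(x-1)/2, so the midpoint value
  # should be close to -0.125.
  print(u[n // 2])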

The research challenge is to find common abstractions that help develop adaptive methods and algorithms in distributed, parallel execution environments. One such abstraction, the Scalable Distributed Dynamic Array (SDDA), generalizes a standard static array. The resulting data structure adapts at runtime to the size and shape of the entity it represents and is transparently (and dynamically) distributed across parallel execution resources. Currently we are defining and implementing various SDDA operations, and we plan to extend them to include operations for visualization and interactivity.
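
The following is a rough sketch of what an SDDA-style interface might look like. The class and method names, the cyclic block distribution, and the single-process bookkeeping are our own simplifications for exposition; they are not the TICAM design, which distributes blocks across processors at runtime.

  # Illustrative sketch (Python) of a Scalable Distributed Dynamic Array:
  # a sparse, growable global index space whose fixed-size blocks are
  # assigned to processors. For brevity this sketch keeps all blocks in
  # one process and only computes ownership; a real SDDA would store
  # each block on the processor that owns it.

  class SDDA:
      def __init__(self, n_procs, block_size=64):
          self.n_procs = n_procs
          self.block_size = block_size
          self.blocks = {}                  # block id -> {offset: value}

      def owner(self, index):
          # Map a global index to the processor owning its block
          # (simple cyclic block distribution).
          return (index // self.block_size) % self.n_procs

      def put(self, index, value):
          block = index // self.block_size
          self.blocks.setdefault(block, {})[index % self.block_size] = value

      def get(self, index, default=0.0):
          block = index // self.block_size
          return self.blocks.get(block, {}).get(index % self.block_size, default)

  # Storage grows only where the application touches the index space,
  # and owner() tells the runtime where to route each access.
  a = SDDA(n_procs=8)
  a.put(1000000, 3.14)          # a far-away index allocates just one block
  print(a.get(1000000), a.owner(1000000))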

As part of TICAM, a new interdisciplinary research program in computational and applied mathematics at UT Austin, a prototype implementation of the SDDA concept is already well under way. Implementing highly efficient parallel linear algebra in terms of the SDDA abstractions involves filtering, moving, and merging very high volumes of data across networks. Three TICAM projects from different domains and using different methods (finite element, finite difference, and fast multipole) demonstrate the utility of the SDDA abstraction: hp-adaptive computational fluid dynamics (an ARPA-funded project), computational analysis of binary black holes (an NSF Grand Challenge project), and a recently funded NSF project on the application of high-performance computational methods to the micromechanical characterization of composite materials. The last of the three would especially benefit from the proposed infrastructure. Its goals are to explore the use of parallel computing, including clusters of workstations, to analyze composites, and to create a toolbox that helps computational scientists design composite materials.

Class II: The CLEO Experiment (Ricciardi, Ogg)

CLEO is a High Energy Physics (HEP) experiment at Cornell's Electron Storage Ring accelerator facility. CLEO uses electron-positron annihilation to study the b-quark, a fundamental constituent of matter. The basic unit of information for HEP is a record of the particles produced in an event, a collision in the accelerator. Each event is independent of all others, and therein lies both the challenge and opportunity in HEP computing. Because events are independent, physical theories can be tested only by statistical means, requiring many events and substantial computation. For example, a typical experiment might produce 10^6 to 10^12 events, and (depending on the physics being studied and the type of particle detector used) each event generates 1 kB to 1 MB of data and requires milliseconds to hundreds of seconds to analyze. Although event independence creates enormous storage and computational needs, it also provides a mitigating benefit: events can be processed autonomously and asynchronously. However, doing so raises technical and theoretical challenges, including fault-tolerant design, reactive processing, and scalability.
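
The scale those numbers imply is easy to work out. The short calculation below uses mid-range values that we chose for illustration; the actual figures depend on the detector and the physics being studied.

  # Back-of-envelope scale of a HEP data set (Python), using mid-range
  # values chosen for illustration: 10^9 events, 100 kB per event, and
  # one CPU-second of analysis per event.

  events = 10**9
  bytes_per_event = 100 * 1000
  seconds_per_event = 1.0

  total_terabytes = events * bytes_per_event / 1e12
  total_cpu_years = events * seconds_per_event / (3600 * 24 * 365)

  print("storage ~ %.0f TB" % total_terabytes)         # ~100 TB
  print("compute ~ %.0f CPU-years" % total_cpu_years)  # ~32 CPU-years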

There are three primary computations on events. First, event reconstruction interprets raw, detector-specific data as physically meaningful quantities. The CPU requirements for event reconstruction are substantial, yet the I/O requirements are modest. Second, event analysis measures physical phenomena and searches for new physical insights by processing the reconstructed data. Event analysis is almost always I/O bound. Finally, event simulation interprets measurements as physical processes. A large sample of simulated events (at least as large as the number of non-trivial events in the actual data) is necessary to evaluate the background of an interesting process. Event simulation is highly CPU bound.

In each computation, there are three distinct phases: job initialization (e.g., setup, reserving histograms, global calibration), event-by-event processing (reconstruction, analysis, and simulation), and job termination (e.g., collating statistics, preparing graphical output). There may also be intermediate tasks, such as starting a new run, processing calibration records, and so forth. To parallelize the computation, event-by-event processing is farmed out to various nodes; the executable image itself is not a parallel program, but rather is run in parallel on different processors. Event processing may generate statistics; whether they are returned to the controlling job immediately, or accumulated, collated, and returned when the local job terminates, is an architectural decision.
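
A minimal skeleton of that farming structure is sketched below. The function names and the use of a local process pool are our own stand-ins; in the real system the workers are separate nodes, and when and how the statistics travel back is exactly the architectural decision described above.

  # Skeleton (Python) of event farming: the same sequential analysis code
  # runs on many workers, each handling a disjoint batch of events, and
  # per-job statistics are collated at job termination. The names and the
  # local process pool below are illustrative stand-ins only.

  from multiprocessing import Pool

  def initialize_job():
      # e.g., setup, reserving histograms, global calibration
      return {"histogram": [0] * 10}

  def process_events(batch):
      stats = initialize_job()
      for event in batch:
          bin_index = event % 10        # stand-in for real reconstruction/analysis
          stats["histogram"][bin_index] += 1
      return stats                      # accumulated locally, returned at job end

  def terminate_job(partials):
      # e.g., collate statistics, prepare graphical output
      merged = [0] * 10
      for stats in partials:
          merged = [a + b for a, b in zip(merged, stats["histogram"])]
      return merged

  if __name__ == "__main__":
      events = list(range(1000))
      batches = [events[i::4] for i in range(4)]    # farm events out to 4 workers
      with Pool(4) as pool:
          partials = pool.map(process_events, batches)
      print(terminate_job(partials))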

The CLEO experiment is an NSF-funded National Challenge Project, ``Distributed Computing and Databases for High Energy Physics'' (NSF grant 9318151). The Project's goal is to develop fault-tolerant, wide-area systems for scientific applications with a large and distributed database; its collaborators include researchers at Cornell University, the University of Florida, the University of California at San Diego, and UT Austin. At UT Austin, Ricciardi and Ogg are building a small, dedicated test facility to experiment with communication primitives and architectural models. The results of their work will influence the design of the full-scale production system, which will be developed at Cornell and Florida. The proposed infrastructure facility will benefit the National Challenge Project in three ways:

1. Scalability. Because the final architecture is intended to transfer to other scientific and engineering applications, scalability is a critical aspect of the Project's design. However, scalability is difficult to study within the confines of the Project: the dedicated facility at UT Austin is necessarily small, and the production facilities at Cornell and Florida, limited by their budgets, will only be as large as the CLEO experiment requires. The facility we propose to build can be used to study some aspects of the scalability of the CLEO experiment because it will be used simultaneously by other projects.

2. ATM. ATM switches play an essential role in the Project, for both local and wide area networks. In time, there will be ATM switches at Cornell and Florida, but not at UT Austin (at least, not through the Project's funding). However, Ricciardi and Ogg's work would benefit from testing architectural models and transport protocols with an ``in house'' switch, rather than using the production environment's switch. Moreover, an important part of the Project's work will be testing the interoperability of different switches. In all likelihood, there will be only one switch at each of Cornell and Florida; the availability of more than one switch (possibly from different vendors) at a single site will be enormously useful in developing and testing architectures, protocols, and interoperability.

3. Other Scientific Applications. An important goal of the Project is to produce a system design that transfers to other applications. Sharing a computing facility with other researchers working on similar applications will undoubtedly influence the system's design at an early stage. In particular, we expect substantive collaboration and sharing of ideas between this Project and the highly similar EOSDIS project.

Class III: Real-Time Proactive Systems (Fussell, Miranker, Mok, Kuipers)

An increasingly important task of computers will be monitoring and controlling the external environment. Computers used in this way are called real-time embedded systems. These systems are ``reactive'' in the sense that they are programmed to respond to signals and data from the external world. With recent advances in audio and video technology, applications are emerging that require the capability to function in environments rich with sensors and data. In such environments, the enormous volume of data poses two requirements: data must be processed selectively, and the processing algorithms must trade off quality (precision, accuracy) against resource requirements. The resulting systems, called ``proactive systems'', are truly resource-limited; their computation depends on both the allocation of limited resources and the fact that the value of information can decay with time. Several projects in our department attempt to meet these challenges by computing a resource budget, then selecting data and processing methods appropriate for it.
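
One way to picture the ``compute a budget, then select'' step is the toy sketch below; the method table, cost figures, and decay model are invented for illustration and are not drawn from any of the projects.

  # Toy sketch (Python) of resource-budgeted processing: given a time
  # budget, choose the highest-quality method whose estimated cost fits,
  # and discount the value of the result as the underlying data ages.
  # All numbers and method names are invented for illustration.

  METHODS = [
      # (name, estimated cost in ms, relative quality)
      ("full-resolution analysis", 500.0, 1.00),
      ("subsampled analysis",       80.0, 0.70),
      ("coarse summary",            10.0, 0.40),
  ]

  def choose_method(budget_ms):
      feasible = [m for m in METHODS if m[1] <= budget_ms]
      return max(feasible, key=lambda m: m[2]) if feasible else None

  def value_after_delay(quality, age_ms, half_life_ms=1000.0):
      # The value of information decays with time; here, exponentially.
      return quality * 0.5 ** (age_ms / half_life_ms)

  method = choose_method(budget_ms=100.0)           # -> subsampled analysis
  print(method, value_after_delay(method[2], age_ms=250.0))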

One project requiring real-time data acquisition, processing, and dissemination is a tool that coordinates the fighting of forest fires or other distributed, tactical tasks. In the future, individual fire-fighters may carry multimedia communication devices that present a plethora of data gathered from surveillance aircraft, weather satellites, digital terrain maps, and command-and-control centers. For this data to be useful, it must be disseminated quickly, and relevant conclusions (such as ``the fire poses an imminent danger because of the current weather pattern and terrain'') must be extracted easily. Methods for selecting data, and functions for processing it to achieve coordination in a distributed environment, are being investigated by Fussell and by a team led by Browne. A rule-based programming environment (Venus) has been implemented at the UT Applied Research Laboratory; it awaits testing in a distributed testbed, such as the proposed infrastructure.

Another project is the real-time performance evaluation of distributed, tactical systems such as the Aegis system on Navy ships. As is typical of military systems, command decisions must be made using real-time information of varying quality. An effective system must extract the most relevant information from voluminous sensor data, and from other information resources distributed throughout a ship. The introduction of multimedia data complicates the problem further because of its CPU-intensive processing algorithms. In terms of research challenges, the resource scheduling models for these new applications are sufficiently novel that past work (e.g., real-time scheduling theory) may no longer apply. A simple example is the abstraction of constant context-switching overhead in OS process management; although this abstraction is commonly assumed in standard task scheduling models (e.g., the Liu and Layland periodic task model), it does not hold for the scheduling of I/O tasks involving video image retrieval. More generally, it remains to be determined what models are appropriate for new execution environments, such as the proposed infrastructure. Our work in this area will be strengthened by the opportunity to experiment with applications using the infrastructure. These experiments will help define correct abstractions for evaluating real-time performance and set new directions for research. We have ongoing work in this area in collaboration with the UT Applied Research Laboratory.
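
For concreteness, the Liu and Layland model admits the simple rate-monotonic schedulability test sketched below; the point made above is that such tests presume a fixed worst-case cost per task, an assumption that variable context-switch and I/O overheads (such as video image retrieval) violate. The task set shown is made up.

  # Liu & Layland rate-monotonic utilization bound (Python): n periodic
  # tasks are schedulable if sum(C_i / T_i) <= n * (2**(1/n) - 1).
  # The test assumes a fixed worst-case cost C_i per task, which variable
  # context-switch and I/O overheads undermine. The task set is made up.

  tasks = [
      # (worst-case execution time C_i, period T_i), in milliseconds
      (10.0,  40.0),
      (15.0, 100.0),
      (30.0, 200.0),
  ]

  n = len(tasks)
  utilization = sum(c / t for c, t in tasks)
  bound = n * (2 ** (1.0 / n) - 1)

  print("U = %.3f, bound = %.3f, schedulable: %s"
        % (utilization, bound, utilization <= bound))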

A third project is the MIMIC approach to situation monitoring (Kuipers). In this approach, multiple hypotheses are tracked in parallel, using methods in qualitative and semi-quantitative reasoning to cover the hypothesis space with a tractable number of models. These methods will require (1) the ability to track large numbers of observational data streams in real time, (2) the ability to maintain and update tracker hypotheses in parallel, and (3) the ability to propose reasonable actions in real time based on the changing state of incomplete knowledge represented by the current set of trackers. The proposed infrastructure will be valuable as a testbed for these experiments.
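
A very small sketch of the kind of tracking loop this requires appears below: several model hypotheses are maintained in parallel, each predicts a range for the next observation, and hypotheses refuted by the data are dropped. The models, predictions, and numbers are invented for illustration; MIMIC's qualitative and semi-quantitative machinery is far richer.

  # Minimal sketch (Python) of tracking several model hypotheses against
  # an observation stream: each tracker predicts an interval for the next
  # reading, and hypotheses inconsistent with the data are discarded.
  # The models and numbers below are invented for illustration only.

  trackers = {
      "leak":         lambda t: (8.0 - 0.5 * t, 10.0 - 0.3 * t),
      "no-fault":     lambda t: (9.5, 10.5),
      "sensor-drift": lambda t: (9.0 - 0.1 * t, 10.0 + 0.1 * t),
  }

  observations = [(0, 10.0), (2, 9.7), (4, 9.2)]    # (time, measured value)

  for t, value in observations:
      refuted = [name for name, predict in trackers.items()
                 if not (predict(t)[0] <= value <= predict(t)[1])]
      for name in refuted:
          del trackers[name]            # hypothesis inconsistent with the data
      print(t, sorted(trackers))        # surviving hypotheses after each reading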
