Content-based recommender systems suggest documents, items, and services to users based on learning a profile of the user from rated examples containing information about the given items. Text categorization methods are very useful for this task but generally rely on unstructured text. We have developed a book-recommending system that utilizes semi-structured information about items gathered from the web using simple information extraction techniques. Initial experimental results demonstrate that this approach can produce fairly accurate recommendations.
Undergraduate Honor Thesis, Department of Computer Sciences,
University of Texas at Austin, May 1998.
Also appears as AI TR 98-270
With the growth of the World Wide Web, recommender systems have
received an increasing amount of attention. Many recommender systems
in use today are based on collaborative filtering. This project has
focused on LIBRA, a content-based book recommending system. By
utilizing text categorization methods and the information available
for each book, the system determines a user profile which is used as
the basis of recommendations made to the user. Instead of the
bag-of-words representation used in many statistical text
categorization systems, LIBRA parses each text sample into a
semi-structured representation. We have used standard Machine
Learning techniques to analyze the performance of several algorithms
on this learning task. In addition, we analyze the utility of several
methods of feature construction and selection (i.e. methods of
choosing the representation of an item that the learning algorithm
actually uses). After analyzing the system we conclude that good
recommendations are produced after a relatively small number of
training examples. We also conclude that the feature selection method
tested does not improve the performance of these algorithms in any
systematic way, though the results indicate other feature selection
methods may prove useful. Feature construction, however, while not
providing a large increase in performance with the particular
construction methods used here, holds promise of providing performance
improvements for the algorithms investigated. This text assumes only
minor familiarity with concepts of artificial intelligence and should
be readable by an upper-division computer science undergraduate
familiar with basic concepts of probability theory and set theory.
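For readers unfamiliar with the approach, the following minimal sketch shows how a
naive-Bayes-style profile learner can exploit a semi-structured representation by
pairing each token with the slot it came from. The slot names, toy data, and
smoothing choices are illustrative assumptions, not LIBRA's actual implementation.

```python
import math
from collections import defaultdict

def slot_tokens(book):
    """Turn a semi-structured book record into (slot, word) features, so
    'asimov' in the author slot is distinct from 'asimov' in the description."""
    for slot, text in book.items():
        for word in text.lower().split():
            yield (slot, word)

def train(examples):
    """examples: list of (book_dict, label) pairs, where labels are user
    ratings collapsed to 'pos'/'neg'. Returns naive Bayes counts."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for book, label in examples:
        class_counts[label] += 1
        for feat in slot_tokens(book):
            feat_counts[label][feat] += 1
            vocab.add(feat)
    return class_counts, feat_counts, vocab

def score(book, label, model):
    """Log-posterior (up to a constant) with Laplace smoothing."""
    class_counts, feat_counts, vocab = model
    logp = math.log(class_counts[label] / sum(class_counts.values()))
    n_label = sum(feat_counts[label].values())
    for feat in slot_tokens(book):
        logp += math.log((feat_counts[label][feat] + 1) / (n_label + len(vocab)))
    return logp

model = train([({"author": "isaac asimov", "description": "robots and empire"}, "pos"),
               ({"author": "anonymous", "description": "tax law digest"}, "neg")])
book = {"author": "isaac asimov", "description": "more robots"}
print(score(book, "pos", model) > score(book, "neg", model))  # True
```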
To appear in the Fifteenth International Conference on Machine Learning, 1998.
While there has been a growing interest in the problem of learning Bayesian
networks from data, no technique exists for learning or revising Bayesian
networks with hidden variables (i.e. variables not represented in the data)
that has been shown to be efficient, effective, and scalable through evaluation
on real data. The few techniques that exist for revising such networks
perform a blind search through a large space of revisions, and are therefore
computationally expensive. This paper presents BANNER, a technique for using
data to revise a given Bayesian network with noisy-or and noisy-and nodes to
improve its classification accuracy. The initial network can be derived
directly from a logical theory expressed as propositional rules. BANNER can
revise networks with hidden variables, and add hidden variables when
necessary. Unlike previous approaches, BANNER employs mechanisms similar to
logical theory refinement techniques for using the data to focus the search
for effective modifications. Experiments on real-world problems in the domain
of molecular biology demonstrate that BANNER can effectively revise fairly
large networks to significantly improve their accuracies.
TR AI98-271, Artificial Intelligence Lab,
University of Texas at Austin, May 1998.
This paper experimentally compares three approaches to program induction:
inductive logic programming (ILP), genetic programming (GP), and
genetic logic programming (GLP) (a variant of GP for inducing Prolog programs).
Each of these methods was used to induce four simple,
recursive list-manipulation functions. The results indicate that ILP is the
most likely to induce a correct program from small sets of random examples,
while GP is generally less accurate. GLP performs the worst, and is rarely able
to induce a correct program. Interpretations of these results in terms of
differences in search methods and inductive biases are presented.
Mary Elaine Califf and Raymond J. Mooney
To appear in New Generation Computing, 16, 3 (1998).
This paper demonstrates the capabilities of FOIDL, an inductive logic
programming (ILP) system whose distinguishing characteristics are the ability
to produce first-order decision lists, the use of an output completeness
assumption as a substitute for negative examples, and the use of intensional
background knowledge. The development of FOIDL was originally motivated
by the problem of learning to generate the past tense of English verbs;
however, this paper demonstrates its superior performance on two different sets
of benchmark ILP problems. Tests on the finite element mesh design problem
show that FOIDL's decision lists enable it to produce generally more
accurate results than a range of methods previously applied to this problem.
Tests with a selection of list-processing problems from Bratko's introductory
Prolog text demonstrate that the combination of implicit negatives and
intensionality allows FOIDL to learn correct programs from far fewer
examples than FOIL.
Mary Elaine Califf and Raymond J. Mooney
Proceedings of AAAI Spring Symposium on Applying Machine Learning to
Discourse Processing, Stanford, CA, March, 1998.
Information extraction is a form of shallow text processing that locates a
specified set of relevant items in a natural-language document. Systems for
this task require significant domain-specific knowledge and are time-consuming
and difficult to build by hand, making them a good application for machine
learning. This paper presents a system, RAPIER, that takes pairs of
sample documents and filled templates and induces pattern-match rules that
directly extract fillers for the slots in the template. The learning algorithm
incorporates techniques from several inductive logic programming systems and
acquires unbounded patterns that include constraints on the words,
part-of-speech tags, and semantic classes present in the filler and the
surrounding text. We present encouraging experimental results on several
domains, including Internet job postings.
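To make the rule representation concrete, here is a toy, fixed-length rendering of
a three-part extraction rule. RAPIER's actual rules are unbounded pattern lists
with richer constraints, including the semantic classes mentioned above, so
everything below is a simplified assumption.

```python
# A toy extraction rule with pre-filler, filler, and post-filler patterns.
# Each element constrains a token's word, its POS tag, or both; None means
# unconstrained.
rule = {
    "slot": "salary",
    "pre_filler":  [{"word": None,  "pos": "IN"}],   # e.g. a preposition
    "filler":      [{"word": None,  "pos": "CD"}],   # a number
    "post_filler": [{"word": "per", "pos": None},    # literally "per"
                    {"word": None,  "pos": "NN"}],   # e.g. "year"
}

def element_matches(elem, token):
    word, pos = token
    return ((elem["word"] is None or elem["word"] == word.lower()) and
            (elem["pos"] is None or elem["pos"] == pos))

def apply_rule(rule, tagged_tokens):
    """tagged_tokens: list of (word, pos) pairs. Returns extracted fillers."""
    elems = rule["pre_filler"] + rule["filler"] + rule["post_filler"]
    fillers = []
    for i in range(len(tagged_tokens) - len(elems) + 1):
        window = tagged_tokens[i:i + len(elems)]
        if all(element_matches(e, t) for e, t in zip(elems, window)):
            start, width = len(rule["pre_filler"]), len(rule["filler"])
            fillers.append(" ".join(w for w, _ in window[start:start + width]))
    return fillers

tagged = [("salary", "NN"), ("of", "IN"), ("50000", "CD"), ("per", "IN"), ("year", "NN")]
print(apply_rule(rule, tagged))  # ['50000']
```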
Cynthia A. Thompson and Raymond J. Mooney
TR AI98-273, Artificial Intelligence Lab,
University of Texas at Austin, May 1998.
This paper describes a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with representations of their meaning. The lexicon learned consists of words
paired with meaning representations. WOLFIE is part of an integrated system
that learns to parse novel sentences into semantic representations, such as
logical database queries. Experimental results are presented demonstrating
WOLFIE's ability to learn useful lexicons for a database interface in four
different natural languages. The lexicons learned by WOLFIE are compared to
those acquired by a competing system developed by Siskind (1996).
The learning system presented in this dissertation, called SCOPE, is a
unique approach to learning control knowledge for planning. SCOPE
learns domain-specific control rules for a planner that improve
both planning efficiency and plan quality, and it is one of the few
systems that can learn control knowledge for partial-order planning.
SCOPE's architecture integrates explanation-based learning (EBL) with
techniques from inductive logic programming. Specifically, EBL is
used to constrain an inductive search for control heuristics that help
a planner choose between competing plan refinements. Since SCOPE uses
a very flexible training approach, its learning algorithm can be
easily focused to prefer search paths that are better for particular
evaluation metrics. SCOPE is extensively tested on several planning
domains, including a logistics transportation domain and a production
manufacturing domain. In these tests, it is shown to significantly
improve both planning efficiency and quality and to be more
robust than a competing approach.
Ph.D. Thesis,
Department of Computer Sciences, University of Texas at Austin, May 1998.
Also appears as TR AI98-269
Artificial intelligence planning systems have become an important tool
for automating a wide variety of tasks. However, even the most
advanced planning algorithms suffer from two major problems. First,
they often require infeasible amounts of computation time to solve
problems in most domains. Second, they are not guaranteed to
return the best solution to a planning problem, and in fact can
sometimes return very low-quality solutions. One way to address
these problems is to provide a planning system with domain-specific
control knowledge, which helps guide the planner towards more
promising search paths. Machine learning techniques enable a planning
system to automatically acquire search-control knowledge for
different applications. A considerable amount of planning and learning
research has been devoted to acquiring rules that improve
planning efficiency, also known as speedup learning. Much less
work has been done on learning knowledge to improve the quality
of plans, even though this is an essential feature for many
real-world planning systems. Furthermore, even less research has been
done on acquiring control knowledge to improve both of these metrics.
Mary Elaine Califf and Raymond J. Mooney
Proceedings of AAAI Spring Symposium on Applying Machine Learning to
Discourse Processing, Stanford, CA, March, 1998.
Information extraction is a form of shallow text processing which locates
a specified set of relevant items in natural language documents. Such
systems can be useful, but require domain-specific knowledge and rules,
and are time-consuming and difficult to build by hand, making information
extraction a good testbed for the application of machine learning techniques
to natural language processing. This paper presents a system, RAPIER,
that takes pairs of documents and filled templates and induces pattern-match
rules that directly extract fillers for the slots in the template. The
learning algorithm incorporates techniques from several inductive logic
programming systems and learns unbounded patterns that include constraints on
the words and part-of-speech tags surrounding the filler. Encouraging results
are presented on learning to extract information from computer job postings
from the newsgroup misc.jobs.offered.
We therefore propose a system for parsing and translating natural language
that learns from examples and uses some background knowledge.
We test our parser on lexically limited sentences from the Wall Street Journal
and achieve accuracy rates of 89.8% for labeled precision, 98.4% for part of
speech tagging and 56.3% of test sentences without any crossing brackets.
Machine translations of 32 Wall Street Journal sentences to German have been
evaluated by 10 bilingual volunteers and been graded as 2.4 on a 1.0 (best) to
6.0 (worst) scale for both grammatical correctness and meaning preservation.
The translation quality was only minimally better (2.2) when starting each
translation with the correct parse tree, which indicates that the parser is
quite robust and that its errors have only a moderate impact on final
translation quality. These parsing and translation results already compare well
with other systems and, given the relatively small training set and amount of
overall knowledge used so far, the results suggest that our system Contex can
break previous accuracy ceilings when scaled up further.
Ph.D. Thesis,
Department of Computer Sciences, University of Texas at Austin, May 1997.
175 pages.
Also appears as TR 97-12.
The parsing of unrestricted text, with its enormous lexical and structural
ambiguity, still poses a great challenge in natural language processing.
The difficulties with traditional approaches, which try to master the
complexity of parse grammars with hand-crafted rules, have led to a trend
towards more empirical techniques.
As our parsing model we choose a deterministic shift-reduce type parser that
integrates part-of-speech tagging and syntactic and semantic processing, which
not only makes parsing very efficient, but also assures transparency during the
supervised example acquisition.
Applying machine learning techniques, the system uses parse action examples to
generate a parser in the form of a decision structure, a generalization of
decision trees.
To learn good parsing and translation decisions, our system relies heavily on
context, as encoded in currently 205 features describing the morphological,
syntactic and semantic aspects of a given parse state.
Compared with recent
probabilistic systems that were trained on 40,000 sentences, our system relies
on more background knowledge and a deeper analysis, but radically fewer
examples, currently 256 sentences.
To appear in the Working Notes of the IJCAI-97 Workshop on Abduction and Induction in AI
This paper discusses the integration of traditional abductive and inductive
reasoning methods in the development of machine learning systems. In
particular, the paper discusses our recent work in two areas: 1) The use of
traditional abductive methods to propose revisions during theory refinement,
where an existing knowledge base is modified to make it consistent with a set
of empirical data; and 2) The use of inductive learning methods to
automatically acquire from examples a diagnostic knowledge base
used for abductive reasoning.
To appear in the Proceedings of the ML-97 Workshop on Automata
Induction, Grammatical Inference, and Language Acquisition.
For most natural language processing tasks, a parser that maps
sentences into a semantic representation is significantly more useful
than a grammar or automaton that simply recognizes syntactically
well-formed strings. This paper reviews our work on using inductive
logic programming methods to learn deterministic shift-reduce parsers
that translate natural language into a semantic representation. We
focus on the task of mapping database queries directly into executable
logical form. An overview of the system is presented followed by
recent experimental results on corpora of Spanish geography queries
and English job-search queries.
Submitted for Review
Learning Bayesian networks inductively in the presence of hidden
variables is still an open problem. Even the simpler task of learning
just the conditional probabilities on a Bayesian network with hidden
variables is not completely solved. In this paper, we present an
approach that learns the parameters of a Bayesian network composed of
noisy-or and noisy-and nodes by using a gradient descent
back-propagation approach similar to that used to train neural
networks. For the task of causal inference, it has the advantage of
being able to learn in the presence of hidden variables. We compare
the performance of this approach with the adaptive probabilistic
networks technique on a real-world classification problem in molecular
biology, and show that our approach trains faster and learns networks
with higher classification accuracy.
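A minimal sketch of the underlying idea, assuming binary parents, a squared-error
loss, and no leak probability (the system's actual loss and update schedule may
differ): the noisy-or output is differentiable in its parameters, so they can be
adjusted by gradient descent much like neural-network weights.

```python
def noisy_or(q, x):
    """P(child = 1 | binary parents x): the child fails only if every
    active parent's independent cause fails."""
    p_fail = 1.0
    for qi, xi in zip(q, x):
        if xi:
            p_fail *= (1.0 - qi)
    return 1.0 - p_fail

def gradient_step(q, x, target, lr=0.1):
    """One squared-error gradient step on the noisy-or parameters."""
    err = noisy_or(q, x) - target
    new_q = []
    for j, (qj, xj) in enumerate(zip(q, x)):
        grad = 0.0
        if xj:
            # d P / d q_j = product over the *other* active parents of (1 - q_i)
            grad = err
            for i, (qi, xi) in enumerate(zip(q, x)):
                if xi and i != j:
                    grad *= (1.0 - qi)
        new_q.append(min(0.999, max(0.001, qj - lr * grad)))
    return new_q

q = [0.5, 0.5]
for _ in range(200):
    q = gradient_step(q, [1, 1], target=0.9)
print(round(noisy_or(q, [1, 1]), 2))  # ~0.9
```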
Submitted to Computational Linguistics
Empirical methods for building natural language systems have become
an important area of research in recent years. Most current
approaches are based on propositional learning algorithms and have
been applied to the problem of acquiring broad-coverage parsers for
relatively shallow (syntactic) representations. This paper outlines
an alternative empirical approach based on techniques from a
subfield of machine learning known as Inductive Logic Programming
(ILP). ILP algorithms, which learn relational (first-order) rules,
are used in a parser acquisition system called CHILL that
learns rules to control the behavior of a traditional shift-reduce
parser. Using this approach, CHILL is able to learn parsers
for a variety of different types of analyses, from traditional
syntax trees to more meaning-oriented case-role and database query
forms. Experimental evidence shows that CHILL performs
comparably to propositional learning systems on similar tasks, and
is able to go beyond the broad-but-shallow paradigm and learn
mappings directly from sentences into useful semantic
representations. In a complete database-query application, parsers
learned by CHILL outperform an existing hand-crafted system,
demonstrating the promise of empirical techniques for automating the
construction of certain NLP systems.
Mary Elaine Califf
Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1997.
The recent growth of online information available in the form of
natural language documents creates a greater need for computing systems
with the ability to process those documents to simplify access to the
information. One type of processing appropriate for many tasks is
information extraction, a type of text skimming that retrieves specific
types of information from text.
Although information extraction systems have existed for two decades,
these systems have generally been built by hand and contain domain-specific
information, making them difficult to port to other domains.
A few researchers have begun to apply machine learning to
information extraction tasks, but most of this work has involved
applying learning to pieces of a much larger
system. This paper presents a novel rule representation specific to
natural language and a learning system, RAPIER, which learns
information extraction rules.
RAPIER takes pairs of documents and filled templates indicating the
information to be extracted and learns
patterns to extract fillers for the slots in the template.
This proposal presents initial results on a small corpus of
computer-related job postings with a preliminary version of RAPIER.
Future research will involve several enhancements to RAPIER as well
as more thorough testing on several domains and extension to additional
natural language processing tasks. We intend to extend the rule representation
and algorithm to allow for more types of constraints than are currently
supported. We also plan to incorporate active learning, or
sample selection, methods, specifically query by committee, into RAPIER.
These methods have the potential to substantially reduce the amount of
annotation required. We will explore the issue of distinguishing
relevant and irrelevant messages, since currently RAPIER extracts
from any message given to it, assuming that all are relevant. We also
intend to run much larger tests with RAPIER on multiple domains
including the terrorism domain from the Third and Fourth Message
Understanding Conferences, which will allow comparison against other
systems. Finally, we plan to demonstrate the generality of RAPIER's
representation and algorithm by applying it to other natural language
processing tasks such as word sense disambiguation.
Mary Elaine Califf and Raymond J. Mooney
To appear in the Working Papers of ACL-97 Workshop on Natural
Language Learning
Information extraction systems process natural language documents and
locate a specific set of relevant items. Given the recent success of
empirical or corpus-based approaches in other areas of natural
language processing, machine learning has the potential to
significantly aid the development of these knowledge-intensive
systems. This paper presents a system, RAPIER, that takes pairs of
documents and filled templates and induces pattern-match rules that
directly extract fillers for the slots in the template. The learning
algorithm incorporates techniques from several inductive logic
programming systems and learns unbounded patterns that include
constraints on the words and part-of-speech tags surrounding the
filler. Encouraging results are presented on learning to extract
information from computer job postings from the newsgroup
misc.jobs.offered.
Mary Elaine Califf and Raymond J. Mooney
To appear in the Working Notes of the IJCAI-97 Workshop on Frontiers in Inductive Logic Programming.
Information extraction systems process natural language documents and locate a
specific set of relevant items. Given the recent success of empirical or
corpus-based approaches in other areas of natural language processing,
machine learning has the potential to significantly aid the development of
these knowledge-intensive systems. This paper presents a system, RAPIER,
that takes pairs of documents and filled templates and induces pattern-match
rules that directly extract fillers for the slots in the template. The
learning algorithm incorporates techniques from several inductive logic
programming systems and learns unbounded patterns that include constraints on
the words and part-of-speech tags surrounding the filler. Encouraging results
are presented on learning to extract information from computer job postings
from the newsgroup misc.jobs.offered.
Tara A. Estlin and Raymond J. Mooney
To appear in the Proceedings of the Fifteenth International Joint
Conference on Artificial Intelligence (IJCAI-97)
Most research in learning for planning has concentrated on efficiency
gains. Another important goal is improving the quality of final
plans. Learning to improve plan quality has been examined by a few
researchers; however, little research has been done on learning to
improve both efficiency and quality. This paper explores this problem
by using the SCOPE learning system to acquire control knowledge that
improves on both of these metrics. Since SCOPE uses a very flexible
training approach, we can easily focus its learning algorithm to
prefer search paths that are better for particular evaluation
metrics. Experimental results show that SCOPE can significantly
improve both the quality of final plans and overall planning
efficiency.
Ulf Hermjakob and Raymond J. Mooney
Proceedings of the 35th Annual Meeting of the
Association for Computational Linguistics (ACL'97/EACL'97), pp. 482-489
This paper presents a knowledge and context-based system for parsing
and translating natural language and evaluates it on sentences from
the Wall Street Journal.
Applying machine learning techniques, the system uses parse action examples
acquired under supervision to generate a deterministic shift-reduce parser
in the form of a decision structure. It relies heavily on context,
as encoded in features which describe the morphological, syntactic,
semantic, and other aspects of a given parse state.
Cynthia A. Thompson and Raymond J. Mooney
Submitted for Review
This paper describes a system, WOLFIE (WOrd Learning
From Interpreted Examples), that learns a semantic lexicon from a
corpus of sentences paired with representations of their meaning. The
lexicon learned consists of words paired with representations of their
meaning, and allows for both synonymy and polysemy. WOLFIE is part
of an integrated system that learns to parse novel sentences into their
meaning representations. Experimental results are presented that
demonstrate WOLFIE's ability to learn useful lexicons for a
realistic domain. The lexicons learned by WOLFIE are also compared
to those learned by another lexical acquisition system, that of
Siskind (1996).
Raymond J. Mooney
Proceedings of the 6th International Inductive Logic Programming Workshop, pp. 205-224, Stockholm, Sweden, August 1996.
This paper reviews our recent work on applying inductive logic
programming to the construction of natural language processing
systems. We have developed a system, CHILL, that learns a parser from
a training corpus of parsed sentences by inducing heuristics that
control an initial overly-general shift-reduce parser. CHILL learns
syntactic parsers as well as ones that translate English database
queries directly into executable logical form. The ATIS corpus of
airline information queries was used to test the acquisition of
syntactic parsers, and CHILL performed competitively with recent
statistical methods. English queries to a small database on
U.S. geography were used to test the acquisition of a complete natural
language interface, and the parser that CHILL acquired was more
accurate than an existing hand-coded system. The paper also includes
a discussion of several issues this work has raised regarding the
capabilities and testing of ILP systems as well as a summary of our
current research directions.
Beginning with a set of certainty-factor rules, along with
accurately labelled training examples, RAPTURE makes use of both
symbolic and connectionist learning techniques for revising the rules,
in order that they correctly classify all of the training examples. A
modified version of backpropagation is used to adjust the certainty
factors of the rules, ID3's information-gain heuristic is used to add
new rules, and the Upstart algorithm is used to create new hidden
terms in the rule base.
Results on refining four real-world rule bases are presented that
demonstrate the effectiveness of this combined approach. Two of these
rule bases were designed to identify particular areas in strands of
DNA, one is for identifying infectious diseases, and the fourth
attempts to diagnose soybean diseases. The results of RAPTURE are
compared with those of backpropagation, C4.5, KBANN, and other
learning systems. RAPTURE generally produces rule sets that are
more accurate than those of these other systems, often creating smaller sets of
rules and using less training time.
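To illustrate the flavor of the certainty-factor adjustment, here is a sketch
assuming only positive certainty factors combined by probabilistic sum and a
squared-error criterion; RAPTURE's modified backpropagation handles the full
certainty-factor calculus.

```python
def prob_sum(cfs):
    """Certainty-factor combination for positive evidence: folding
    cf_new = cf_old + cf * (1 - cf_old) over all rules that fired."""
    total = 0.0
    for cf in cfs:
        total += cf * (1.0 - total)
    return total

def adjust(cfs, target, lr=0.05):
    """Nudge each rule's certainty factor down the squared-error gradient,
    the flavor of update a modified backpropagation performs."""
    err = prob_sum(cfs) - target
    new_cfs = []
    for j in range(len(cfs)):
        # d output / d cf_j = product over the other CFs of (1 - cf_i)
        others = 1.0
        for i, cfi in enumerate(cfs):
            if i != j:
                others *= (1.0 - cfi)
        new_cfs.append(max(0.0, min(1.0, cfs[j] - lr * err * others)))
    return new_cfs

cfs = [0.8, 0.6]                  # two rules concluding the same category
print(round(prob_sum(cfs), 2))    # 0.92
print([round(c, 3) for c in adjust(cfs, target=0.5)])  # both CFs shrink
```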
Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, May, 1996.
This research describes the system RAPTURE, which is designed to
revise rule bases expressed in certainty-factor format. Recent
studies have shown that learning is facilitated when biased with
domain-specific expertise, and have also shown that many real-world
domains require some form of probabilistic or uncertain reasoning in
order to successfully represent target concepts. RAPTURE was designed
to take advantage of both of these results.
Proceedings of the Third International Workshop on
Multi-Strategy Learning, pp. 271-279, Harpers Ferry, WV, May
1996. (MSL-96).
Most approaches to learning control information in planning systems
use explanation-based learning to generate control rules.
Unfortunately, EBL alone often produces overly complex rules that
actually decrease planning efficiency. This paper presents a novel
learning approach for control knowledge acquisition that integrates
explanation-based learning with techniques from inductive logic
programming. EBL is used to constrain an inductive search for
selection heuristics that help a planner choose between competing plan
refinements. SCOPE is one of the few systems to address learning
control information in the newer partial-order planners.
Specifically, SCOPE learns domain-specific control rules for a version
of the UCPOP planning algorithm. The resulting system is shown to
produce significant speedup in two different planning domains.
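Schematically, the learned rules act as guards at the planner's choice points.
The sketch below shows how such selection heuristics might be applied; the rule,
state representation, and domain details are hypothetical placeholders rather
than SCOPE's induced clauses.

```python
def choose_refinement(candidates, state, control_rules):
    """Apply learned selection heuristics at a choice point: keep the
    refinements some learned rule endorses, falling back to the planner's
    default ordering when no rule fires."""
    preferred = [c for c in candidates
                 if any(rule(c, state) for rule in control_rules)]
    return preferred if preferred else candidates

# The kind of domain-specific preference such a system might induce for a
# logistics domain (entirely illustrative):
def prefer_loaded_truck(refinement, state):
    return (refinement.get("action") == "unload-truck" and
            refinement.get("truck") in state.get("trucks_at_goal", set()))

cands = [{"action": "fly-airplane", "plane": "p1"},
         {"action": "unload-truck", "truck": "t2"}]
state = {"trucks_at_goal": {"t2"}}
print(choose_refinement(cands, state, [prefer_loaded_truck]))
# -> [{'action': 'unload-truck', 'truck': 't2'}]
```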
Proceedings of the 1996 Conference on Empirical Methods in
Natural Language Processing, pp. 82-91, Philadelphia, PA, May
1996.
This paper describes an experimental comparison of seven different learning
algorithms on the problem of learning to disambiguate the meaning of a word
from context. The algorithms tested include statistical, neural-network,
decision-tree, rule-based, and case-based classification techniques. The
specific problem tested involves disambiguating six senses of the word ``line''
using the words in the current and preceding sentence as context. The
statistical and neural-network methods perform the best on this particular
problem and we discuss a potential reason for this observed difference. We
also discuss the role of bias in machine learning and its importance in
explaining performance differences observed on specific problems.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 1050-1055, Portland, OR, August, 1996. (AAAI-96)
This paper presents recent work using the CHILL parser acquisition
system to automate the construction of a natural-language interface
for database queries. CHILL treats parser acquisition as the learning
of search-control rules within a logic program representing a
shift-reduce parser and uses techniques from Inductive Logic
Programming to learn relational control knowledge. Starting with a
general framework for constructing a suitable logical form, CHILL is
able to train on a corpus comprising sentences paired with database
queries and induce parsers that map subsequent sentences directly into
executable queries. Experimental results with a complete
database-query application for U.S. geography show that CHILL is able
to learn parsers that outperform a pre-existing, hand-crafted
counterpart. These results demonstrate the ability of a corpus-based
system to produce more than purely syntactic representations. They
also provide direct evidence of the utility of an empirical approach
at the level of a complete natural language application.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 403-408, Portland, OR, August, 1996. (AAAI-96)
Theory refinement systems developed in machine learning automatically
modify a knowledge base to render it consistent with a set of
classified training examples. We illustrate a novel application of
these techniques to the problem of constructing a student model for an
intelligent tutoring system (ITS). Our approach is implemented in an
ITS authoring system called Assert which uses theory refinement to
introduce errors into an initially correct knowledge base so that it
models incorrect student behavior. The efficacy of the approach has
been demonstrated by evaluating a tutor developed with Assert with 75
students tested on a classification task covering concepts from an
introductory course on the C++ programming language. The system
produced reasonably accurate models and students who received feedback
based on these models performed significantly better on a post test
than students who received simple reteaching.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 965-970, Portland, OR, August, 1996. (AAAI-96)
Most model-based diagnosis systems, such as GDE and Sherlock, have
concerned discrete, static systems such as logic circuits and use
simple constraint propagation to detect inconsistencies. However,
sophisticated systems such as QSIM and QPE have been developed for
qualitative modeling and simulation of continuous dynamic systems. We
present an integration of these two lines of research as implemented
in a system called QDOCS for multiple-fault diagnosis of continuous
dynamic systems using QSIM models. The main contributions of the
algorithm include a method for propagating dependencies while solving
a general constraint satisfaction problem and a method for verifying
the consistency of a behavior with a model across time. Through
systematic experiments on two realistic engineering systems, we
show that QDOCS achieves the best balance of generality,
accuracy, and efficiency among competing methods.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 843-848, Portland, OR, August, 1996. (AAAI-96)
Most research in planning and learning has involved linear,
state-based planners. This paper presents SCOPE, a system for learning
search-control rules that improve the performance of a partial-order
planner. SCOPE integrates explanation-based and inductive learning
techniques to acquire control rules for a partial-order planner.
Learned rules are in the form of selection heuristics that help the
planner choose between competing plan refinements. Specifically,
SCOPE learns domain-specific control rules for a version of the UCPOP
planning algorithm. The resulting system is shown to produce
significant speedup in two different planning domains.
Future research will be performed in three main areas. First, SCOPE's
learning algorithm will be extended to include additional techniques
such as constructive induction and rule utility analysis. Second,
SCOPE will be more thoroughly tested; several real-world planning
domains have been identified as possible testbeds, and more in-depth
comparisons will be drawn between SCOPE and other competing
approaches. Third, SCOPE will be implemented in a different planning
system in order to test its portability to other planning algorithms.
This work should demonstrate that machine-learning techniques can be a
powerful tool in the quest for tractable real-world planning.
Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1996. (Technical Report AI96-250)
Planning systems have become an important tool for automating a wide
variety of tasks. Control knowledge guides a planner to find solutions
quickly and is crucial for efficient planning in most domains.
Machine learning techniques enable a planning system to automatically
acquire domain-specific search-control knowledge for different
applications. Past approaches to learning control information have
usually employed explanation-based learning (EBL) to generate control
rules. Unfortunately, EBL alone often produces overly complex rules
that actually decrease rather than improve overall planning
efficiency. This paper presents a novel learning approach for control
knowledge acquisition that integrates explanation-based learning with
techniques from inductive logic programming. In our learning system
SCOPE, EBL is used to constrain an inductive search for control
heuristics that help a planner choose between competing plan
refinements. SCOPE is one of the few systems to address learning
control information for newer, partial-order planners. Specifically,
this proposal describes how SCOPE learns domain-specific control rules
for the UCPOP planning algorithm. The resulting system is shown to
produce significant speedup in two different planning domains, and to
be more effective than a pure EBL approach.
Technical Report, Artificial Intelligence Lab,
University of Texas at Austin, 1996.
This paper defines a new machine learning problem to which standard
machine learning algorithms cannot easily be applied. The problem
occurs in the domain of lexical acquisition. The ambiguity and
synonymy of words make it difficult to learn a lexicon using standard
induction techniques. Additionally, negative
examples are typically unavailable or difficult to construct in this
domain. One approach to solving the lexical acquisition problem is
presented, along with preliminary experimental results on an
artificial corpus. Future work includes extending the algorithm and
performing tests on a more realistic corpus.
Technical Report, Artificial Intelligence Lab,
University of Texas at Austin, 1996.
This paper demonstrates the capabilities of FOIDL, an inductive logic
programming (ILP) system whose distinguishing characteristics are the
ability to produce first-order decision lists, the use of an output
completeness assumption to provide implicit negative examples, and the
use of intensional background knowledge. The development of FOIDL was
originally motivated by the problem of learning to generate the past
tense of English verbs; however, this paper demonstrates its superior
performance on two different sets of benchmark ILP problems. Tests on
the finite element mesh design problem show that FOIDL's decision
lists enable it to produce better results than all other ILP systems
whose results on this problem have been reported. Tests with a selection
of list-processing problems from Bratko's introductory Prolog text
demonstrate that the combination of implicit negatives and intensionality
allows FOIDL to learn correct programs from far fewer examples than FOIL.
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1995.
Building accurate and efficient natural language processing (NLP)
systems is an important and difficult problem. There has been
increasing interest in automating this process. The lexicon, or the
mapping from words to meanings, is one component that is typically
difficult to update and that changes from one domain to the next.
Therefore, automating the acquisition of the lexicon is an important
task in automating the acquisition of NLP systems. This proposal
describes a system, WOLFIE (WOrd Learning From Interpreted Examples),
that learns a lexicon from input consisting of sentences paired with
representations of their meanings. Preliminary experimental results
show that this system can learn correct and useful mappings. The
correctness is evaluated by comparing a known lexicon to one learned
from the training input. The usefulness is evaluated by examining the
effect of using the lexicon learned by WOLFIE to assist a parser
acquisition system, where previously this lexicon had to be
hand-built. Future work in the form of extensions to the algorithm,
further evaluation, and possible applications is discussed.
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1995.
Bayesian networks provide a mathematically sound formalism for
representing and reasoning with uncertain knowledge and, as such, are
widely used. However, acquiring and capturing knowledge in this
framework is difficult. There is a growing interest in formulating
techniques for learning Bayesian networks inductively. While the
problem of learning a Bayesian network, given complete data, has been
explored in some depth, the problem of learning networks with
unobserved causes is still open. In this proposal, we view this
problem from the perspective of theory revision and present a novel
approach which adapts techniques developed for revising theories in
symbolic and connectionist representations. Thus, we assume that the
learner is given an initial approximate network (usually obtained from
an expert). Our technique inductively revises the network to fit the
data better. Our proposed system has two components: one component
revises the parameters of a Bayesian network of known structure, and
the other component revises the structure of the network. The
component for parameter revision maps the given Bayesian network into
a multi-layer feedforward neural network, with the parameters mapped
to weights in the neural network, and uses standard backpropagation
techniques to learn the weights. The structure revision component uses
qualitative analysis to suggest revisions to the network when it fails
to predict the data accurately. The first component has been
implemented, and we will present results from experiments on real-world
classification problems which show our technique to be effective. We
will also discuss our proposed structure revision algorithm, our plans
for experiments to evaluate the system, as well as some extensions to
the system.
Journal of Artificial Intelligence in Education, 7, 1
(1996), pp. 75-116.
A critical component of model-based intelligent tutoring systems is a
mechanism for capturing the conceptual state of the student, which
enables the system to tailor its feedback to suit individual strengths
and weaknesses. To be useful such a modeling technique must be
practical, in the sense that models are easy to construct, and
effective, in the sense that using the model actually impacts student
learning. This research presents a new student modeling technique
which can automatically capture novel student errors using only
correct domain knowledge, and can automatically compile trends across
multiple student models. This approach has been implemented as a
computer program, ASSERT, using a machine learning technique called
theory refinement, which is a method for automatically revising a
knowledge base to be consistent with a set of examples. Using a
knowledge base that correctly defines a domain and examples of a
student's behavior in that domain, ASSERT models student errors by
collecting any refinements to the correct knowledge base which are
necessary to account for the student's behavior. The efficacy of the
approach has been demonstrated by evaluating ASSERT using 100 students
tested on a classification task covering concepts from an introductory
course on the C++ programming language. Students who received
feedback based on the models automatically generated by ASSERT
performed significantly better on a post test than students who
received simple reteaching.
Symbolic, Connectionist, and Statistical Approaches to Learning for Natural
Language Processing, S. Wermter, E. Riloff and G. Scheler, Eds,
Springer-Verlag, 1996.
This paper presents results from recent experiments with CHILL, a
corpus-based parser acquisition system. CHILL treats language
acquisition as the learning of search-control rules within a logic
program. Unlike many current corpus-based approaches that use
statistical learning algorithms, CHILL uses techniques from inductive
logic programming (ILP) to learn relational representations. CHILL is
a very flexible system and has been used to learn parsers that produce
syntactic parse trees, case-role analyses, and executable database
queries. The reported experiments compare CHILL's performance to that
of a more naive application of ILP to parser acquisition. The results
show that ILP techniques, as employed in CHILL, are a viable
alternative to statistical methods and that the control-rule framework
is fundamental to CHILL's success.
Symbolic, Connectionist, and Statistical Approaches to Learning for Natural
Language Processing, S. Wermter, E. Riloff and G. Scheler, Eds,
Springer-Verlag, 1996.
This paper presents results on using a new inductive logic programming
method called FOIDL to learn the past tense of English verbs. The past
tense task has been widely studied in the context of the
symbolic/connectionist debate. Previous papers have presented results
using various neural-network and decision-tree learning methods. We
have developed a technique for learning a special type of Prolog
program called a first-order decision list, defined as an
ordered list of clauses each ending in a cut. FOIDL is based on FOIL
(Quinlan, 1990) but employs intensional background knowledge and
avoids the need for explicit negative examples. It is particularly
useful for problems that involve rules with specific exceptions, such
as the past-tense task. We present results showing that FOIDL learns
a more accurate past-tense generator from significantly fewer examples
than all other previous methods.
New Directions in AI Planning, M. Ghallab and A. Milani, Eds,
IOS Press, 1996, pp. 129-140.
This paper presents results on applying a version of the DOLPHIN
search-control learning system to speed up a partial-order planner.
DOLPHIN integrates explanation-based and inductive learning techniques
to acquire effective clause-selection rules for Prolog programs. A
version of the UCPOP partial-order planning algorithm has been
implemented as a Prolog program and DOLPHIN used to automatically
learn domain-specific search control rules that help eliminate
backtracking. The resulting system is shown to produce significant
speedup in several planning domains.
Proceedings of the International Conference on Neural
Networks (ICNN-96), Special Session on Knowledge-Based Artificial
Neural Networks, Washington DC, June 1996.
The problem of learning Bayesian networks with hidden variables is known to
be a hard problem. Even the simpler task of learning just the conditional
probabilities on a Bayesian network with hidden variables is hard. In this
paper, we present an approach that learns the conditional probabilities on
a Bayesian network with hidden variables by transforming it into a
multi-layer feedforward neural network (ANN). The conditional probabilities
are mapped onto weights in the ANN, which are then learned using standard
backpropagation techniques. To avoid the problem of exponentially large
ANNs, we focus on Bayesian networks with noisy-or and noisy-and
nodes. Experiments on real world classification problems demonstrate the
effectiveness of our technique.
Model-based diagnosis is a class of diagnostic techniques that use
direct knowledge about how a system functions instead of expert rules
detailing causes for every possible set of symptoms of a broken
system. Our research builds on standard methods for model-based
diagnosis and extends them to the domain of complex dynamic systems
described using qualitative models.
We motivate and describe our algorithm for diagnosing faults in a
dynamic system given a qualitative model and a sequence of qualitative
states. The main contributions in this algorithm include a method for
propagating dependencies while solving a general constraint
satisfaction problem, and a method for verifying the compatibility of a
behavior with a model across time. The algorithm can diagnose
multiple faults and uses models of faulty behavior, or behavioral
modes.
We then demonstrate these techniques using an implemented program
called QDOCS and test it on some realistic problems. Through our
experiments with a model of the reaction control system (RCS) of the
space shuttle and with a level-controller for a reaction tank, we show
that QDOCS demonstrates the best balance of generality, accuracy and
efficiency among known systems.
Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August, 1995.
As systems like chemical plants, power plants, and automobiles get
more complex, online diagnostic systems are becoming increasingly
important. One of the ways to rein in the complexity of describing
and reasoning about large systems such as these is to describe them
using qualitative rather than quantitative models.
This dissertation details the architecture, implementation and
evaluation of CHILL, a computer system for acquiring natural
language parsers by training over corpora of parsed text. CHILL
treats language acquisition as the learning of search-control rules
within a logic program that implements a shift-reduce parser. Control
rules are induced using a novel ILP algorithm which handles difficult
issues arising in the induction of search-control heuristics. Both
the control-rule framework and the induction algorithm are crucial to
CHILL's success.
The main advantage of CHILL over propositional counterparts is
its flexibility in handling varied representations. CHILL has
produced parsers for various analyses including case-role mapping,
detailed syntactic parse trees, and a logical form suitable for
expressing first-order database queries. All of these tasks are
accomplished within the same framework, using a single, general
learning method that can acquire new syntactic and semantic categories
for resolving ambiguities.
Experimental evidence from both artificial and real-world corpora
demonstrates that CHILL learns parsers as well as or better than
previous artificial neural-network or probabilistic approaches on
comparable tasks. In the database query domain, which goes beyond the
scope of previous empirical approaches, the learned parser outperforms
an existing hand-crafted system. These results support the claim that
ILP techniques as implemented in CHILL represent a viable
alternative with significant potential advantages over neural-network,
propositional, and probabilistic approaches to empirical parser
construction.
Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August, 1995. (Technical Report AI96-249)
Designing computer systems to understand natural language input is a
difficult task. In recent years there has been considerable interest
in corpus-based methods for constructing natural language parsers.
These empirical approaches replace hand-crafted grammars with
linguistic models acquired through automated training over language
corpora. A common thread among such methods to date is the use of
propositional or probabilistic representations for the learned
knowledge. This dissertation presents an alternative approach based
on techniques from a subfield of machine learning known as inductive
logic programming (ILP). ILP, which investigates the learning of
relational (first-order) rules, provides an empirical method for
acquiring knowledge within traditional, symbolic parsing frameworks.
Working Notes of the IJCAI-95 Workshop on New Approaches to
Learning for Natural Language Processing, pp.79-86, Montreal,
Quebec, August, 1995.
This paper presents results from recent experiments with CHILL,
a corpus-based parser acquisition system. CHILL treats grammar
acquisition as the learning of search-control rules within a logic
program. Unlike many current corpus-based approaches that use
propositional or probabilistic learning algorithms, CHILL uses
techniques from inductive logic programming (ILP) to learn relational
representations. The reported experiments compare CHILL's
performance to that of a more naive application of ILP to parser
acquisition. The results show that ILP techniques, as employed in
CHILL, are a viable alternative to propositional methods and
that the control-rule framework is fundamental to CHILL's
success.
Proceedings of the Fifth International Workshop on Inductive Logic
Programming, Leuven, Belgium, September 1995.
This paper presents a method for learning logic programs
without explicit negative examples by exploiting an assumption of
output completeness. A mode declaration is supplied for the
target predicate and each training input is assumed to be accompanied
by all of its legal outputs. Any other outputs generated by an
incomplete program implicitly represent negative examples; however,
large numbers of ground negative examples never need to be generated.
This method has been incorporated into two ILP systems, CHILLIN and
IFOIL, both of which use intensional background knowledge. Tests on
two natural language acquisition tasks, case-role mapping and
past-tense learning, illustrate the advantages of the approach.
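A small sketch of the implicit-negative idea (the toy program and data are
invented for illustration):

```python
def implicit_negatives(program, inputs, legal_outputs):
    """Under output completeness, every legal output for each training
    input is given, so any extra output the candidate program derives is
    an implicit negative example. 'program' is any callable returning the
    set of outputs it proves for an input."""
    return [(x, y) for x in inputs
                   for y in program(x)
                   if y not in legal_outputs[x]]

# Toy past-tense setting: an over-general program appends "ed" to everything.
overgeneral = lambda verb: {verb + "ed"}
legal = {"walk": {"walked"}, "sing": {"sang"}}
print(implicit_negatives(overgeneral, ["walk", "sing"], legal))
# -> [('sing', 'singed')]: a negative that was never supplied explicitly
```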
33rd Annual Meeting of the Association for Computational Linguistics, pp. 335-337, Boston, MA, July 1995 (ACL-95).
A system, WOLFIE, that acquires a mapping of words to their semantic
representation is presented and a preliminary evaluation is performed.
Tree least general generalizations (TLGGs) of the representations of input
sentences are computed to assist in determining the representations
of individual words in the sentences. The best guess for the meaning
of a word is the TLGG which overlaps with the highest percentage of
sentence representations in which that word appears.
Some promising experimental results on a non-artificial data
set are presented.
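A minimal sketch of a tree least general generalization via anti-unification,
assuming meaning representations encoded as nested tuples; WOLFIE's actual
representations and use of TLGGs are richer than this:

```python
def tlgg(t1, t2, table=None):
    """Least general generalization of two trees by anti-unification:
    matching functors are kept and recursed into, mismatches become
    variables, and a repeated mismatch pair reuses the same variable."""
    if table is None:
        table = {}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(tlgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
    if t1 == t2:
        return t1
    if (t1, t2) not in table:
        table[(t1, t2)] = "X%d" % len(table)
    return table[(t1, t2)]

# Two sentence meanings that share predicate structure:
m1 = ("capital", ("state", "texas"), ("city", "austin"))
m2 = ("capital", ("state", "ohio"), ("city", "columbus"))
print(tlgg(m1, m2))  # ('capital', ('state', 'X0'), ('city', 'X1'))
```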
Journal of Artificial Intelligence Research, 3 (1995) pp. 1-24.
This paper presents a method for inducing logic programs from examples that
learns a new class of concepts called first-order decision lists, defined
as ordered lists of clauses each ending in a cut. The method, called FOIDL,
is based on FOIL but employs intensional background knowledge and avoids
the need for explicit negative examples. It is particularly useful for
problems that involve rules with specific exceptions, such as learning the
past-tense of English verbs, a task widely studied in the context of the
symbolic/connectionist debate. FOIDL is able to learn concise, accurate
programs for this problem from significantly fewer examples than previous
methods (both connectionist and symbolic).
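The essential behavior is that rules are tried in order and the first match wins,
letting exceptions precede a general default. Below is a propositional shadow of
such a decision list for the past-tense task; the rules shown are illustrative,
since FOIDL's learned lists are Prolog clauses ending in cuts.

```python
# Rules are tried in order and the first match wins (the "cut"), so
# specific exceptions can safely precede the general default.
RULES = [
    (lambda v: v == "sing", lambda v: "sang"),                    # exception
    (lambda v: v.endswith("e"), lambda v: v + "d"),               # bake -> baked
    (lambda v: v.endswith("y") and v[-2] not in "aeiou",
     lambda v: v[:-1] + "ied"),                                   # try -> tried
    (lambda v: True, lambda v: v + "ed"),                         # default
]

def past_tense(verb):
    for test, transform in RULES:
        if test(verb):
            return transform(verb)

for v in ["sing", "bake", "try", "walk"]:
    print(v, "->", past_tense(v))   # sang, baked, tried, walked
```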
Working Notes of the IJCAI-95 Workshop on Engineering Problems for Qualitative Reasoning, Montreal, Quebec, August 1995.
This paper describes an approach to diagnosis of systems described by
qualitative differential equations represented as QSIM models. An
implemented system, QDOCS, is described that performs multiple-fault,
fault-model-based diagnosis of the qualitative behaviors of systems
described by such models, using constraint satisfaction techniques. We
demonstrate the utility of this system by accurately diagnosing
randomly generated faults using simulated behaviors of a portion of
the Reaction Control System of the space shuttle.
Proceedings of the Eighth International Workshop on Qualitative
Reasoning about Physical Systems, pp. 212-223, Nara, Japan,
June 1994. (QR-94)
The problem of learning qualitative models of physical systems from
observations of their behaviour has been addressed by several
researchers in recent years. Most current techniques limit themselves
to learning a single qualitative differential equation to model the
entire system. However, many systems have several qualitative
differential equations underlying them. In this paper, we present an
approach to learning the models for such systems. Our technique
divides the behaviours into segments, each of which can be explained
by a single qualitative differential equation. The qualitative model
for each segment can be generated using any of the existing techniques
for learning a single model. We show the results of applying our
technique to several examples and demonstrate that it is effective.
Working Papers of the Fifth International Workshop on
Principles of Diagnosis, pp. 321-325, New Paltz, NY, 1994.
This paper describes an approach to diagnosis of systems described by
qualitative differential equations represented as QSIM models. An
implemented system, QDOCS, is described that performs multiple-fault,
fault-model-based diagnosis of the qualitative behaviors of systems
described by such models, using constraint satisfaction techniques. We
demonstrate the utility of this system by accurately diagnosing
randomly generated faults using simulated behaviors of a portion of
the Reaction Control System of the space shuttle.
Proceedings of the Twelfth National Conference on
AI, pp. 664-669, Seattle, WA, July 1994. (AAAI-94)
A new inductive learning system, LAB (Learning for
ABduction), is presented which acquires abductive rules from a set of
training examples. The goal is to find a small knowledge base which,
when used abductively, diagnoses the training examples correctly and
generalizes well to unseen examples. This contrasts with past systems
that inductively learn rules that are used deductively. Each training
example is associated with potentially multiple categories
(disorders), instead of one as with typical learning systems. LAB
uses a simple hill-climbing algorithm to efficiently build a rule base
for a set-covering abductive system. LAB has been experimentally
evaluated and compared to other learning systems and an expert
knowledge base in the domain of diagnosing brain damage due to stroke.
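A rough sketch of set-covering abduction together with a greedy hill-climbing
learner in the spirit of LAB; the rule format and scoring below are simplified
assumptions, not LAB's actual evaluation function.

```python
from itertools import combinations

def diagnose(kb, symptoms):
    """Set-covering abduction: the smallest set of disorders whose
    manifestations jointly explain all observed symptoms (brute force,
    fine for a toy knowledge base)."""
    disorders = list(kb)
    for size in range(1, len(disorders) + 1):
        for combo in combinations(disorders, size):
            if symptoms <= set().union(*(kb[d] for d in combo)):
                return set(combo)
    return set()

def greedy_learn(examples, candidates):
    """Hill climbing: keep adding whichever candidate rule
    (disorder, symptom) most increases training-set diagnostic accuracy.
    examples: list of (symptom_set, disorder_set) pairs."""
    kb = {}
    accuracy = lambda k: sum(diagnose(k, s) == d for s, d in examples)
    while True:
        best, base = None, accuracy(kb)
        for disorder, symptom in candidates:
            trial = {d: set(ms) for d, ms in kb.items()}
            trial.setdefault(disorder, set()).add(symptom)
            if accuracy(trial) > base:
                best, base = (disorder, symptom), accuracy(trial)
        if best is None:
            return kb
        kb.setdefault(best[0], set()).add(best[1])

examples = [({"fever"}, {"flu"}), ({"fever", "rash"}, {"flu", "measles"})]
cands = [("flu", "fever"), ("measles", "rash"), ("flu", "rash")]
print(greedy_learn(examples, cands))  # {'flu': {'fever'}, 'measles': {'rash'}}
```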
J. Jeffrey Mahoney and Raymond J. Mooney
Proceedings of the Eleventh International Workshop
on Machine Learning, pp. 173-180, Rutgers, NJ, July 1994. (ML-94)
This paper compares two methods for refining uncertain knowledge bases using
propositional certainty-factor rules. The first method, implemented in the
RAPTURE system, employs neural-network training to refine the certainties
of existing rules but uses a symbolic technique to add new rules. The second
method, based on the one used in the KBANN system, initially adds a
complete set of potential new rules with very low certainty and allows
neural-network training to filter and adjust these rules. Experimental results
indicate that the former method results in significantly faster training and
produces much simpler refined rule bases with slightly greater accuracy.
J. Jeffrey Mahoney and Raymond J. Mooney
Proceedings of the International Symposium on Integrating
Knowledge and Neural Heuristics 1994, pp. 75-84, Pensacola, FL,
May 1994. (ISIKNH-94)
This paper describes RAPTURE --- a system for revising
probabilistic rule bases that converts symbolic rules into a
connectionist network, which is then trained via connectionist
techniques. It uses a modified version of backpropagation to refine
the certainty factors of the rule base, and uses ID3's
information-gain heuristic (Quinlan) to add new rules. Work is
currently under way on improved techniques for modifying the
network architecture, including adding hidden units using the
UPSTART algorithm (Frean). A case is made via comparison with fully
connected connectionist techniques for keeping the rule base as close
to the original as possible, adding new input units only as needed.
John M. Zelle, Raymond J. Mooney and Joshua B. Konvisser
Proceedings of the Eleventh International Workshop
on Machine Learning, pp. 343-351, Rutgers, NJ, July 1994. (ML-94)
This paper describes a new method for inducing logic programs from
examples which attempts to integrate the best aspects of existing ILP
methods into a single coherent framework. In particular, it combines
a bottom-up method similar to GOLEM with a top-down method similar to
FOIL. It also includes a method for predicate invention similar to
CHAMP and an elegant solution to the ``noisy oracle'' problem which
allows the system to learn recursive programs without requiring a
complete set of positive examples. Systematic experimental
comparisons to both GOLEM and FOIL on a range of problems are used to
clearly demonstrate the advantages of the approach.
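The bottom-up component of such systems rests on Plotkin's least general
generalization (LGG). A minimal Python sketch for function-free atoms,
given here purely as illustration of the operation, not of the system's
implementation:

    # Minimal sketch of least general generalization (LGG) for function-free
    # atoms, the core operation of bottom-up ILP methods such as GOLEM.

    def lgg(atom1, atom2, table=None):
        """Generalize two atoms with the same predicate; differing constants
        are replaced by variables, one variable per pair of constants."""
        pred1, args1 = atom1
        pred2, args2 = atom2
        if pred1 != pred2 or len(args1) != len(args2):
            return None
        table = table if table is not None else {}
        out = []
        for a, b in zip(args1, args2):
            if a == b:
                out.append(a)
            else:
                out.append(table.setdefault((a, b), f"X{len(table)}"))
        return (pred1, tuple(out))

    print(lgg(("parent", ("ann", "bob")), ("parent", ("sue", "tom"))))
    # -> ('parent', ('X0', 'X1'))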
John M. Zelle and Raymond J. Mooney
Proceedings of the Twelfth National Conference on
AI, pp. 748-753, Seattle, WA, July 1994. (AAAI-94)
This paper presents a method for constructing deterministic, context-sensitive
Prolog parsers from corpora of parsed sentences. Our approach uses recent
machine learning methods for inducing Prolog rules from examples (inductive
logic programming). We discuss several advantages of this method compared to
recent statistical methods and present results on learning complete parsers
from portions of the ATIS corpus.
Raymond J. Mooney and John M. Zelle
SIGART Bulletin, Volume 5, Number 1, Jan. 1994, pp. 12-21.
This paper presents a review of recent work that integrates methods from
Inductive Logic Programming (ILP) and Explanation-Based Learning (EBL). ILP
and EBL methods have complementary strengths and weaknesses and a number of
recent projects have effectively combined them into systems with better
performance than either of the individual approaches. In particular, integrated
systems have been developed for guiding induction with prior knowledge
(ML-SMART, FOCL, GRENDEL), refining imperfect domain theories
(FORTE, AUDREY, Rx), and learning effective search-control
knowledge (AxA-EBL, DOLPHIN).
Paul T. Baffes and Raymond J. Mooney
Informatica, 17 (1993), pp. 387-397.
In recent years, machine learning research has started addressing a problem
known as theory refinement. The goal of a theory refinement learner is
to modify an incomplete or incorrect rule base, representing a domain theory,
to make it consistent with a set of input training examples. This paper
presents a major revision of the EITHER propositional theory refinement
system. Two issues are discussed. First, we show how run-time efficiency can
be greatly improved by changing from an exhaustive scheme for computing
repairs to an iterative greedy method. Second, we show how to extend
EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New
EITHER), is more than an order of magnitude faster and produces
significantly more accurate results with theories that fit the M-of-N
format. To demonstrate the advantages of NEITHER, we present experimental
results from two real-world domains.
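For readers unfamiliar with the representation, an M-of-N rule fires when at
least M of its N antecedents hold of an example, as in this minimal sketch
(the example data are invented for illustration):

    # Sketch of M-of-N rule evaluation: the rule fires when at least m of
    # its n listed antecedents are true of the example.

    def m_of_n_fires(m, antecedents, example):
        return sum(1 for a in antecedents if example.get(a, False)) >= m

    example = {"fever": True, "cough": True, "rash": False}
    print(m_of_n_fires(2, ["fever", "cough", "rash"], example))  # True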
Cynthia Thompson
M.A. Thesis, Department of Computer Sciences, University of Texas at Austin, 1993.
A new system for learning by induction, called LAB, is presented. LAB
(Learning for ABduction) learns abductive rules based on a set of training
examples. Our goal is to find a small knowledge base which, when used
abductively, diagnoses the training examples correctly, in addition to
generalizing well to unseen examples. This is in contrast to past systems,
which inductively learn rules which are used deductively. Abduction is
particularly well suited to diagnosis, in which we are given a set of symptoms
(manifestations) and we want our output to be a set of disorders which explain
why the manifestations are present. Each training example is associated with
potentially multiple categories, instead of one, which is the case with typical
learning systems. Building the knowledge base requires a choice between
multiple possibilities, and the number of possibilities grows exponentially
with the number of training examples. One method of choosing the best
knowledge base is described and implemented. The final system is
experimentally evaluated, using data from the domain of diagnosing brain damage
due to stroke. It is compared to other learning systems and a knowledge base
produced by an expert. The results are promising: the rule base learned is
simpler than the expert knowledge base and rules learned by one of the other
systems, and the accuracy of the learned rule base in predicting which areas
are damaged is better than all the other systems as well as the expert
knowledge base.
Paul T. Baffes
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1993.
A new student modeling system called ASSERT is described which uses domain
independent learning algorithms to model unique student errors and to
automatically construct bug libraries. ASSERT consists of two learning phases.
The first is an application of theory refinement techniques for constructing
student models from a correct theory of the domain being tutored. The second
learning cycle automatically constructs the bug library by extracting common
refinements from multiple student models which are then used to bias future
modeling efforts. Initial experimental data will be presented which suggests
that ASSERT is a more effective modeling system than other induction techniques
previously explored for student modeling, and that the automatic bug library
construction significantly enhances subsequent modeling efforts.
John M. Zelle
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1993.
This paper presents a general framework, learning search-control heuristics for
logic programs, which can be used to improve both the efficiency and accuracy
of knowledge-based systems expressed as definite-clause logic programs. The
approach combines techniques of explanation-based learning and recent advances
in inductive logic programming to learn clause-selection heuristics that guide
program execution. Two specific applications of this framework are detailed:
dynamic optimization of Prolog programs (improving efficiency) and natural
language acquisition (improving accuracy). In the area of program
optimization, a prototype system, DOLPHIN, is able to transform some
intractable specifications into polynomial-time algorithms, and outperforms
competing approaches in several benchmark speedup domains. A prototype
language acquisition system, CHILL, is also described. It is capable of
automatically acquiring semantic grammars, which uniformly incorporate syntactic
and semantic constraints to parse sentences into case-role representations.
Initial experiments show that this approach is able to construct accurate
parsers which generalize well to novel sentences and significantly outperform
previous approaches to learning case-role mapping based on connectionist
techniques. Planned extensions of the general framework and the specific
applications as well as plans for further evaluation are also discussed.
John M. Zelle and Raymond J. Mooney
Proceedings of the Thirteenth International Joint Conference on Artificial
Intelligence, pp. 1106-1111, Chambery, France, 1993. (IJCAI-93)
This paper presents an algorithm that combines traditional EBL
techniques and recent developments in inductive logic programming to
learn effective clause selection rules for Prolog programs. When
these control rules are incorporated into the original program,
significant speed-up may be achieved. The algorithm is shown to be an
improvement over competing EBL approaches in several domains.
Additionally, the algorithm is capable of automatically transforming
some intractable algorithms into ones that run in polynomial time.
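The effect of such control rules can be pictured as guards that filter which
clauses an interpreter tries for a given goal. The following Python sketch is
purely illustrative, with hypothetical goal and clause representations;
DOLPHIN itself transforms Prolog programs directly.

    # Sketch of guarded clause selection: learned control rules filter which
    # clauses a meta-interpreter tries for a goal (hypothetical encoding).

    def select_clauses(goal, clauses, control_rules):
        """Return only the clauses whose learned guard admits this goal."""
        return [c for c in clauses
                if control_rules.get(c["id"], lambda g: True)(goal)]

    clauses = [{"id": "rev_base"}, {"id": "rev_rec"}]
    control_rules = {
        # Learned guard: only try the base case on the empty list.
        "rev_base": lambda goal: goal["arg"] == [],
    }
    print([c["id"] for c in
           select_clauses({"arg": [1, 2]}, clauses, control_rules)])
    # -> ['rev_rec']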
Paul T. Baffes and Raymond J. Mooney
Proceedings of the Thirteenth International Joint Conference on Artificial
Intelligence, pp. 1135-1140, Chambery, France, 1993. (IJCAI-93)
This paper presents a major revision of the EITHER propositional theory
refinement system. Two issues are discussed. First, we show how run-time
efficiency can be greatly improved by changing from an exhaustive scheme for
computing repairs to an iterative greedy method. Second, we show how to extend
EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New EITHER),
is more than an order of magnitude faster and produces significantly more
accurate results with theories that fit the M-of-N format. To demonstrate the
advantages of NEITHER, we present preliminary experimental results comparing it
to EITHER and various other systems on refining the DNA promoter domain theory.
John M. Zelle and Raymond J. Mooney
Proceedings of the Eleventh National Conference of the American
Association for Artificial Intelligence, pp. 817-822,
Washington, D.C., July 1993. (AAAI-93)
Automating the construction of semantic grammars is a difficult and
interesting problem for machine learning. This paper shows how the
semantic-grammar acquisition problem can be viewed as the learning of
search-control heuristics in a logic program. Appropriate control
rules are learned using a new first-order induction algorithm that
automatically invents useful syntactic and semantic categories.
Empirical results show that the learned parsers generalize well to
novel sentences and outperform previous approaches based on
connectionist techniques.
J. Jeffrey Mahoney and Raymond J. Mooney
Connection Science, 5 (1993), pp. 339-364. (Special issue on
Architectures for Integrating Neural and Symbolic Processing)
This paper describes Rapture --- a system for revising probabilistic knowledge
bases that combines connectionist and symbolic learning methods. Rapture uses
a modified version of backpropagation to refine the certainty factors of a
Mycin-style rule base and it uses ID3's information gain heuristic to add
new rules. Results on refining three actual expert knowledge bases demonstrate
that this combined approach generally performs better than previous methods.
Bradley L. Richards and Raymond J. Mooney
Machine Learning, 19, 2 (1995), pp. 95-131.
Knowledge acquisition is a difficult and time-consuming
task, and as error-prone as any human activity. The task of
automatically improving an existing knowledge base using learning
methods is addressed by a new class of systems performing theory
refinement. Until recently, such systems were limited to
propositional theories. This paper presents a system, FORTE
(First-Order Revision of Theories from Examples), for refining
first-order Horn-clause theories. Moving to a first-order
representation opens many new problem areas, such as logic program
debugging and qualitative modelling, that are beyond the reach of
propositional systems. FORTE uses a hill-climbing approach to revise
theories. It identifies possible errors in the theory and calls on a
library of operators to develop possible revisions. The best revision
is implemented, and the process repeats until no further revisions are
possible. Operators are drawn from a variety of sources, including
propositional theory refinement, first-order induction, and inverse
resolution. FORTE has been tested in several domains including
logic programming and qualitative modelling.
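The top-level loop can be sketched as follows. Everything here is a toy
stand-in (a "theory" of single-antecedent rules and a single deletion
operator), since FORTE's real operator library also adds and modifies rules
using induction and inverse resolution.

    # Sketch of FORTE-style hill-climbing theory revision (toy stand-ins).

    def accuracy(theory, examples):
        def classify(features):
            return any(ant in features for ant in theory)
        return sum(classify(f) == label for f, label in examples)

    def delete_rule(theory):
        for rule in theory:
            yield theory - {rule}

    def revise(theory, examples):
        best, best_score = theory, accuracy(theory, examples)
        improved = True
        while improved:
            improved = False
            for candidate in delete_rule(best):
                s = accuracy(candidate, examples)
                if s > best_score:
                    best, best_score, improved = candidate, s, True
        return best

    examples = [({"a"}, True), ({"b"}, False), ({"c"}, False)]
    print(revise(frozenset({"a", "b"}), examples))  # drops faulty rule "b"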
Raymond J. Mooney
Machine Learning, 19, 1 (1995), pp. 79-92.
This paper presents results comparing three inductive learning systems using
different representations for concepts, namely: CNF formulae, DNF formulae, and
decision trees. The CNF learner performs surprisingly well. Results on five
natural data sets show that it frequently trains faster and produces more
accurate and simpler concepts. The probable explanation for its superior
performance is that the other systems are more susceptible to the replication
problem.
Siddarth Subramanian
Technical Report AI92-179, Artificial Intelligence Lab,
University of Texas at Austin, March 1991.
This proposal presents an approach to explanation that
incorporates the paradigms of belief revision and abduction. We
present an algorithm that combines these techniques and a system
called BRACE that is a preliminary implementation of this
algorithm. We show the applicability of the BRACE approach to a wide
range of domains including scientific discovery, device diagnosis and
plan recognition. Finally, we describe our proposals for a new
implementation, new application domains for our system and extensions
to this approach.
Hwee Tou Ng and Raymond J. Mooney
Submitted for journal publication.
A diverse set of intelligent activities, including natural language
understanding and diagnosis, requires the ability to construct
explanations for observed phenomena. In this paper, we view
explanation as abduction, where an abductive explanation is a
consistent set of assumptions which, together with background
knowledge, logically entails a set of observations. We have
successfully built a domain-independent system, ACCEL, in which
knowledge about a variety of domains is uniformly encoded in
first-order Horn-clause axioms. A general-purpose abduction
algorithm, AAA, efficiently constructs explanations in the
various domains by caching partial explanations to avoid redundant
work. Empirical results show that caching of partial explanations can
achieve more than an order of magnitude speedup in run time. We have
applied our abductive system to two general tasks: plan recognition in
text understanding, and diagnosis of medical diseases, logic circuits,
and dynamic systems. The results indicate that ACCEL is a
general-purpose system capable of plan recognition and diagnosis, yet
efficient enough to be of practical utility.
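Caching partial explanations amounts to memoizing the mapping from subgoals
to their explanations, so that shared subgoals are solved only once. A
minimal propositional sketch follows; it assumes acyclic rules and is only
illustrative, since AAA itself handles first-order axioms and consistency
checking.

    # Sketch of abduction with cached partial explanations for propositional
    # Horn rules (assumes acyclic rules; illustrative, not ACCEL's AAA).

    def explain(goal, rules, assumable, cache):
        """Explanations for goal, each a frozenset of assumptions."""
        if goal in cache:                    # reuse a cached partial result
            return cache[goal]
        results = set()
        if goal in assumable:
            results.add(frozenset([goal]))
        for head, body in rules:
            if head == goal:
                combos = [frozenset()]
                for sub in body:
                    combos = [c | e
                              for c in combos
                              for e in explain(sub, rules, assumable, cache)]
                results.update(combos)
        cache[goal] = sorted(results, key=len)   # simplest first
        return cache[goal]

    rules = [("wet_grass", ["rain"]), ("wet_grass", ["sprinkler"])]
    print(explain("wet_grass", rules, {"rain", "sprinkler"}, {}))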
Hwee Tou Ng and Raymond J. Mooney
Proceedings of the Third International Conference on Principles
of Knowledge Representation and Reasoning, pp. 499-508,
Cambridge, MA, October 1992.
While it has been realized for quite some time within AI that abduction
is a general model of explanation for a variety of tasks, there have
been no empirical investigations into the practical feasibility of a
general, logic-based abductive approach to explanation. In this paper
we present extensive empirical results on applying a general abductive
system, ACCEL, to moderately complex problems in plan recognition
and diagnosis. In plan recognition, ACCEL has been tested on 50
short narrative texts, inferring characters' plans from actions
described in a text. In medical diagnosis, ACCEL has diagnosed 50
real-world patient cases involving brain damage due to stroke
(previously addressed by set-covering methods). ACCEL also uses
abduction to accomplish model-based diagnosis of logic circuits (a full
adder) and continuous dynamic systems (a temperature controller and the
water balance system of the human kidney). The results indicate that
general purpose abduction is an effective and efficient mechanism for
solving problems in plan recognition and diagnosis.
Bradley L. Richards, Ina Kraan, and Benjamin J. Kuipers
Proceedings of the Tenth National Conference on Artificial
Intelligence, pp. 723-728, San Jose, CA, July 1992.
We describe a method of automatically abducing qualitative models from
descriptions of behaviors. We generate, from either quantitative or
qualitative data, models in the form of qualitative differential equations
suitable for use by QSIM. Constraints are generated and filtered both by
comparison with the input behaviors and by dimensional analysis. If the
user provides complete information on the input behaviors and the
dimensions of the input variables, the resulting model is unique,
maximally constrained, and guaranteed to reproduce the input behaviors.
If the user provides incomplete information, our method will still
generate a model which reproduces the input behaviors, but the model
may no longer be unique. Incompleteness can take several forms: missing
dimensions, values of variables, or entire variables.
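Dimensional filtering can be illustrated by treating dimensions as exponent
vectors: an addition constraint requires all three variables to share a
dimension, while a multiplication constraint requires the exponents of the
factors to sum to those of the product. The sketch below uses QSIM-like
constraint names purely for illustration; it is not the paper's
implementation.

    # Sketch of dimensional filtering of candidate QDE constraints.
    # Dimensions are exponent vectors over (length, time).

    def dims_ok(constraint, x, y, z):
        if constraint == "ADD":      # x + y = z: all dimensions equal
            return x == y == z
        if constraint == "MULT":     # x * y = z: exponents add
            return tuple(a + b for a, b in zip(x, y)) == z
        raise ValueError(constraint)

    velocity = (1, -1)   # length^1 time^-1
    time     = (0, 1)
    distance = (1, 0)
    print(dims_ok("MULT", velocity, time, distance))  # True
    print(dims_ok("ADD",  velocity, time, distance))  # False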
Bradley L. Richards and Raymond J. Mooney
Proceedings of the Tenth National Conference on Artificial
Intelligence, pp. 50-55, San Jose, CA, July 1992.
First-order learning systems (e.g., FOIL, FOCL, FORTE) generally rely on
hill-climbing heuristics in order to avoid the combinatorial explosion
inherent in learning first-order concepts. However, hill-climbing leaves
these systems vulnerable to local maxima and local plateaus. We present
a method, called relational pathfinding, which has proven highly effective
in escaping local maxima and crossing local plateaus. We present our
algorithm and provide learning results in two domains: family relationships
and qualitative model building.
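The core of the search can be sketched as breadth-first search for a chain
of known relations connecting two constants; the family facts below are
invented for illustration, and the real algorithm goes on to convert such
paths into clause bodies.

    # Sketch of the core of relational pathfinding: BFS for a chain of
    # known relations connecting two constants (toy facts).

    from collections import deque

    def find_path(facts, start, goal):
        """facts: iterable of (relation, a, b) ground atoms."""
        edges = {}
        for rel, a, b in facts:
            edges.setdefault(a, []).append((rel, b))
        queue, seen = deque([(start, [])]), {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for rel, nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(rel, node, nxt)]))
        return None

    facts = [("parent", "ann", "bob"), ("parent", "bob", "carol")]
    print(find_path(facts, "ann", "carol"))
    # -> [('parent', 'ann', 'bob'), ('parent', 'bob', 'carol')]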
John M. Zelle and Raymond J. Mooney
Proceedings of the 1992 Machine Learning Workshop on Knowledge
Compilation and Speedup Learning, Aberdeen, Scotland, July 1992.
This paper presents an algorithm that combines traditional EBL
techniques and recent developments in inductive logic programming to
learn effective clause selection rules for Prolog programs. When these
control rules are incorporated into the original program, significant
speed-up may be achieved. The algorithm produces not only EBL-like
speed up of problem solvers, but is capable of automatically
transforming some intractable algorithms into ones that run in
polynomial time.
J. Jeffrey Mahoney & Raymond J. Mooney
Proceedings of the 1992 Machine Learning Workshop on Integrated
Learning in Real Domains, Aberdeen, Scotland, July 1992.
This paper describes RAPTURE --- a system for revising probabilistic
theories that combines symbolic and neural-network learning methods.
RAPTURE uses a modified version of backpropagation to refine the certainty
factors of a Mycin-style rule-base and it uses ID3's information gain heuristic
to add new rules. Results on two real-world domains demonstrate that this
combined approach performs as well or better than previous methods.
Paul T. Baffes and John M. Zelle
Proceedings of the 1992 International Joint Conference on Neural
Networks, pp. 392-397, Baltimore, Maryland, June 1992.
The ideas presented here are based on two observations of perceptrons: (1) when
the perceptron learning algorithm cycles among hyperplanes, the hyperplanes may
be compared to select one that gives a best split of the examples, and
(2) it is always possible for the perceptron to build a hyperplane that
separates at least one example from all the rest. We describe the
Extentron, which grows multi-layer networks capable of distinguishing
non-linearly-separable data using the simple perceptron rule for linear
threshold units. The resulting algorithm is simple, very fast, scales well to
large problems, retains the convergence properties of the perceptron, and can
be completely specified using only two parameters. Results are presented
comparing the Extentron to other neural network paradigms and to symbolic
learning systems.
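Both observations build on the plain perceptron rule for a linear threshold
unit, sketched below on a linearly separable toy problem; the Extentron's
network-growing strategy around this rule is not shown.

    # Sketch of the simple perceptron rule for a linear threshold unit,
    # the only learning rule the Extentron needs.

    def train_perceptron(data, epochs=20, lr=1.0):
        n = len(data[0][0])
        w, b = [0.0] * n, 0.0
        for _ in range(epochs):
            for x, target in data:             # target is 0 or 1
                out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
                err = target - out
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        return w, b

    # Linearly separable toy data: logical OR.
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    print(train_perceptron(data))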
Paul T. Baffes and Raymond J. Mooney
Proceedings of the Fourteenth Annual Conference of the Cognitive
Science Society, pp. 617-622, Bloomington, IN, July 1992.
Student modeling has been identified as an important component in the long-term
development of Intelligent Computer-Aided Instruction (ICAI) systems. Two
basic approaches have evolved to model student misconceptions. One uses a
static, predefined library of user bugs which contains the misconceptions
modeled by the system. The other uses induction to learn student
misconceptions from scratch. Here, we present a third approach that uses a
machine learning technique called theory revision. Using theory revision
allows the system to automatically construct a bug library for use in modeling
while retaining the flexibility to address novel errors.
Raymond J. Mooney
Computational Learning Theory and Natural Learning Systems,
Vol. 3, T. Petsche, S. Judd, and S. Hanson, Eds., MIT Press, 1995, pp. 43-53.
This paper presents a preliminary analysis of the sample complexity of theory
revision within the framework of PAC (Probably Approximately Correct)
learnability theory. By formalizing the notion that the initial theory is
``close'' to the correct theory we show that the sample complexity of an
optimal propositional Horn-clause theory revision algorithm is $O( ( \ln 1 /
\delta + d \ln (s_0 + d + n) ) / \epsilon)$, where $d$ is the {\em syntactic
distance} between the initial and correct theories, $s_0$ is the size of
the initial theory, $n$ is the number of observable features, and $\epsilon$ and
$\delta$ are the standard PAC error and probability bounds. The paper also
discusses the problems raised by the computational complexity of theory
revision.
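To make the bound concrete, here is a worked instantiation under the purely
illustrative assumptions that the hidden constant is 1 and that $d = 3$,
$s_0 = 20$, $n = 50$, $\epsilon = 0.1$, and $\delta = 0.05$:
\[
m \;\approx\; \frac{\ln(1/\delta) + d\,\ln(s_0 + d + n)}{\epsilon}
  \;=\; \frac{\ln 20 + 3\ln 73}{0.1}
  \;\approx\; \frac{2.996 + 12.871}{0.1} \;\approx\; 159 .
\]
The dominant term is $d \ln(s_0 + d + n)$, reflecting that under this analysis
the cost of revision grows with the syntactic distance from the initial theory
rather than with the theory's absolute size.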
Raymond J. Mooney & Bradley L. Richards
Proceedings of the Second International Workshop on Inductive
Logic Programming, Tokyo, Japan, June 1992.
This paper presents results on using a theory revision system to automatically
debug logic programs. FORTE is a recently developed system for revising
function-free Horn-clause theories. Given a theory and a set of training
examples, it performs a hill-climbing search in an attempt to minimally modify
the theory to correctly classify all of the examples. FORTE makes use of
methods from propositional theory revision, Horn-clause induction (FOIL),
and inverse resolution. The system has been successfully used to debug
logic programs written by undergraduate students for a programming languages
course.
Raymond J. Mooney
Proceedings of AAAI Spring Symposium on Knowledge
Assimilation, Stanford, CA, March 1992.
Most existing theory refinement systems are not incremental. However, any
theory refinement system whose input and output theories are compatible can be
used to incrementally assimilate data into an evolving theory. This is done by
continually feeding its revised theory back in as its input theory. An
incremental batch approach, in which the system assimilates a batch of examples
at each step, seems most appropriate for existing theory revision systems.
Experimental results with the EITHER theory refinement system demonstrate
that this approach frequently increases efficiency without significantly
decreasing the accuracy or the simplicity of the resulting theory. However, if
the system produces bad initial changes to the theory based on only a small
amount of data, these bad revisions can ``snowball'' and result in an overall
decrease in performance.
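The incremental batch scheme itself is just a loop that feeds each revised
theory back in as the next input theory, as in this sketch; revise is a
placeholder for any compatible revision system such as EITHER, and the
demonstration call uses a dummy reviser purely to show the batching.

    # Sketch of incremental batch theory refinement: loop a compatible
    # revision system over successive batches of examples.

    def incremental_batch(theory, example_stream, revise, batch_size=25):
        batch = []
        for example in example_stream:
            batch.append(example)
            if len(batch) == batch_size:
                theory = revise(theory, batch)  # feed revised theory back in
                batch = []
        if batch:
            theory = revise(theory, batch)
        return theory

    # Dummy reviser that just records batch sizes, to show the control flow.
    print(incremental_batch("T0", range(60), lambda t, b: t + f"+{len(b)}"))
    # -> 'T0+25+25+10'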
Raymond J. Mooney & Dirk Ourston
Machine Learning: A Multistrategy Approach, Vol. IV, R.S. Michalski
& G. Tecuci (Eds.), pp. 141-164, Morgan Kaufmann, San Mateo, CA, 1994.
This chapter describes a multistrategy system that employs independent modules
for deductive, abductive, and inductive reasoning to revise an arbitrarily
incorrect propositional Horn-clause domain theory to fit a set of preclassified
training instances. By combining such diverse methods, EITHER is able
to handle a wider range of imperfect theories than other theory revision
systems while guaranteeing that the revised theory will be consistent with the
training data. EITHER has successfully revised two actual expert
theories, one in molecular biology and one in plant pathology. The results
confirm the hypothesis that using a multistrategy system to learn from both
theory and data gives better results than using either theory or data alone.
Raymond J. Mooney
Categorization by Humans and Machines: The Psychology of Learning
and Motivation, Vol. 29, G. Nakamura, R. Taraban, & D.L. Medin (Eds.),
pp. 189-218, Academic Press, Orlando, FL, 1993.
Recent results in both machine learning and cognitive psychology demonstrate
that effective category learning involves an integration of theory and data.
First, theories can bias induction, affecting what category definitions are
extracted from a set of examples. Second, conflicting data can cause theories
to be revised. Third, theories can alter the representation of data through
feature formation. This chapter reviews two machine learning systems
that attempt to integrate theory and data in one or more of these ways. IOU
uses a domain theory to acquire part of a concept definition and to focus
induction on the unexplained aspects of the data. EITHER uses data to
revise an imperfect theory and uses theory to add abstract features to the
data. Recent psychological experiments reveal that these machine learning
systems exhibit several important aspects of human category learning.
Specifically, IOU has been used to successfully model some recent experimental
results on the effect of functional knowledge on category learning.
Dirk Ourston and Raymond J. Mooney
Artificial Intelligence, 66 (1994), pp. 311-344.
This article describes a comprehensive approach to automatic theory revision.
Given an imperfect theory, the approach combines explanation attempts for
incorrectly classified examples in order to identify the failing portions of
the theory. For each theory fault, correlated subsets of the examples are used
to inductively generate a correction. Because the corrections are
focused, they tend to preserve the structure of the original theory. Because
the system starts with an approximate domain theory, in general fewer training
examples are required to attain a given level of performance (classification
accuracy) compared to a purely empirical system. The approach applies to
classification systems employing a propositional Horn-clause theory. The system
has been tested in a variety of application domains, and results are presented
for problems in the domains of molecular biology and plant disease diagnosis.
Raymond J. Mooney
Machine Learning, 10, 1 (1993), pp. 79-110.
This paper describes and evaluates an approach to combining empirical and
explanation-based learning called Induction Over the Unexplained (IOU).
IOU is intended for learning concepts that can be partially explained by an
overly-general domain theory. An eclectic evaluation of the method is presented
which includes results from all three major approaches: empirical, theoretical,
and psychological. Empirical results show that IOU is effective at refining
overly-general domain theories and that it learns more accurate concepts from
fewer examples than a purely empirical approach. The application of
theoretical results from PAC learnability theory explains why IOU requires
fewer examples. IOU is also shown to be able to model psychological data
demonstrating the effect of background knowledge on human learning.
Hwee Tou Ng and Raymond J. Mooney
Proceedings of the Ninth National Conference on
Artificial Intelligence, pp. 494-499, Anaheim, CA, July 1991.
This paper presents an algorithm for first-order Horn-clause abduction
that uses an ATMS to avoid redundant computation. This algorithm is
either more efficient or more general than any other previous
abduction algorithm. Since computing all minimal abductive
explanations is intractable, we also present a heuristic version of
the algorithm that uses beam search to compute a subset of the
simplest explanations. We present empirical results on a broad range
of abduction problems from text understanding, plan recognition, and
device diagnosis which demonstrate that our algorithm is at least an
order of magnitude faster than an alternative abduction algorithm that
does not use an ATMS.
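The beam-search variant keeps only the k simplest partial explanations after
each expansion step. The following propositional Python sketch shows the
idea; it assumes acyclic rules, and the actual algorithm is first-order and
ATMS-based.

    # Sketch of beam search over partial abductive explanations: after each
    # expansion only the k candidates with the fewest assumptions survive.

    def beam_abduce(goals, rules, assumable, k=2):
        beam = [(frozenset(), list(goals))]     # (assumptions, open goals)
        while any(open_goals for _, open_goals in beam):
            expanded = []
            for assumed, open_goals in beam:
                if not open_goals:
                    expanded.append((assumed, open_goals))
                    continue
                g, rest = open_goals[0], open_goals[1:]
                if g in assumable:
                    expanded.append((assumed | {g}, rest))
                for head, body in rules:
                    if head == g:
                        expanded.append((assumed, body + rest))
            beam = sorted(expanded, key=lambda s: len(s[0]))[:k]
        return [assumed for assumed, _ in beam]

    rules = [("wet", ["rain"]), ("wet", ["sprinkler", "on"])]
    print(beam_abduce(["wet"], rules, {"rain", "sprinkler", "on"}))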
Dirk Ourston and Raymond J. Mooney
Proceedings of the Eighth International Machine Learning
Workshop, pp. 534-538, Evanston, IL, June 1991.
This paper presents an approach to improving the classification performance of
a multiple category theory by correcting intermediate rules which are shared
among the categories. Using this technique, the performance of a theory in one
category can be improved through training in an entirely different category.
Examples of the technique are presented and experimental results are given.
Raymond J. Mooney and Dirk Ourston
Proceedings of the Eighth International Machine Learning
Workshop, pp. 178-182, Evanston, IL. June 1991.
This paper presents constructive induction techniques recently added to the
EITHER theory refinement system. These additions allow EITHER to handle
arbitrary gaps at the ``top,'' ``middle,'' and/or ``bottom'' of an incomplete
domain theory. Intermediate concept utilization employs existing rules
in the theory to derive higher-level features for use in induction.
Intermediate concept creation employs inverse resolution to introduce new
intermediate concepts in order to fill gaps in a theory that span multiple
levels. These revisions allow EITHER to make use of imperfect domain theories
in the ways typical of previous work in both constructive induction and theory
refinement. As a result, EITHER is able to handle a wider range of theory
imperfections than does any other existing theory refinement system.
Raymond J. Mooney and Dirk Ourston
Technical Report AI 91-153, Artificial Intelligence Lab, University of
Texas at Austin, March 1991.
This paper presents a method for revising an approximate domain theory based on
noisy data. The basic idea is to avoid making changes to the theory that
account for only a small amount of data. This method is implemented in the
EITHER propositional Horn-clause theory revision system. The paper
presents empirical results on artificially corrupted data to show that this
method successfully prevents over-fitting. In other words, when the data is
noisy, performance on novel test data is considerably better than revising the
theory to completely fit the data. When the data is not noisy, noise processing
causes no significant degradation in performance. Finally, noise processing
increases efficiency and decreases the complexity of the resulting theory.
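The underlying idea reduces to a coverage threshold: a candidate revision is
rejected when the examples supporting it are too small a fraction of the
training set. A minimal sketch, with an invented threshold value that is not
EITHER's actual criterion:

    # Sketch of the noise-handling idea: reject a revision supported by too
    # small a fraction of the training data (threshold is illustrative).

    def accept_revision(num_supporting, num_examples, min_fraction=0.05):
        return num_supporting / num_examples >= min_fraction

    print(accept_revision(2, 100))   # False: likely fitting noise
    print(accept_revision(20, 100))  # True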
Hwee Tou Ng and Raymond J. Mooney
Proceedings of the Eighth National Conference on
Artificial Intelligence, pp. 337-342, Boston, MA, 1990.
Abduction is an important inference process underlying much of human
intelligent activities, including text understanding, plan
recognition, disease diagnosis, and physical device diagnosis. In
this paper, we describe some problems encountered using abduction to
understand text, and present some solutions to overcome these
problems. The solutions we propose center around the use of a
different criterion, called explanatory coherence, as the
primary measure to evaluate the quality of an explanation. In
addition, explanatory coherence plays an important role in the
construction of explanations, both in determining the appropriate
level of specificity of a preferred explanation, and in guiding the
heuristic search to efficiently compute explanations of sufficiently
high quality.