Research Summary: Raymond J. Mooney

The overall goal of my research is to help automate the development of intelligent systems by using machine learning. My early research included some of the original work in explanation-based learning, in which prior knowledge is used to learn a concept from a single example. Subsequently, my research has focused on two major threads: 1) Theory refinement: using imperfect prior domain knowledge to aid inductive learning, and 2) Applications of machine learning (particularly inductive logic programming) to a variety of larger tasks in AI (particularly natural-language understanding).

Theory Refinement

Joint work with former Ph.D. students Dirk Ourston, Brad Richards, Paul Baffes, Jeff Mahoney, and Sowmya Ramachandran

Most research in machine learning has focused on inducing concepts from data given very little, if any, prior knowledge of the domain. By contrast, theory refinement is the task of using data to revise an initial, imperfect knowledge base (KB). Compared to purely inductive learning, the goals of theory refinement are twofold: 1) To improve the accuracy of concepts learned from limited training data by exploiting declaratively represented prior knowledge of the domain, and 2) To improve the comprehensibility of learned concepts by relating them to existing knowledge. Over the last ten years, we have developed a series of theory refinement systems for revising knowledge expressed in a variety of representation languages and have experimentally demonstrated their ability to efficiently and effectively revise real KBs for a range of important applications.

The systems we have developed are best characterized by the formal language used to represent knowledge. We have developed general-purpose systems for revising KBs represented as: 1) Propositional if-then rules (the Either and Neither systems), 2) Horn clauses in first-order logic, i.e., Prolog programs (the Forte system), and 3) Probabilistic knowledge in the form of certainty-factor rules (the Rapture system) and Bayesian networks (the Banner system). All of these systems employ some form of heuristic search to efficiently explore a space of revisions in order to find a KB that is consistent with a set of supervised training examples (while avoiding overfitting). We have exploited three key ideas in these search methods: 1) Use abductive reasoning to locate faults in the theory based on errors made on specific examples, 2) Adapt symbolic inductive learning methods to find appropriate structural additions to the knowledge base, and 3) Adapt neural-network methods (gradient descent) to revise continuous numerical parameters.
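
To give a concrete flavor of this localize-and-repair search, the following minimal Python sketch specializes an overly general propositional rule against labeled examples, in the spirit of Either and Neither; the toy concept, feature names, and single-pass greedy repair are illustrative assumptions, not the systems' actual algorithms.

def fires(rule, example):
    # A rule fires when every antecedent feature holds in the example.
    return all(example[feat] for feat in rule)

def specialize(rule, false_positives, features):
    # Repair an overly general rule: greedily add the antecedent (an
    # induction-style choice) that excludes the most covered negatives.
    candidates = [f for f in features if f not in rule]
    best = max(candidates,
               key=lambda f: sum(not ex[f] for ex in false_positives))
    return rule + [best]

def revise(rules, examples, features):
    # Fault localization: blame each rule for the negatives it covers,
    # then specialize it so those negatives are no longer covered.
    for i, rule in enumerate(rules):
        fps = [ex for ex, label in examples if not label and fires(rule, ex)]
        if fps:
            rules[i] = specialize(rule, fps, features)
    return rules

# Toy KB: one overly general rule for a hypothetical "grants_loan" concept;
# the KB predicts positive whenever any rule fires.
features = ["employed", "high_debt", "collateral"]
rules = [["employed"]]  # fires too often: it ignores debt and collateral
examples = [
    ({"employed": True, "high_debt": False, "collateral": True}, True),
    ({"employed": True, "high_debt": True, "collateral": False}, False),
]
print(revise(rules, examples, features))  # [['employed', 'collateral']]

A full revision system must also generalize rules that fail to cover positive examples and guard against overfitting; the localize-and-repair cycle above is only the core loop.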

The above systems have been rigorously tested on a variety of real-world knowledge bases. In the area of molecular biology, we have successfully revised knowledge bases for identifying two important patterns in DNA sequences: promoters, which indicate where new genes begin; and splice-junctions, which indicate the boundaries between coding and non-coding (junk) DNA. In diagnosis, we have revised knowledge bases for human bacterial infections and soybean diseases. In intelligent tutoring systems, we have revised correct domain knowledge to model incorrect student behavior. In all of these cases, theory refinement was shown to improve the accuracy of the theory on independent test data, and to produce more accurate results than standard inductive methods such as C4.5 decision-tree or backpropagation neural-net learning.

Diverse AI Applications of Machine Learning

Joint work with former Ph.D. students Paul Baffes and Tara Estlin

Most research in machine learning has focused on feature-vector classification. This task has a wide variety of applications and is relatively easy to evaluate on standardized data sets, such as those at the UCI repository. However, in many ways, the emphasis on this task has isolated machine learning from other important AI problems that are not easily mapped into this framework. I believe this isolation is detrimental to progress in both machine learning and AI as a whole. Consequently, much of my recent research has focused on applying learning to a variety of different AI tasks, and attempting to bridge the gap between the machine learning community and other AI research communities.

My primary focus has been on applications to natural-language processing (NLP), as discussed below. However, we have also worked on applications to intelligent tutoring, planning, uncertain reasoning, and qualitative reasoning. For example, Assert is a tutoring system we developed that employs theory refinement to build a student model, which is then used to provide directed pedagogical feedback. In a controlled experiment with 100 students in a class on C++ programming, this personalized feedback was shown to produce larger improvements in students' performance on a post-test than generic feedback. With respect to planning, Scope is a system we developed that uses a combination of explanation-based learning and inductive logic programming to learn search-control heuristics that improve both the efficiency and the plan quality of a partial-order planner. Experimental results in logistics and process planning have demonstrated its ability to successfully accomplish these goals. Most recently, I have become interested in applications of learning to recommender systems, particularly book recommending for future digital libraries. Libra is a system I developed for learning personalized reader profiles from user ratings, using book information extracted from Amazon.com. Together, these diverse applications have shown how learning can be successfully applied to a variety of non-classification problems, and they have motivated the development of important new learning algorithms.
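
As a rough illustration of the kind of content-based profile involved, the following Python sketch scores a new book with a Laplace-smoothed naive Bayes model over bag-of-words descriptions; the tiny vocabulary and ratings are invented, and Libra's actual features (extracted from Amazon.com) and model details are considerably richer.

import math
from collections import Counter

def train(rated_books):
    # Count words in liked vs. disliked book descriptions.
    counts = {True: Counter(), False: Counter()}
    priors = Counter()
    for text, liked in rated_books:
        counts[liked].update(text.lower().split())
        priors[liked] += 1
    return counts, priors

def log_odds(text, counts, priors):
    # Laplace-smoothed log-odds that the user would like the book.
    vocab = set(counts[True]) | set(counts[False])
    totals = {c: sum(counts[c].values()) for c in (True, False)}
    score = math.log(priors[True] + 1) - math.log(priors[False] + 1)
    for w in text.lower().split():
        p_like = (counts[True][w] + 1) / (totals[True] + len(vocab))
        p_dislike = (counts[False][w] + 1) / (totals[False] + len(vocab))
        score += math.log(p_like / p_dislike)
    return score

# Invented (description, liked?) ratings standing in for real user data.
ratings = [("sweeping epic fantasy quest", True),
           ("dry corporate management handbook", False)]
counts, priors = train(ratings)
print(log_odds("epic fantasy adventure", counts, priors) > 0)  # True: recommend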

Natural Language Learning

Joint work with former Ph.D. students John Zelle, Ulf Hermjakob, Mary-Elaine Califf, and Cynthia Thompson

Since my thesis research on explanation-based learning for schema acquisition in story understanding, using machine learning to aid the development of NLP systems has been one of my key research interests. Most research in empirical (corpus-based) NLP, which has demonstrated important progress in recent years, has concerned statistical methods. My research has demonstrated that symbolic methods, specifically ones utilizing relational representations, can learn effective NLP systems for tasks not typically studied in statistical NLP.

My recent research in this area has focused on three problems. The first is demonstrating that relational symbolic methods can perform very well on learning to generate the past tense of an English verb, a task frequently studied in neural networks and cognitive science as a touchstone problem in language acquisition. To my knowledge, our system still has the best performance on this task of any learning method. Second, we have studied the task of learning parsers and semantic lexicons that map sentences (particularly database queries) directly into semantic logical form. Our system, Chill, has been used to automatically construct successful natural-language front-ends for databases on U.S. geography (GeoQuery), Bay-area restaurants (RestaurantQuery), and Austin computer jobs (JobQuery). Third, we have explored learning pattern-match rules for information extraction from training texts annotated with the labeled phrases to be extracted. Our system, Rapier, has successfully learned rules for extracting a structured database of jobs from the messages posted to the newsgroup austin.jobs. We have combined Chill and Rapier to construct a system (JobQuery) that learns to "read" this newsgroup and answer English questions about the jobs it advertises. Most research in statistical NLP has focused on isolated, "low-level" tasks such as part-of-speech tagging and syntactic parsing. In contrast, we believe our research demonstrates that symbolic learning methods are capable of constructing complete NLP systems for useful end-user tasks.
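
To make the parsing task concrete, the following sketch shows the kind of input/output mapping involved; the single hand-written pattern merely illustrates a GeoQuery-style logical form (the syntax is approximated) and stands in for the parser that Chill actually induces from (sentence, query) training pairs.

import re

def parse(sentence):
    # One hand-coded template for illustration only; Chill learns such
    # mappings from annotated training pairs rather than fixed patterns.
    m = re.match(r"what is the capital of (\w+)\?", sentence.lower())
    if m:
        return f"answer(C, (capital(S, C), const(S, stateid({m.group(1)}))))"
    return None

# The kind of supervised (sentence, logical form) pair Chill trains on:
print(parse("What is the capital of Texas?"))
# answer(C, (capital(S, C), const(S, stateid(texas))))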

I have also worked to improve communication between the machine learning and computational linguistics research communities, which I believe have much to gain from increased interaction. In pursuit of this goal, I gave an invited talk encouraging such interaction at the International Conference on Machine Learning in 1997, co-edited special issues of both AI Magazine and the Machine Learning journal on natural-language learning, and co-taught a tutorial on symbolic machine learning for natural language processing at the 1999 Annual Meeting of the Association for Computational Linguistics. I believe that communication between these two important sub-areas of AI is improving, and I hope that my efforts have contributed in some small way to this development.

Inductive Logic Programming

Joint work with former Ph.D. students Brad Richards, John Zelle, Mary-Elaine Califf, and Tara Estlin

Many of the applications we have explored do not easily lend themselves to feature-vector representations. This motivated us to explore methods that utilize relational representations, such as inductive logic programming (ILP), which concerns learning Prolog programs from input/output tuples. Since existing ILP methods have significant limitations, we have developed our own ILP methods to address problems that arose in these applications. In our work on first-order theory refinement, we developed a method for using paths of relations to efficiently construct Prolog clauses. In our work on semantic parsing, we developed an ILP method, Chillin, that integrates predicate invention with both top-down and bottom-up search. In our work on past-tense learning, we developed a method, Foidl, that learns first-order decision lists (Prolog programs with a restricted use of cuts) from positive examples only. In our work on planning, we developed a method that integrates explanation-based learning and induction. Each of these systems required developing novel ILP algorithms that address general limitations of existing methods, and they were experimentally verified to improve performance on problems beyond the specific ones that motivated their development. Therefore, I believe our work on applications has also led to some important algorithmic advances in relational learning.
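
To illustrate the first-order decision-list representation, the following Python sketch encodes a past-tense program as ordered rules where the first match wins, mirroring Prolog clauses committed with a cut; the specific rules are hand-written examples, not output learned by Foidl.

# Ordered rules: specific exceptions first, the general default last. The
# first match wins, like ordered Prolog clauses that commit via a cut.
decision_list = [
    (lambda v: v == "go",         lambda v: "went"),          # irregular
    (lambda v: v.endswith("eep"), lambda v: v[:-3] + "ept"),  # sleep -> slept
    (lambda v: v.endswith("e"),   lambda v: v + "d"),         # love -> loved
    (lambda v: True,              lambda v: v + "ed"),        # default rule
]

def past_tense(verb):
    for test, transform in decision_list:
        if test(verb):
            return transform(verb)

for v in ["go", "sleep", "love", "walk"]:
    print(v, "->", past_tense(v))  # went, slept, loved, walked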

Current Directions

My current research is focused on three general topics: 1) improving our methods for learning parsers that map natural-language queries to semantic form by integrating ILP and statistical methods; 2) mining knowledge from text by first using learned information extractors to transform corpora of natural-language texts into structured databases; and 3) integrating content-based and collaborative methods for recommending books, including methods for selecting good training examples (sample selection), one form of which is sketched below.
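
As a hint of what sample selection involves, the following sketch implements one standard strategy, uncertainty sampling, over a toy linear scorer: the learner asks the user to rate the book whose current prediction it is least sure of. The scorer, weights, and titles are invented stand-ins rather than the specific criterion of our recommender work.

def predicted_score(book, weights):
    # Toy linear model: large positive means "likely liked", near zero unsure.
    return sum(weights.get(f, 0.0) for f in book["features"])

def select_query(unrated, weights):
    # Uncertainty sampling: request a rating for the book whose current
    # prediction is closest to the decision boundary at zero.
    return min(unrated, key=lambda b: abs(predicted_score(b, weights)))

weights = {"fantasy": 1.2, "nonfiction": -0.8}  # invented profile weights
books = [{"title": "A", "features": ["fantasy"]},
         {"title": "B", "features": ["fantasy", "nonfiction"]}]
print(select_query(books, weights)["title"])  # "B": the least certain case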