Content-based recommender systems suggest documents, items, and services to users based on learning a profile of the user from rated examples containing information about the given items. Text categorization methods are very useful for this task but generally rely on unstructured text. We have developed a book-recommending system that utilizes semi-structured information about items gathered from the web using simple information extraction techniques. Initial experimental results demonstrate that this approach can produce fairly accurate recommendations.
Undergraduate Honor Thesis, Department of Computer Sciences,
University of Texas at Austin, May 1998.
Also appears as AI TR 98-270
With the growth of the World Wide Web, recommender systems have
received an increasing amount of attention. Many recommender systems
in use today are based on collaborative filtering. This project has
focused on LIBRA, a content-based book recommending system. By
utilizing text categorization methods and the information available
for each book, the system determines a user profile which is used as
the basis of recommendations made to the user. Instead of the
bag-of-words representation used in many statistical text
categorization systems, LIBRA parses each text sample into a
semi-structured representation. We have used standard Machine
Learning techniques to analyze the performance of several algorithms
on this learning task. In addition, we analyze the utility of several
methods of feature construction and selection (i.e. methods of
choosing the representation of an item that the learning algorithm
actually uses). After analyzing the system we conclude that good
recommendations are produced after a relatively small number of
training examples. We also conclude that the feature selection method
tested does not improve the performance of these algorithms in any
systematic way, though the results indicate other feature selection
methods may prove useful. Feature construction, however, while not
providing a large increase in performance with the particular
construction methods used here, holds promise of providing performance
improvements for the algorithms investigated. This text assumes only
minor familiarity with concepts of artificial intelligence and should
be readable by an upper-division computer science undergraduate
familiar with basic concepts of probability theory and set theory.
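For readers unfamiliar with the approach, the following minimal sketch shows how a
naive-Bayes-style profile learner can exploit a semi-structured representation by
pairing each token with the slot it came from. The slot names, toy data, and
smoothing choices are illustrative assumptions, not LIBRA's actual implementation.

```python
import math
from collections import defaultdict

def slot_tokens(book):
    """Turn a semi-structured book record into (slot, word) features, so
    'asimov' in the author slot is distinct from 'asimov' in the description."""
    for slot, text in book.items():
        for word in text.lower().split():
            yield (slot, word)

def train(examples):
    """examples: list of (book_dict, label) pairs, where labels are user
    ratings collapsed to 'pos'/'neg'. Returns naive Bayes counts."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for book, label in examples:
        class_counts[label] += 1
        for feat in slot_tokens(book):
            feat_counts[label][feat] += 1
            vocab.add(feat)
    return class_counts, feat_counts, vocab

def score(book, label, model):
    """Log-posterior (up to a constant) with Laplace smoothing."""
    class_counts, feat_counts, vocab = model
    logp = math.log(class_counts[label] / sum(class_counts.values()))
    n_label = sum(feat_counts[label].values())
    for feat in slot_tokens(book):
        logp += math.log((feat_counts[label][feat] + 1) / (n_label + len(vocab)))
    return logp

model = train([({"author": "isaac asimov", "description": "robots and empire"}, "pos"),
               ({"author": "anonymous", "description": "tax law digest"}, "neg")])
book = {"author": "isaac asimov", "description": "more robots"}
print(score(book, "pos", model) > score(book, "neg", model))  # True
```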
To appear in the Fifteenth International Conference on Machine Learning, 1998.
While there has been a growing interest in the problem of learning Bayesian
networks from data, no technique exists for learning or revising Bayesian
networks with hidden variables (i.e. variables not represented in the data)
that has been shown to be efficient, effective, and scalable through evaluation
on real data. The few techniques that exist for revising such networks
perform a blind search through a large space of revisions, and are therefore
computationally expensive. This paper presents BANNER, a technique for using
data to revise a given Bayesian network with noisy-or and noisy-and nodes to
improve its classification accuracy. The initial network can be derived
directly from a logical theory expressed as propositional rules. BANNER can
revise networks with hidden variables, and add hidden variables when
necessary. Unlike previous approaches, BANNER employs mechanisms similar to
logical theory refinement techniques for using the data to focus the search
for effective modifications. Experiments on real-world problems in the domain
of molecular biology demonstrate that BANNER can effectively revise fairly
large networks to significantly improve their accuracies.
TR AI98-271, Artificial Intelligence Lab,
University of Texas at Austin, May 1998.
This paper experimentally compares three approaches to program induction:
inductive logic programming (ILP), genetic programming (GP), and
genetic logic programming (GLP) (a variant of GP for inducing Prolog programs).
Each of these methods was used to induce four simple,
recursive list-manipulation functions. The results indicate that ILP is the
most likely to induce a correct program from small sets of random examples,
while GP is generally less accurate. GLP performs the worst, and is rarely able
to induce a correct program. Interpretations of these results in terms of
differences in search methods and inductive biases are presented.
Mary Elaine Califf and Raymond J. Mooney
To appear in New Generation Computing, 16, 3 (1998).
This paper demonstrates the capabilities of FOIDL, an inductive logic
programming (ILP) system whose distinguishing characteristics are the ability
to produce first-order decision lists, the use of an output completeness
assumption as a substitute for negative examples, and the use of intensional
background knowledge. The development of FOIDL was originally motivated
by the problem of learning to generate the past tense of English verbs;
however, this paper demonstrates its superior performance on two different sets
of benchmark ILP problems. Tests on the finite element mesh design problem
show that FOIDL's decision lists enable it to produce generally more
accurate results than a range of methods previously applied to this problem.
Tests with a selection of list-processing problems from Bratko's introductory
Prolog text demonstrate that the combination of implicit negatives and
intensionality allows FOIDL to learn correct programs from far fewer
examples than FOIL.
Mary Elaine Califf and Raymond J. Mooney
Proceedings of AAAI Spring Symposium on Applying Machine Learning to
Discourse Processing, Stanford, CA, March, 1998.
Information extraction is a form of shallow text processing that locates a
specified set of relevant items in a natural-language document. Systems for
this task require significant domain-specific knowledge and are time-consuming
and difficult to build by hand, making them a good application for machine
learning. This paper presents a system, RAPIER, that takes pairs of
sample documents and filled templates and induces pattern-match rules that
directly extract fillers for the slots in the template. The learning algorithm
incorporates techniques from several inductive logic programming systems and
acquires unbounded patterns that include constraints on the words,
part-of-speech tags, and semantic classes present in the filler and the
surrounding text. We present encouraging experimental results on several
domains, including Internet job postings.
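To make the rule representation concrete, here is a toy, fixed-length rendering of
a three-part extraction rule. RAPIER's actual rules are unbounded pattern lists
with richer constraints, including the semantic classes mentioned above, so
everything below is a simplified assumption.

```python
# A toy extraction rule with pre-filler, filler, and post-filler patterns.
# Each element constrains a token's word, its POS tag, or both; None means
# unconstrained.
rule = {
    "slot": "salary",
    "pre_filler":  [{"word": None,  "pos": "IN"}],   # e.g. a preposition
    "filler":      [{"word": None,  "pos": "CD"}],   # a number
    "post_filler": [{"word": "per", "pos": None},    # literally "per"
                    {"word": None,  "pos": "NN"}],   # e.g. "year"
}

def element_matches(elem, token):
    word, pos = token
    return ((elem["word"] is None or elem["word"] == word.lower()) and
            (elem["pos"] is None or elem["pos"] == pos))

def apply_rule(rule, tagged_tokens):
    """tagged_tokens: list of (word, pos) pairs. Returns extracted fillers."""
    elems = rule["pre_filler"] + rule["filler"] + rule["post_filler"]
    fillers = []
    for i in range(len(tagged_tokens) - len(elems) + 1):
        window = tagged_tokens[i:i + len(elems)]
        if all(element_matches(e, t) for e, t in zip(elems, window)):
            start, width = len(rule["pre_filler"]), len(rule["filler"])
            fillers.append(" ".join(w for w, _ in window[start:start + width]))
    return fillers

tagged = [("salary", "NN"), ("of", "IN"), ("50000", "CD"), ("per", "IN"), ("year", "NN")]
print(apply_rule(rule, tagged))  # ['50000']
```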
Cynthia A. Thompson and Raymond J. Mooney
TR AI98-273, Artificial Intelligence Lab,
University of Texas at Austin, May 1998.
This paper describes a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with representations of their meaning. The lexicon learned consists of words
paired with meaning representations. WOLFIE is part of an integrated system
that learns to parse novel sentences into semantic representations, such as
logical database queries. Experimental results are presented demonstrating
WOLFIE's ability to learn useful lexicons for a database interface in four
different natural languages. The lexicons learned by WOLFIE are compared to
those acquired by a competing system developed by Siskind (1996).
The learning system presented in this dissertation, called SCOPE, is a
unique approach to learning control knowledge for planning. SCOPE
learns domain-specific control rules for a planner that improve
both planning efficiency and plan quality, and it is one of the few
systems that can learn control knowledge for partial-order planning.
SCOPE's architecture integrates explanation-based learning (EBL) with
techniques from inductive logic programming. Specifically, EBL is
used to constrain an inductive search for control heuristics that help
a planner choose between competing plan refinements. Since SCOPE uses
a very flexible training approach, its learning algorithm can be
easily focused to prefer search paths that are better for particular
evaluation metrics. SCOPE is extensively tested on several planning
domains, including a logistics transportation domain and a production
manufacturing domain. In these tests, it is shown to significantly
improve both planning efficiency and quality and to be more
robust than a competing approach.
Ph.D. Thesis,
Department of Computer Sciences, University of Texas at Austin, May 1998.
Also appears as TR AI98-269
Artificial intelligence planning systems have become an important tool
for automating a wide variety of tasks. However, even the most
advanced planning algorithms suffer from two major problems. First,
they often require infeasible amounts of computation time to solve
problems in most domains. Second, they are not guaranteed to
return the best solution to a planning problem, and in fact can
sometimes return very low-quality solutions. One way to address
these problems is to provide a planning system with domain-specific
control knowledge, which helps guide the planner towards more
promising search paths. Machine learning techniques enable a planning
system to automatically acquire search-control knowledge for
different applications. A considerable amount of planning and learning
research has been devoted to acquiring rules that improve
planning efficiency, also known as speedup learning. Much less
work has been done on learning knowledge to improve the quality
of plans, even though this is an essential feature for many
real-world planning systems. Furthermore, even less research has been
done on acquiring control knowledge to improve both of these metrics.
Mary Elaine Califf and Raymond J. Mooney
Proceedings of AAAI Spring Symposium on Applying Machine Learning to
Discourse Processing, Stanford, CA, March, 1998.
Information extraction is a form of shallow text processing which locates
a specified set of relevant items in natural language documents. Such
systems can be useful, but require domain-specific knowledge and rules,
and are time-consuming and difficult to build by hand, making information
extraction a good testbed for the application of machine learning techniques
to natural language processing. This paper presents a system, RAPIER,
that takes pairs of documents and filled templates and induces pattern-match
rules that directly extract fillers for the slots in the template. The
learning algorithm incorporates techniques from several inductive logic
programming systems and learns unbounded patterns that include constraints on
the words and part-of-speech tags surrounding the filler. Encouraging results
are presented on learning to extract information from computer job postings
from the newsgroup misc.jobs.offered.
We therefore propose a system for parsing and translating natural language
that learns from examples and uses some background knowledge.
We test our parser on lexically limited sentences from the Wall Street Journal
and achieve accuracy rates of 89.8% for labeled precision, 98.4% for part of
speech tagging and 56.3% of test sentences without any crossing brackets.
Machine translations of 32 Wall Street Journal sentences to German have been
evaluated by 10 bilingual volunteers and been graded as 2.4 on a 1.0 (best) to
6.0 (worst) scale for both grammatical correctness and meaning preservation.
The translation quality was only minimally better (2.2) when starting each
translation with the correct parse tree, which indicates that the parser is
quite robust and that its errors have only a moderate impact on final
translation quality. These parsing and translation results already compare well
with other systems and, given the relatively small training set and amount of
overall knowledge used so far, the results suggest that our system Contex can
break previous accuracy ceilings when scaled up further.
Ph.D. Thesis,
Department of Computer Sciences, University of Texas at Austin, May 1997.
175 pages.
Also appears as TR 97-12.
The parsing of unrestricted text, with its enormous lexical and structural
ambiguity, still poses a great challenge in natural language processing.
The difficulties with traditional approaches, which try to master the
complexity of parse grammars with hand-crafted rules, have led to a trend
towards more empirical techniques.
As our parsing model we choose a deterministic shift-reduce type parser that
integrates part-of-speech tagging and syntactic and semantic processing, which
not only makes parsing very efficient, but also assures transparency during the
supervised example acquisition.
Applying machine learning techniques, the system uses parse action examples to
generate a parser in the form of a decision structure, a generalization of
decision trees.
To learn good parsing and translation decisions, our system relies heavily on
context, as encoded in currently 205 features describing the morphological,
syntactic and semantic aspects of a given parse state.
Compared with recent
probabilistic systems that were trained on 40,000 sentences, our system relies
on more background knowledge and a deeper analysis, but radically fewer
examples, currently 256 sentences.
To appear in the Working Notes of the IJCAI-97 Workshop on Abduction and Induction in AI
This paper discusses the integration of traditional abductive and inductive
reasoning methods in the development of machine learning systems. In
particular, the paper discusses our recent work in two areas: 1) The use of
traditional abductive methods to propose revisions during theory refinement,
where an existing knowledge base is modified to make it consistent with a set
of empirical data; and 2) The use of inductive learning methods to
automatically acquire from examples a diagnostic knowledge base
used for abductive reasoning.
To appear in the Proceedings of the ML-97 Workshop on Automata
Induction, Grammatical Inference, and Language Acquisition.
For most natural language processing tasks, a parser that maps
sentences into a semantic representation is significantly more useful
than a grammar or automaton that simply recognizes syntactically
well-formed strings. This paper reviews our work on using inductive
logic programming methods to learn deterministic shift-reduce parsers
that translate natural language into a semantic representation. We
focus on the task of mapping database queries directly into executable
logical form. An overview of the system is presented followed by
recent experimental results on corpora of Spanish geography queries
and English job-search queries.
Submitted for Review
Learning Bayesian networks inductively in the presence of hidden
variables is still an open problem. Even the simpler task of learning
just the conditional probabilities on a Bayesian network with hidden
variables is not completely solved. In this paper, we present an
approach that learns the parameters of a Bayesian network composed of
noisy-or and noisy-and nodes by using a gradient descent
back-propagation approach similar to that used to train neural
networks. For the task of causal inference, it has the advantage of
being able to learn in the presence of hidden variables. We compare
the performance of this approach with the adaptive probabilistic
networks technique on a real-world classification problem in molecular
biology, and show that our approach trains faster and learns networks
with higher classification accuracy.
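A minimal sketch of the underlying idea, assuming binary parents, a squared-error
loss, and no leak probability (the system's actual loss and update schedule may
differ): the noisy-or output is differentiable in its parameters, so they can be
adjusted by gradient descent much like neural-network weights.

```python
def noisy_or(q, x):
    """P(child = 1 | binary parents x): the child fails only if every
    active parent's independent cause fails."""
    p_fail = 1.0
    for qi, xi in zip(q, x):
        if xi:
            p_fail *= (1.0 - qi)
    return 1.0 - p_fail

def gradient_step(q, x, target, lr=0.1):
    """One squared-error gradient step on the noisy-or parameters."""
    err = noisy_or(q, x) - target
    new_q = []
    for j, (qj, xj) in enumerate(zip(q, x)):
        grad = 0.0
        if xj:
            # d P / d q_j = product over the *other* active parents of (1 - q_i)
            grad = err
            for i, (qi, xi) in enumerate(zip(q, x)):
                if xi and i != j:
                    grad *= (1.0 - qi)
        new_q.append(min(0.999, max(0.001, qj - lr * grad)))
    return new_q

q = [0.5, 0.5]
for _ in range(200):
    q = gradient_step(q, [1, 1], target=0.9)
print(round(noisy_or(q, [1, 1]), 2))  # ~0.9
```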
Submitted to Computational Linguistics
Empirical methods for building natural language systems have become
an important area of research in recent years. Most current
approaches are based on propositional learning algorithms and have
been applied to the problem of acquiring broad-coverage parsers for
relatively shallow (syntactic) representations. This paper outlines
an alternative empirical approach based on techniques from a
subfield of machine learning known as Inductive Logic Programming
(ILP). ILP algorithms, which learn relational (first-order) rules,
are used in a parser acquisition system called CHILL that
learns rules to control the behavior of a traditional shift-reduce
parser. Using this approach, CHILL is able to learn parsers
for a variety of different types of analyses, from traditional
syntax trees to more meaning-oriented case-role and database query
forms. Experimental evidence shows that CHILL performs
comparably to propositional learning systems on similar tasks, and
is able to go beyond the broad-but-shallow paradigm and learn
mappings directly from sentences into useful semantic
representations. In a complete database-query application, parsers
learned by CHILL outperform an existing hand-crafted system,
demonstrating the promise of empirical techniques for automating the
construction of certain NLP systems.
Mary Elaine Califf
Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1997.
The recent growth of online information available in the form of
natural language documents creates a greater need for computing systems
with the ability to process those documents to simplify access to the
information. One type of processing appropriate for many tasks is
information extraction, a type of text skimming that retrieves specific
types of information from text.
Although information extraction systems have existed for two decades,
these systems have generally been built by hand and contain domain-specific
information, making them difficult to port to other domains.
A few researchers have begun to apply machine learning to
information extraction tasks, but most of this work has involved
applying learning to pieces of a much larger
system. This paper presents a novel rule representation specific to
natural language and a learning system, RAPIER, which learns
information extraction rules.
RAPIER takes pairs of documents and filled templates indicating the
information to be extracted and learns
patterns to extract fillers for the slots in the template.
This proposal presents initial results on a small corpus of
computer-related job postings with a preliminary version of RAPIER.
Future research will involve several enhancements to RAPIER as well
as more thorough testing on several domains and extension to additional
natural language processing tasks. We intend to extend the rule representation
and algorithm to allow for more types of constraints than are currently
supported. We also plan to incorporate active learning, or
sample selection, methods, specifically query by committee, into RAPIER.
These methods have the potential to substantially reduce the amount of
annotation required. We will explore the issue of distinguishing
relevant and irrelevant messages, since currently RAPIER extracts
from any message given to it, assuming that all are relevant. We also
intend to run much larger tests with RAPIER on multiple domains
including the terrorism domain from the Third and Fourth Message
Understanding Conferences, which will allow comparison against other
systems. Finally, we plan to demonstrate the generality of RAPIER's
representation and algorithm by applying it to other natural language
processing tasks such as word sense disambiguation.
Mary Elaine Califf and Raymond J. Mooney
To appear in the Working Papers of ACL-97 Workshop on Natural
Language Learning
Information extraction systems process natural language documents and
locate a specific set of relevant items. Given the recent success of
empirical or corpus-based approaches in other areas of natural
language processing, machine learning has the potential to
significantly aid the development of these knowledge-intensive
systems. This paper presents a system, RAPIER, that takes pairs of
documents and filled templates and induces pattern-match rules that
directly extract fillers for the slots in the template. The learning
algorithm incorporates techniques from several inductive logic
programming systems and learns unbounded patterns that include
constraints on the words and part-of-speech tags surrounding the
filler. Encouraging results are presented on learning to extract
information from computer job postings from the newsgroup
misc.jobs.offered.
Mary Elaine Califf and Raymond J. Mooney
To appear in the Working Notes of the IJCAI-97 Workshop on Frontiers in Inductive Logic Programming.
Information extraction systems process natural language documents and locate a
specific set of relevant items. Given the recent success of empirical or
corpus-based approaches in other areas of natural language processing,
machine learning has the potential to significantly aid the development of
these knowledge-intensive systems. This paper presents a system, RAPIER,
that takes pairs of documents and filled templates and induces pattern-match
rules that directly extract fillers for the slots in the template. The
learning algorithm incorporates techniques from several inductive logic
programming systems and learns unbounded patterns that include constraints on
the words and part-of-speech tags surrounding the filler. Encouraging results
are presented on learning to extract information from computer job postings
from the newsgroup misc.jobs.offered.
Tara A. Estlin and Raymond J. Mooney
To appear in the Proceedings of the Fifteenth International Joint
Conference on Artificial Intelligence (IJCAI-97)
Most research in learning for planning has concentrated on efficiency
gains. Another important goal is improving the quality of final
plans. Learning to improve plan quality has been examined by a few
researchers; however, little research has been done on learning to
improve both efficiency and quality. This paper explores this problem
by using the SCOPE learning system to acquire control knowledge that
improves on both of these metrics. Since SCOPE uses a very flexible
training approach, we can easily focus its learning algorithm to
prefer search paths that are better for particular evaluation
metrics. Experimental results show that SCOPE can significantly
improve both the quality of final plans and overall planning
efficiency.
Ulf Hermjakob and Raymond J. Mooney
Proceedings of the 35th Annual Meeting of the
Association for Computational Linguistics (ACL'97/EACL'97), pp. 482-489
This paper presents a knowledge and context-based system for parsing
and translating natural language and evaluates it on sentences from
the Wall Street Journal.
Applying machine learning techniques, the system uses parse action examples
acquired under supervision to generate a deterministic shift-reduce parser
in the form of a decision structure. It relies heavily on context,
as encoded in features which describe the morphological, syntactic,
semantic, and other aspects of a given parse state.
Cynthia A. Thompson and Raymond J. Mooney
Submitted for Review
This paper describes a system, WOLFIE (WOrd Learning
From Interpreted Examples), that learns a semantic lexicon from a
corpus of sentences paired with representations of their meaning. The
lexicon learned consists of words paired with representations of their
meaning, and allows for both synonymy and polysemy. WOLFIE is part
of an integrated system that learns to parse novel sentences into their
meaning representations. Experimental results are presented that
demonstrate WOLFIE's ability to learn useful lexicons for a
realistic domain. The lexicons learned by WOLFIE are also compared
to those learned by another lexical acquisition system, that of
Siskind (1996).
Raymond J. Mooney
Proceedings of the 6th International Inductive Logic Programming Workshop, pp. 205-224, Stockholm, Sweden, August 1996.
This paper reviews our recent work on applying inductive logic
programming to the construction of natural language processing
systems. We have developed a system, CHILL, that learns a parser from
a training corpus of parsed sentences by inducing heuristics that
control an initial overly-general shift-reduce parser. CHILL learns
syntactic parsers as well as ones that translate English database
queries directly into executable logical form. The ATIS corpus of
airline information queries was used to test the acquisition of
syntactic parsers, and CHILL performed competitively with recent
statistical methods. English queries to a small database on
U.S. geography were used to test the acquisition of a complete natural
language interface, and the parser that CHILL acquired was more
accurate than an existing hand-coded system. The paper also includes
a discussion of several issues this work has raised regarding the
capabilities and testing of ILP systems as well as a summary of our
current research directions.
Beginning with a set of certainty-factor rules, along with
accurately labelled training examples, RAPTURE makes use of both
symbolic and connectionist learning techniques for revising the rules,
in order that they correctly classify all of the training examples. A
modified version of backpropagation is used to adjust the certainty
factors of the rules, ID3's information-gain heuristic is used to add
new rules, and the Upstart algorithm is used to create new hidden
terms in the rule base.
Results on refining four real-world rule bases are presented that
demonstrate the effectiveness of this combined approach. Two of these
rule bases were designed to identify particular areas in strands of
DNA, one is for identifying infectious diseases, and the fourth
attempts to diagnose soybean diseases. The results of RAPTURE are
compared with those of backpropagation, C4.5, KBANN, and other
learning systems. RAPTURE generally produces rule sets that are
more accurate than those of these other systems, often creating smaller sets of
rules and using less training time.
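To illustrate the flavor of the certainty-factor adjustment, here is a sketch
assuming only positive certainty factors combined by probabilistic sum and a
squared-error criterion; RAPTURE's modified backpropagation handles the full
certainty-factor calculus.

```python
def prob_sum(cfs):
    """Certainty-factor combination for positive evidence: folding
    cf_new = cf_old + cf * (1 - cf_old) over all rules that fired."""
    total = 0.0
    for cf in cfs:
        total += cf * (1.0 - total)
    return total

def adjust(cfs, target, lr=0.05):
    """Nudge each rule's certainty factor down the squared-error gradient,
    the flavor of update a modified backpropagation performs."""
    err = prob_sum(cfs) - target
    new_cfs = []
    for j in range(len(cfs)):
        # d output / d cf_j = product over the other CFs of (1 - cf_i)
        others = 1.0
        for i, cfi in enumerate(cfs):
            if i != j:
                others *= (1.0 - cfi)
        new_cfs.append(max(0.0, min(1.0, cfs[j] - lr * err * others)))
    return new_cfs

cfs = [0.8, 0.6]                  # two rules concluding the same category
print(round(prob_sum(cfs), 2))    # 0.92
print([round(c, 3) for c in adjust(cfs, target=0.5)])  # both CFs shrink
```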
Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, May, 1996.
This research describes the system RAPTURE, which is designed to
revise rule bases expressed in certainty-factor format. Recent
studies have shown that learning is facilitated when biased with
domain-specific expertise, and have also shown that many real-world
domains require some form of probabilistic or uncertain reasoning in
order to successfully represent target concepts. RAPTURE was designed
to take advantage of both of these results.
Proceedings of the Third International Workshop on
Multi-Strategy Learning, pp. 271-279, Harpers Ferry, WV, May
1996. (MSL-96).
Most approaches to learning control information in planning systems
use explanation-based learning to generate control rules.
Unfortunately, EBL alone often produces overly complex rules that
actually decrease planning efficiency. This paper presents a novel
learning approach for control knowledge acquisition that integrates
explanation-based learning with techniques from inductive logic
programming. EBL is used to constrain an inductive search for
selection heuristics that help a planner choose between competing plan
refinements. SCOPE is one of the few systems to address learning
control information in the newer partial-order planners.
Specifically, SCOPE learns domain-specific control rules for a version
of the UCPOP planning algorithm. The resulting system is shown to
produce significant speedup in two different planning domains.
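Schematically, the learned rules act as guards at the planner's choice points.
The sketch below shows how such selection heuristics might be applied; the rule,
state representation, and domain details are hypothetical placeholders rather
than SCOPE's induced clauses.

```python
def choose_refinement(candidates, state, control_rules):
    """Apply learned selection heuristics at a choice point: keep the
    refinements some learned rule endorses, falling back to the planner's
    default ordering when no rule fires."""
    preferred = [c for c in candidates
                 if any(rule(c, state) for rule in control_rules)]
    return preferred if preferred else candidates

# The kind of domain-specific preference such a system might induce for a
# logistics domain (entirely illustrative):
def prefer_loaded_truck(refinement, state):
    return (refinement.get("action") == "unload-truck" and
            refinement.get("truck") in state.get("trucks_at_goal", set()))

cands = [{"action": "fly-airplane", "plane": "p1"},
         {"action": "unload-truck", "truck": "t2"}]
state = {"trucks_at_goal": {"t2"}}
print(choose_refinement(cands, state, [prefer_loaded_truck]))
# -> [{'action': 'unload-truck', 'truck': 't2'}]
```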
Proceedings of the 1996 Conference on Empirical Methods in
Natural Language Processing, pp. 82-91, Philadelphia, PA, May
1996.
This paper describes an experimental comparison of seven different learning
algorithms on the problem of learning to disambiguate the meaning of a word
from context. The algorithms tested include statistical, neural-network,
decision-tree, rule-based, and case-based classification techniques. The
specific problem tested involves disambiguating six senses of the word ``line''
using the words in the current and preceding sentence as context. The
statistical and neural-network methods perform the best on this particular
problem and we discuss a potential reason for this observed difference. We
also discuss the role of bias in machine learning and its importance in
explaining performance differences observed on specific problems.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 1050-1055, Portland, OR, August, 1996. (AAAI-96)
This paper presents recent work using the CHILL parser acquisition
system to automate the construction of a natural-language interface
for database queries. CHILL treats parser acquisition as the learning
of search-control rules within a logic program representing a
shift-reduce parser and uses techniques from Inductive Logic
Programming to learn relational control knowledge. Starting with a
general framework for constructing a suitable logical form, CHILL is
able to train on a corpus comprising sentences paired with database
queries and induce parsers that map subsequent sentences directly into
executable queries. Experimental results with a complete
database-query application for U.S. geography show that CHILL is able
to learn parsers that outperform a pre-existing, hand-crafted
counterpart. These results demonstrate the ability of a corpus-based
system to produce more than purely syntactic representations. They
also provide direct evidence of the utility of an empirical approach
at the level of a complete natural language application.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 403-408, Portland, OR, August, 1996. (AAAI-96)
Theory refinement systems developed in machine learning automatically
modify a knowledge base to render it consistent with a set of
classified training examples. We illustrate a novel application of
these techniques to the problem of constructing a student model for an
intelligent tutoring system (ITS). Our approach is implemented in an
ITS authoring system called Assert which uses theory refinement to
introduce errors into an initially correct knowledge base so that it
models incorrect student behavior. The efficacy of the approach has
been demonstrated by evaluating a tutor developed with Assert with 75
students tested on a classification task covering concepts from an
introductory course on the C++ programming language. The system
produced reasonably accurate models and students who received feedback
based on these models performed significantly better on a post test
than students who received simple reteaching.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 965-970, Portland, OR, August, 1996. (AAAI-96)
Most model-based diagnosis systems, such as GDE and Sherlock, have
concerned discrete, static systems such as logic circuits and use
simple constraint propagation to detect inconsistencies. However,
sophisticated systems such as QSIM and QPE have been developed for
qualitative modeling and simulation of continuous dynamic systems. We
present an integration of these two lines of research as implemented
in a system called QDOCS for multiple-fault diagnosis of continuous
dynamic systems using QSIM models. The main contributions of the
algorithm include a method for propagating dependencies while solving
a general constraint satisfaction problem and a method for verifying
the consistency of a behavior with a model across time. Through
systematic experiments on two realistic engineering systems, we
show that QDOCS achieves the best balance of generality,
accuracy, and efficiency among competing methods.
Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pp. 843-848, Portland, OR, August, 1996. (AAAI-96)
Most research in planning and learning has involved linear,
state-based planners. This paper presents SCOPE, a system for learning
search-control rules that improve the performance of a partial-order
planner. SCOPE integrates explanation-based and inductive learning
techniques to acquire control rules for a partial-order planner.
Learned rules are in the form of selection heuristics that help the
planner choose between competing plan refinements. Specifically,
SCOPE learns domain-specific control rules for a version of the UCPOP
planning algorithm. The resulting system is shown to produce
significant speedup in two different planning domains.
Future research will be performed in three main areas. First, SCOPE's
learning algorithm will be extended to include additional techniques
such as constructive induction and rule utility analysis. Second,
SCOPE will be more thoroughly tested; several real-world planning
domains have been identified as possible testbeds, and more in-depth
comparisons will be drawn between SCOPE and other competing
approaches. Third, SCOPE will be implemented in a different planning
system in order to test its portability to other planning algorithms.
This work should demonstrate that machine-learning techniques can be a
powerful tool in the quest for tractable real-world planning.
Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1996. (Technical Report AI96-250)
Planning systems have become an important tool for automating a wide
variety of tasks. Control knowledge guides a planner to find solutions
quickly and is crucial for efficient planning in most domains.
Machine learning techniques enable a planning system to automatically
acquire domain-specific search-control knowledge for different
applications. Past approaches to learning control information have
usually employed explanation-based learning (EBL) to generate control
rules. Unfortunately, EBL alone often produces overly complex rules
that actually decrease rather than improve overall planning
efficiency. This paper presents a novel learning approach for control
knowledge acquisition that integrates explanation-based learning with
techniques from inductive logic programming. In our learning system
SCOPE, EBL is used to constrain an inductive search for control
heuristics that help a planner choose between competing plan
refinements. SCOPE is one of the few systems to address learning
control information for newer, partial-order planners. Specifically,
this proposal describes how SCOPE learns domain-specific control rules
for the UCPOP planning algorithm. The resulting system is shown to
produce significant speedup in two different planning domains, and to
be more effective than a pure EBL approach.
Technical Report, Artificial Intelligence Lab,
University of Texas at Austin, 1996.
This paper defines a new machine learning problem to which standard
machine learning algorithms cannot easily be applied. The problem
occurs in the domain of lexical acquisition. The ambiguity and
synonymy of words make it difficult to learn a lexicon using standard
induction techniques. Additionally, negative
examples are typically unavailable or difficult to construct in this
domain. One approach to solving the lexical acquisition problem is
presented, along with preliminary experimental results on an
artificial corpus. Future work includes extending the algorithm and
performing tests on a more realistic corpus.
Technical Report, Artificial Intelligence Lab,
University of Texas at Austin, 1996.
This paper demonstrates the capabilities of FOIDL, an inductive logic
programming (ILP) system whose distinguishing characteristics are the
ability to produce first-order decision lists, the use of an output
completeness assumption to provide implicit negative examples, and the
use of intensional background knowledge. The development of FOIDL was
originally motivated by the problem of learning to generate the past
tense of English verbs; however, this paper demonstrates its superior
performance on two different sets of benchmark ILP problems. Tests on
the finite element mesh design problem show that FOIDL's decision
lists enable it to produce better results than all other ILP systems
whose results on this problem have been reported. Tests with a selection
of list-processing problems from Bratko's introductory Prolog text
demonstrate that the combination of implicit negatives and intensionality
allows FOIDL to learn correct programs from far fewer examples than FOIL.
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1995.
Building accurate and efficient natural language processing (NLP)
systems is an important and difficult problem. There has been
increasing interest in automating this process. The lexicon, or the
mapping from words to meanings, is one component that is typically
difficult to update and that changes from one domain to the next.
Therefore, automating the acquisition of the lexicon is an important
task in automating the acquisition of NLP systems. This proposal
describes a system, WOLFIE (WOrd Learning From Interpreted Examples),
that learns a lexicon from input consisting of sentences paired with
representations of their meanings. Preliminary experimental results
show that this system can learn correct and useful mappings. The
correctness is evaluated by comparing a known lexicon to one learned
from the training input. The usefulness is evaluated by examining the
effect of using the lexicon learned by WOLFIE to assist a parser
acquisition system, where previously this lexicon had to be
hand-built. Future work in the form of extensions to the algorithm,
further evaluation, and possible applications is discussed.
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1995.
Bayesian networks provide a mathematically sound formalism for
representing and reasoning with uncertain knowledge and, as such, are
widely used. However, acquiring and capturing knowledge in this
framework is difficult. There is a growing interest in formulating
techniques for learning Bayesian networks inductively. While the
problem of learning a Bayesian network, given complete data, has been
explored in some depth, the problem of learning networks with
unobserved causes is still open. In this proposal, we view this
problem from the perspective of theory revision and present a novel
approach which adapts techniques developed for revising theories in
symbolic and connectionist representations. Thus, we assume that the
learner is given an initial approximate network (usually obtained from
an expert). Our technique inductively revises the network to fit the
data better. Our proposed system has two components: one component
revises the parameters of a Bayesian network of known structure, and
the other component revises the structure of the network. The
component for parameter revision maps the given Bayesian network into
a multi-layer feedforward neural network, with the parameters mapped
to weights in the neural network, and uses standard backpropagation
techniques to learn the weights. The structure revision component uses
qualitative analysis to suggest revisions to the network when it fails
to predict the data accurately. The first component has been
implemented, and we will present results from experiments on real-world
classification problems which show our technique to be effective. We
will also discuss our proposed structure revision algorithm, our plans
for experiments to evaluate the system, as well as some extensions to
the system.
Journal of Artificial Intelligence in Education, 7, 1
(1996), pp. 75-116.
A critical component of model-based intelligent tutoring systems is a
mechanism for capturing the conceptual state of the student, which
enables the system to tailor its feedback to suit individual strengths
and weaknesses. To be useful such a modeling technique must be
practical, in the sense that models are easy to construct, and
effective, in the sense that using the model actually impacts student
learning. This research presents a new student modeling technique
which can automatically capture novel student errors using only
correct domain knowledge, and can automatically compile trends across
multiple student models. This approach has been implemented as a
computer program, ASSERT, using a machine learning technique called
theory refinement, which is a method for automatically revising a
knowledge base to be consistent with a set of examples. Using a
knowledge base that correctly defines a domain and examples of a
student's behavior in that domain, ASSERT models student errors by
collecting any refinements to the correct knowledge base which are
necessary to account for the student's behavior. The efficacy of the
approach has been demonstrated by evaluating ASSERT using 100 students
tested on a classification task covering concepts from an introductory
course on the C++ programming language. Students who received
feedback based on the models automatically generated by ASSERT
performed significantly better on a post test than students who
received simple reteaching.
Symbolic, Connectionist, and Statistical Approaches to Learning for Natural
Language Processing, S. Wermter, E. Riloff and G. Scheler, Eds,
Springer-Verlag, 1996.
This paper presents results from recent experiments with CHILL, a
corpus-based parser acquisition system. CHILL treats language
acquisition as the learning of search-control rules within a logic
program. Unlike many current corpus-based approaches that use
statistical learning algorithms, CHILL uses techniques from inductive
logic programming (ILP) to learn relational representations. CHILL is
a very flexible system and has been used to learn parsers that produce
syntactic parse trees, case-role analyses, and executable database
queries. The reported experiments compare CHILL's performance to that
of a more naive application of ILP to parser acquisition. The results
show that ILP techniques, as employed in CHILL, are a viable
alternative to statistical methods and that the control-rule framework
is fundamental to CHILL's success.
Symbolic, Connectionist, and Statistical Approaches to Learning for Natural
Language Processing, S. Wermter, E. Riloff and G. Scheler, Eds,
Springer-Verlag, 1996.
This paper presents results on using a new inductive logic programming
method called FOIDL to learn the past tense of English verbs. The past
tense task has been widely studied in the context of the
symbolic/connectionist debate. Previous papers have presented results
using various neural-network and decision-tree learning methods. We
have developed a technique for learning a special type of Prolog
program called a first-order decision list, defined as an
ordered list of clauses each ending in a cut. FOIDL is based on FOIL
(Quinlan, 1990) but employs intensional background knowledge and
avoids the need for explicit negative examples. It is particularly
useful for problems that involve rules with specific exceptions, such
as the past-tense task. We present results showing that FOIDL learns
a more accurate past-tense generator from significantly fewer examples
than all other previous methods.
New Directions in AI Planning, M. Ghallab and A. Milani, Eds,
IOS Press, 1996, pp. 129-140.
This paper presents results on applying a version of the DOLPHIN
search-control learning system to speed up a partial-order planner.
DOLPHIN integrates explanation-based and inductive learning techniques
to acquire effective clause-selection rules for Prolog programs. A
version of the UCPOP partial-order planning algorithm has been
implemented as a Prolog program and DOLPHIN used to automatically
learn domain-specific search control rules that help eliminate
backtracking. The resulting system is shown to produce significant
speedup in several planning domains.
Proceedings of the International Conference on Neural
Networks (ICNN-96), Special Session on Knowledge-Based Artificial
Neural Networks, Washington DC, June 1996.
The problem of learning Bayesian networks with hidden variables is known to
be a hard problem. Even the simpler task of learning just the conditional
probabilities on a Bayesian network with hidden variables is hard. In this
paper, we present an approach that learns the conditional probabilities on
a Bayesian network with hidden variables by transforming it into a
multi-layer feedforward neural network (ANN). The conditional probabilities
are mapped onto weights in the ANN, which are then learned using standard
backpropagation techniques. To avoid the problem of exponentially large
ANNs, we focus on Bayesian networks with noisy-or and noisy-and
nodes. Experiments on real world classification problems demonstrate the
effectiveness of our technique.
Model-based diagnosis is a class of diagnostic techniques that use
direct knowledge about how a system functions instead of expert rules
detailing causes for every possible set of symptoms of a broken
system. Our research builds on standard methods for model-based
diagnosis and extends them to the domain of complex dynamic systems
described using qualitative models.
We motivate and describe our algorithm for diagnosing faults in a
dynamic system given a qualitative model and a sequence of qualitative
states. The main contributions in this algorithm include a method for
propagating dependencies while solving a general constraint
satisfaction problem, and a method for verifying the compatibility of a
behavior with a model across time. The algorithm can diagnose
multiple faults and uses models of faulty behavior, or behavioral
modes.
We then demonstrate these techniques using an implemented program
called QDOCS and test it on some realistic problems. Through our
experiments with a model of the reaction control system (RCS) of the
space shuttle and with a level-controller for a reaction tank, we show
that QDOCS demonstrates the best balance of generality, accuracy and
efficiency among known systems.
Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August, 1995.
As systems like chemical plants, power plants, and automobiles get
more complex, online diagnostic systems are becoming increasingly
important. One of the ways to rein in the complexity of describing
and reasoning about large systems such as these is to describe them
using qualitative rather than quantitative models.
This dissertation details the architecture, implementation and
evaluation of CHILL, a computer system for acquiring natural
language parsers by training over corpora of parsed text. CHILL
treats language acquisition as the learning of search-control rules
within a logic program that implements a shift-reduce parser. Control
rules are induced using a novel ILP algorithm which handles difficult
issues arising in the induction of search-control heuristics. Both
the control-rule framework and the induction algorithm are crucial to
CHILL's success.
The main advantage of CHILL over propositional counterparts is
its flexibility in handling varied representations. CHILL has
produced parsers for various analyses including case-role mapping,
detailed syntactic parse trees, and a logical form suitable for
expressing first-order database queries. All of these tasks are
accomplished within the same framework, using a single, general
learning method that can acquire new syntactic and semantic categories
for resolving ambiguities.
Experimental evidence from both artificial and real-world corpora
demonstrates that CHILL learns parsers as well as or better than
previous artificial neural-network or probabilistic approaches on
comparable tasks. In the database query domain, which goes beyond the
scope of previous empirical approaches, the learned parser outperforms
an existing hand-crafted system. These results support the claim that
ILP techniques as implemented in CHILL represent a viable
alternative with significant potential advantages over neural-network,
propositional, and probabilistic approaches to empirical parser
construction.
Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August, 1995. (Technical Report AI96-249)
Designing computer systems to understand natural language input is a
difficult task. In recent years there has been considerable interest
in corpus-based methods for constructing natural language parsers.
These empirical approaches replace hand-crafted grammars with
linguistic models acquired through automated training over language
corpora. A common thread among such methods to date is the use of
propositional or probabilistic representations for the learned
knowledge. This dissertation presents an alternative approach based
on techniques from a subfield of machine learning known as inductive
logic programming (ILP). ILP, which investigates the learning of
relational (first-order) rules, provides an empirical method for
acquiring knowledge within traditional, symbolic parsing frameworks.
Working Notes of the IJCAI-95 Workshop on New Approaches to
Learning for Natural Language Processing, pp.79-86, Montreal,
Quebec, August, 1995.
This paper presents results from recent experiments with CHILL,
a corpus-based parser acquisition system. CHILL treats grammar
acquisition as the learning of search-control rules within a logic
program. Unlike many current corpus-based approaches that use
propositional or probabilistic learning algorithms, CHILL uses
techniques from inductive logic programming (ILP) to learn relational
representations. The reported experiments compare CHILL's
performance to that of a more naive application of ILP to parser
acquisition. The results show that ILP techniques, as employed in
CHILL, are a viable alternative to propositional methods and
that the control-rule framework is fundamental to CHILL's
success.
Proceedings of the Fifth International Workshop on Inductive Logic
Programming, Leuven, Belgium, September 1995.
This paper presents a method for learning logic programs
without explicit negative examples by exploiting an assumption of
output completeness. A mode declaration is supplied for the
target predicate and each training input is assumed to be accompanied
by all of its legal outputs. Any other outputs generated by an
incomplete program implicitly represent negative examples; however,
large numbers of ground negative examples never need to be generated.
This method has been incorporated into two ILP systems, CHILLIN and
IFOIL, both of which use intensional background knowledge. Tests on
two natural language acquisition tasks, case-role mapping and
past-tense learning, illustrate the advantages of the approach.
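A small sketch of the implicit-negative idea (the toy program and data are
invented for illustration):

```python
def implicit_negatives(program, inputs, legal_outputs):
    """Under output completeness, every legal output for each training
    input is given, so any extra output the candidate program derives is
    an implicit negative example. 'program' is any callable returning the
    set of outputs it proves for an input."""
    return [(x, y) for x in inputs
                   for y in program(x)
                   if y not in legal_outputs[x]]

# Toy past-tense setting: an over-general program appends "ed" to everything.
overgeneral = lambda verb: {verb + "ed"}
legal = {"walk": {"walked"}, "sing": {"sang"}}
print(implicit_negatives(overgeneral, ["walk", "sing"], legal))
# -> [('sing', 'singed')]: a negative that was never supplied explicitly
```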
33rd Annual Meeting of the Association for Computational Linguistics, pp. 335-337, Boston, MA, July 1995 (ACL-95).
A system, WOLFIE, that acquires a mapping of words to their semantic
representation is presented and a preliminary evaluation is performed.
Tree least general generalizations (TLGGs) of the representations of input
sentences are computed to assist in determining the representations
of individual words in the sentences. The best guess for the meaning
of a word is the TLGG which overlaps with the highest percentage of
sentence representations in which that word appears.
Some promising experimental results on a non-artificial data
set are presented.
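A minimal sketch of a tree least general generalization via anti-unification,
assuming meaning representations encoded as nested tuples; WOLFIE's actual
representations and use of TLGGs are richer than this:

```python
def tlgg(t1, t2, table=None):
    """Least general generalization of two trees by anti-unification:
    matching functors are kept and recursed into, mismatches become
    variables, and a repeated mismatch pair reuses the same variable."""
    if table is None:
        table = {}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(tlgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
    if t1 == t2:
        return t1
    if (t1, t2) not in table:
        table[(t1, t2)] = "X%d" % len(table)
    return table[(t1, t2)]

# Two sentence meanings that share predicate structure:
m1 = ("capital", ("state", "texas"), ("city", "austin"))
m2 = ("capital", ("state", "ohio"), ("city", "columbus"))
print(tlgg(m1, m2))  # ('capital', ('state', 'X0'), ('city', 'X1'))
```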
Journal of Artificial Intelligence Research, 3 (1995) pp. 1-24.
This paper presents a method for inducing logic programs from examples that
learns a new class of concepts called first-order decision lists, defined
as ordered lists of clauses each ending in a cut. The method, called FOIDL,
is based on FOIL but employs intensional background knowledge and avoids
the need for explicit negative examples. It is particularly useful for
problems that involve rules with specific exceptions, such as learning the
past-tense of English verbs, a task widely studied in the context of the
symbolic/connectionist debate. FOIDL is able to learn concise, accurate
programs for this problem from significantly fewer examples than previous
methods (both connectionist and symbolic).
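The essential behavior is that rules are tried in order and the first match wins,
letting exceptions precede a general default. Below is a propositional shadow of
such a decision list for the past-tense task; the rules shown are illustrative,
since FOIDL's learned lists are Prolog clauses ending in cuts.

```python
# Rules are tried in order and the first match wins (the "cut"), so
# specific exceptions can safely precede the general default.
RULES = [
    (lambda v: v == "sing", lambda v: "sang"),                    # exception
    (lambda v: v.endswith("e"), lambda v: v + "d"),               # bake -> baked
    (lambda v: v.endswith("y") and v[-2] not in "aeiou",
     lambda v: v[:-1] + "ied"),                                   # try -> tried
    (lambda v: True, lambda v: v + "ed"),                         # default
]

def past_tense(verb):
    for test, transform in RULES:
        if test(verb):
            return transform(verb)

for v in ["sing", "bake", "try", "walk"]:
    print(v, "->", past_tense(v))   # sang, baked, tried, walked
```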
Working Notes of the IJCAI-95 Workshop on Engineering Problems for Qualitative Reasoning, Montreal, Quebec, August 1995.
This paper describes an approach to diagnosis of systems described by
qualitative differential equations represented as QSIM models. An
implemented system, QDOCS, is described that performs multiple-fault,
fault-model-based diagnosis of the qualitative behaviors of systems
described by such models, using constraint satisfaction techniques. We
demonstrate the utility of this system by accurately diagnosing
randomly generated faults using simulated behaviors of a portion of
the Reaction Control System of the space shuttle.
Proceedings of the Eighth International Workshop on Qualitative
Reasoning about Physical Systems, pp. 212-223, Nara, Japan,
June 1994. (QR-94)
The problem of learning qualitative models of physical systems from
observations of their behaviour has been addressed by several
researchers in recent years. Most current techniques limit themselves
to learning a single qualitative differential equation to model the
entire system. However, many systems have several qualitative
differential equations underlying them. In this paper, we present an
approach to learning the models for such systems. Our technique
divides the behaviours into segments, each of which can be explained
by a single qualitative differential equation. The qualitative model
for each segment can be generated using any of the existing techniques
for learning a single model. We show the results of applying our
technique to several examples and demonstrate that it is effective.
Working Papers of the Fifth International Workshop on
Principles of Diagnosis, pp. 321-325, New Paltz, NY, 1994.
This paper describes an approach to diagnosis of systems described by
qualitative differential equations represented as QSIM models. An
implemented system, QDOCS, is described that performs multiple-fault,
fault-model-based diagnosis of the qualitative behaviors of systems
described by such models, using constraint satisfaction techniques. We
demonstrate the utility of this system by accurately diagnosing
randomly generated faults using simulated behaviors of a portion of
the Reaction Control System of the space shuttle.
Proceedings of the Twelfth National Conference on
AI, pp. 664-669, Seattle, WA, July 1994. (AAAI-94)
A new inductive learning system, LAB (Learning for
ABduction), is presented which acquires abductive rules from a set of
training examples. The goal is to find a small knowledge base which,
when used abductively, diagnoses the training examples correctly and
generalizes well to unseen examples. This contrasts with past systems
that inductively learn rules that are used deductively. Each training
example is associated with potentially multiple categories
(disorders), instead of one as with typical learning systems. LAB
uses a simple hill-climbing algorithm to efficiently build a rule base
for a set-covering abductive system. LAB has been experimentally
evaluated and compared to other learning systems and an expert
knowledge base in the domain of diagnosing brain damage due to stroke.
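A rough sketch of set-covering abduction together with a greedy hill-climbing
learner in the spirit of LAB; the rule format and scoring below are simplified
assumptions, not LAB's actual evaluation function.

```python
from itertools import combinations

def diagnose(kb, symptoms):
    """Set-covering abduction: the smallest set of disorders whose
    manifestations jointly explain all observed symptoms (brute force,
    fine for a toy knowledge base)."""
    disorders = list(kb)
    for size in range(1, len(disorders) + 1):
        for combo in combinations(disorders, size):
            if symptoms <= set().union(*(kb[d] for d in combo)):
                return set(combo)
    return set()

def greedy_learn(examples, candidates):
    """Hill climbing: keep adding whichever candidate rule
    (disorder, symptom) most increases training-set diagnostic accuracy.
    examples: list of (symptom_set, disorder_set) pairs."""
    kb = {}
    accuracy = lambda k: sum(diagnose(k, s) == d for s, d in examples)
    while True:
        best, base = None, accuracy(kb)
        for disorder, symptom in candidates:
            trial = {d: set(ms) for d, ms in kb.items()}
            trial.setdefault(disorder, set()).add(symptom)
            if accuracy(trial) > base:
                best, base = (disorder, symptom), accuracy(trial)
        if best is None:
            return kb
        kb.setdefault(best[0], set()).add(best[1])

examples = [({"fever"}, {"flu"}), ({"fever", "rash"}, {"flu", "measles"})]
cands = [("flu", "fever"), ("measles", "rash"), ("flu", "rash")]
print(greedy_learn(examples, cands))  # {'flu': {'fever'}, 'measles': {'rash'}}
```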
J. Jeffrey Mahoney and Raymond J. Mooney
Proceedings of the Eleventh International Workshop
on Machine Learning, pp. 173-180, Rutgers, NJ, July 1994. (ML-94)
This paper compares two methods for refining uncertain knowledge bases using
propositional certainty-factor rules. The first method, implemented in the
RAPTURE system, employs neural-network training to refine the certainties
of existing rules but uses a symbolic technique to add new rules. The second
method, based on the one used in the KBANN system, initially adds a
complete set of potential new rules with very low certainty and allows
neural-network training to filter and adjust these rules. Experimental results
indicate that the former method results in significantly faster training and
produces much simpler refined rule bases with slightly greater accuracy.
J. Jeffrey Mahoney and Raymond J. Mooney
Proceedings of the International Symposium on Integrating
Knowledge and Neural Heuristics 1994, pp. 75-84, Pensacola, FL,
May 1994. (ISIKNH-94)
This paper describes RAPTURE --- a system for revising
probabilistic rule bases that converts symbolic rules into a
connectionist network, which is then trained via connectionist
techniques. It uses a modified version of backpropagation to refine
the certainty factors of the rule base, and uses ID3's
information-gain heuristic (Quinlan) to add new rules. Work is
currently under way on improved techniques for modifying the
network architecture, including adding hidden units using the
UPSTART algorithm (Frean). A case is made via comparison with fully
connected connectionist techniques for keeping the rule base as close
to the original as possible, adding new input units only as needed.
John M. Zelle, Raymond J. Mooney and Joshua B. Konvisser
Proceedings of the Eleventh International Workshop
on Machine Learning, pp. 343-351, Rutgers, NJ, July 1994. (ML-94)
This paper describes a new method for inducing logic programs from
examples which attempts to integrate the best aspects of existing ILP
methods into a single coherent framework. In particular, it combines
a bottom-up method similar to GOLEM with a top-down method similar to
FOIL. It also includes a method for predicate invention similar to
CHAMP and an elegant solution to the ``noisy oracle'' problem which
allows the system to learn recursive programs without requiring a
complete set of positive examples. Systematic experimental
comparisons to both GOLEM and FOIL on a range of problems are used to
clearly demonstrate the advantages of the approach.
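The bottom-up component of such systems rests on Plotkin's least general
generalization (LGG). A minimal Python sketch for function-free atoms,
given here purely as illustration of the operation, not of the system's
implementation:

    # Minimal sketch of least general generalization (LGG) for function-free
    # atoms, the core operation of bottom-up ILP methods such as GOLEM.

    def lgg(atom1, atom2, table=None):
        """Generalize two atoms with the same predicate; differing constants
        are replaced by variables, one variable per pair of constants."""
        pred1, args1 = atom1
        pred2, args2 = atom2
        if pred1 != pred2 or len(args1) != len(args2):
            return None
        table = table if table is not None else {}
        out = []
        for a, b in zip(args1, args2):
            if a == b:
                out.append(a)
            else:
                out.append(table.setdefault((a, b), f"X{len(table)}"))
        return (pred1, tuple(out))

    print(lgg(("parent", ("ann", "bob")), ("parent", ("sue", "tom"))))
    # -> ('parent', ('X0', 'X1'))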
John M. Zelle and Raymond J. Mooney
Proceedings of the Twelfth National Conference on
AI, pp. 748-753, Seattle, WA, July 1994. (AAAI-94)
This paper presents a method for constructing deterministic, context-sensitive
Prolog parsers from corpora of parsed sentences. Our approach uses recent
machine learning methods for inducing Prolog rules from examples (inductive
logic programming). We discuss several advantages of this method compared to
recent statistical methods and present results on learning complete parsers
from portions of the ATIS corpus.
Raymond J. Mooney and John M. Zelle
SIGART Bulletin, Volume 5, Number 1, Jan. 1994, pp. 12-21.
This paper presents a review of recent work that integrates methods from
Inductive Logic Programming (ILP) and Explanation-Based Learning (EBL). ILP
and EBL methods have complementary strengths and weaknesses and a number of
recent projects have effectively combined them into systems with better
performance than either of the individual approaches. In particular, integrated
systems have been developed for guiding induction with prior knowledge
(ML-SMART, FOCL, GRENDEL), refining imperfect domain theories
(FORTE, AUDREY, Rx), and learning effective search-control
knowledge (AxA-EBL, DOLPHIN).
Paul T. Baffes and Raymond J. Mooney
Informatica, 17 (1993), pp. 387-397.
In recent years, machine learning research has started addressing a problem
known as theory refinement. The goal of a theory refinement learner is
to modify an incomplete or incorrect rule base, representing a domain theory,
to make it consistent with a set of input training examples. This paper
presents a major revision of the EITHER propositional theory refinement
system. Two issues are discussed. First, we show how run-time efficiency can
be greatly improved by changing from an exhaustive scheme for computing
repairs to an iterative greedy method. Second, we show how to extend
EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New
EITHER), is more than an order of magnitude faster and produces
significantly more accurate results with theories that fit the M-of-N
format. To demonstrate the advantages of NEITHER, we present experimental
results from two real-world domains.
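For readers unfamiliar with the representation, an M-of-N rule fires when at
least M of its N antecedents hold of an example, as in this minimal sketch
(the example data are invented for illustration):

    # Sketch of M-of-N rule evaluation: the rule fires when at least m of
    # its n listed antecedents are true of the example.

    def m_of_n_fires(m, antecedents, example):
        return sum(1 for a in antecedents if example.get(a, False)) >= m

    example = {"fever": True, "cough": True, "rash": False}
    print(m_of_n_fires(2, ["fever", "cough", "rash"], example))  # True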
Cynthia Thompson
M.A. Thesis, Department of Computer Sciences, University of Texas at Austin, 1993.
A new system for learning by induction, called LAB, is presented. LAB
(Learning for ABduction) learns abductive rules based on a set of training
examples. Our goal is to find a small knowledge base which, when used
abductively, diagnoses the training examples correctly, in addition to
generalizing well to unseen examples. This is in contrast to past systems,
which inductively learn rules which are used deductively. Abduction is
particularly well suited to diagnosis, in which we are given a set of symptoms
(manifestations) and we want our output to be a set of disorders which explain
why the manifestations are present. Each training example is associated with
potentially multiple categories, instead of one, which is the case with typical
learning systems. Building the knowledge base requires a choice between
multiple possibilities, and the number of possibilities grows exponentially
with the number of training examples. One method of choosing the best
knowledge base is described and implemented. The final system is
experimentally evaluated, using data from the domain of diagnosing brain damage
due to stroke. It is compared to other learning systems and a knowledge base
produced by an expert. The results are promising: the rule base learned is
simpler than the expert knowledge base and rules learned by one of the other
systems, and the accuracy of the learned rule base in predicting which areas
are damaged is better than all the other systems as well as the expert
knowledge base.
Paul T. Baffes
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1993.
A new student modeling system called ASSERT is described which uses domain
independent learning algorithms to model unique student errors and to
automatically construct bug libraries. ASSERT consists of two learning phases.
The first is an application of theory refinement techniques for constructing
student models from a correct theory of the domain being tutored. The second
learning cycle automatically constructs the bug library by extracting common
refinements from multiple student models which are then used to bias future
modeling efforts. Initial experimental data will be presented which suggests
that ASSERT is a more effective modeling system than other induction techniques
previously explored for student modeling, and that the automatic bug library
construction significantly enhances subsequent modeling efforts.
John M. Zelle
Ph.D. proposal, Department of Computer Sciences, University of Texas
at Austin, 1993.
This paper presents a general framework, learning search-control heuristics for
logic programs, which can be used to improve both the efficiency and accuracy
of knowledge-based systems expressed as definite-clause logic programs. The
approach combines techniques of explanation-based learning and recent advances
in inductive logic programming to learn clause-selection heuristics that guide
program execution. Two specific applications of this framework are detailed:
dynamic optimization of Prolog programs (improving efficiency) and natural
language acquisition (improving accuracy). In the area of program
optimization, a prototype system, DOLPHIN, is able to transform some
intractable specifications into polynomial-time algorithms, and outperforms
competing approaches in several benchmark speedup domains. A prototype
language acquisition system, CHILL, is also described. It is capable of
automatically acquiring semantic grammars, which uniformly incorporate syntactic
and semantic constraints to parse sentences into case-role representations.
Initial experiments show that this approach is able to construct accurate
parsers which generalize well to novel sentences and significantly outperform
previous approaches to learning case-role mapping based on connectionist
techniques. Planned extensions of the general framework and the specific
applications as well as plans for further evaluation are also discussed.
John M. Zelle and Raymond J. Mooney
Proceedings of the Thirteenth International Joint Conference on Artificial
Intelligence, pp. 1106-1111, Chambery, France, 1993. (IJCAI-93)
This paper presents an algorithm that combines traditional EBL
techniques and recent developments in inductive logic programming to
learn effective clause selection rules for Prolog programs. When
these control rules are incorporated into the original program,
significant speed-up may be achieved. The algorithm is shown to be an
improvement over competing EBL approaches in several domains.
Additionally, the algorithm is capable of automatically transforming
some intractable algorithms into ones that run in polynomial time.
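The effect of such control rules can be pictured as guards that filter which
clauses an interpreter tries for a given goal. The following Python sketch is
purely illustrative, with hypothetical goal and clause representations;
DOLPHIN itself transforms Prolog programs directly.

    # Sketch of guarded clause selection: learned control rules filter which
    # clauses a meta-interpreter tries for a goal (hypothetical encoding).

    def select_clauses(goal, clauses, control_rules):
        """Return only the clauses whose learned guard admits this goal."""
        return [c for c in clauses
                if control_rules.get(c["id"], lambda g: True)(goal)]

    clauses = [{"id": "rev_base"}, {"id": "rev_rec"}]
    control_rules = {
        # Learned guard: only try the base case on the empty list.
        "rev_base": lambda goal: goal["arg"] == [],
    }
    print([c["id"] for c in
           select_clauses({"arg": [1, 2]}, clauses, control_rules)])
    # -> ['rev_rec']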
Paul T. Baffes and Raymond J. Mooney
Proceedings of the Thirteenth International Joint Conference on Artificial
Intelligence, pp. 1135-1140, Chambery, France, 1993. (IJCAI-93)
This paper presents a major revision of the EITHER propositional theory
refinement system. Two issues are discussed. First, we show how run-time
efficiency can be greatly improved by changing from an exhaustive scheme for
computing repairs to an iterative greedy method. Second, we show how to extend
EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New EITHER),
is more than an order of magnitude faster and produces significantly more
accurate results with theories that fit the M-of-N format. To demonstrate the
advantages of NEITHER, we present preliminary experimental results comparing it
to EITHER and various other systems on refining the DNA promoter domain theory.
John M. Zelle and Raymond J. Mooney
Proceedings of the Eleventh National Conference of the American
Association for Artificial Intelligence, pp. 817-822,
Washington, D.C., July 1993. (AAAI-93)
Automating the construction of semantic grammars is a difficult and
interesting problem for machine learning. This paper shows how the
semantic-grammar acquisition problem can be viewed as the learning of
search-control heuristics in a logic program. Appropriate control
rules are learned using a new first-order induction algorithm that
automatically invents useful syntactic and semantic categories.
Empirical results show that the learned parsers generalize well to
novel sentences and outperform previous approaches based on
connectionist techniques.
J. Jeffrey Mahoney and Raymond J. Mooney
Connection Science, 5 (1993), pp. 339-364. (Special issue on
Architectures for Integrating Neural and Symbolic Processing)
This paper describes Rapture --- a system for revising probabilistic knowledge
bases that combines connectionist and symbolic learning methods. Rapture uses
a modified version of backpropagation to refine the certainty factors of a
Mycin-style rule base and it uses ID3's information gain heuristic to add
new rules. Results on refining three actual expert knowledge bases demonstrate
that this combined approach generally performs better than previous methods.
Bradley L. Richards and Raymond J. Mooney
Machine Learning, 19, 2 (1995), pp. 95-131.
Knowledge acquisition is a difficult and time-consuming
task, and as error-prone as any human activity. The task of
automatically improving an existing knowledge base using learning
methods is addressed by a new class of systems performing theory
refinement. Until recently, such systems were limited to
propositional theories. This paper presents a system, FORTE
(First-Order Revision of Theories from Examples), for refining
first-order Horn-clause theories. Moving to a first-order
representation opens many new problem areas, such as logic program
debugging and qualitative modelling, that are beyond the reach of
propositional systems. FORTE uses a hill-climbing approach to revise
theories. It identifies possible errors in the theory and calls on a
library of operators to develop possible revisions. The best revision
is implemented, and the process repeats until no further revisions are
possible. Operators are drawn from a variety of sources, including
propositional theory refinement, first-order induction, and inverse
resolution. FORTE has been tested in several domains including
logic programming and qualitative modelling.
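The top-level loop can be sketched as follows. Everything here is a toy
stand-in (a "theory" of single-antecedent rules and a single deletion
operator), since FORTE's real operator library also adds and modifies rules
using induction and inverse resolution.

    # Sketch of FORTE-style hill-climbing theory revision (toy stand-ins).

    def accuracy(theory, examples):
        def classify(features):
            return any(ant in features for ant in theory)
        return sum(classify(f) == label for f, label in examples)

    def delete_rule(theory):
        for rule in theory:
            yield theory - {rule}

    def revise(theory, examples):
        best, best_score = theory, accuracy(theory, examples)
        improved = True
        while improved:
            improved = False
            for candidate in delete_rule(best):
                s = accuracy(candidate, examples)
                if s > best_score:
                    best, best_score, improved = candidate, s, True
        return best

    examples = [({"a"}, True), ({"b"}, False), ({"c"}, False)]
    print(revise(frozenset({"a", "b"}), examples))  # drops faulty rule "b"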
Raymond J. Mooney
Machine Learning, 19, 1 (1995), pp. 79-92.
This paper presents results comparing three inductive learning systems using
different representations for concepts, namely: CNF formulae, DNF formulae, and
decision trees. The CNF learner performs surprisingly well. Results on five
natural data sets show that it frequently trains faster and produces more
accurate and simpler concepts. The probable explanation for its superior
performance is that the other systems are more susceptible to the replication
problem.
Siddarth Subramanian
Technical Report AI92-179, Artificial Intelligence Lab,
University of Texas at Austin, March 1991.
This proposal presents an approach to explanation that
incorporates the paradigms of belief revision and abduction. We
present an algorithm that combines these techniques and a system
called BRACE that is a preliminary implementation of this
algorithm. We show the applicability of the BRACE approach to a wide
range of domains including scientific discovery, device diagnosis and
plan recognition. Finally, we describe our proposals for a new
implementation, new application domains for our system and extensions
to this approach.
Hwee Tou Ng and Raymond J. Mooney
Submitted for journal publication.
A diverse set of intelligent activities, including natural language
understanding and diagnosis, requires the ability to construct
explanations for observed phenomena. In this paper, we view
explanation as abduction, where an abductive explanation is a
consistent set of assumptions which, together with background
knowledge, logically entails a set of observations. We have
successfully built a domain-independent system, ACCEL, in which
knowledge about a variety of domains is uniformly encoded in
first-order Horn-clause axioms. A general-purpose abduction
algorithm, AAA, efficiently constructs explanations in the
various domains by caching partial explanations to avoid redundant
work. Empirical results show that caching of partial explanations can
achieve more than an order of magnitude speedup in run time. We have
applied our abductive system to two general tasks: plan recognition in
text understanding, and diagnosis of medical diseases, logic circuits,
and dynamic systems. The results indicate that ACCEL is a
general-purpose system capable of plan recognition and diagnosis, yet
efficient enough to be of practical utility.
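Caching partial explanations amounts to memoizing the mapping from subgoals
to their explanations, so that shared subgoals are solved only once. A
minimal propositional sketch follows; it assumes acyclic rules and is only
illustrative, since AAA itself handles first-order axioms and consistency
checking.

    # Sketch of abduction with cached partial explanations for propositional
    # Horn rules (assumes acyclic rules; illustrative, not ACCEL's AAA).

    def explain(goal, rules, assumable, cache):
        """Explanations for goal, each a frozenset of assumptions."""
        if goal in cache:                    # reuse a cached partial result
            return cache[goal]
        results = set()
        if goal in assumable:
            results.add(frozenset([goal]))
        for head, body in rules:
            if head == goal:
                combos = [frozenset()]
                for sub in body:
                    combos = [c | e
                              for c in combos
                              for e in explain(sub, rules, assumable, cache)]
                results.update(combos)
        cache[goal] = sorted(results, key=len)   # simplest first
        return cache[goal]

    rules = [("wet_grass", ["rain"]), ("wet_grass", ["sprinkler"])]
    print(explain("wet_grass", rules, {"rain", "sprinkler"}, {}))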
Hwee Tou Ng and Raymond J. Mooney
Proceedings of the Third International Conference on Principles
of Knowledge Representation and Reasoning, pp. 499-508,
Cambridge, MA, October 1992.
While it has been realized for quite some time within AI that abduction
is a general model of explanation for a variety of tasks, there have
been no empirical investigations into the practical feasibility of a
general, logic-based abductive approach to explanation. In this paper
we present extensive empirical results on applying a general abductive
system, ACCEL, to moderately complex problems in plan recognition
and diagnosis. In plan recognition, ACCEL has been tested on 50
short narrative texts, inferring characters' plans from actions
described in a text. In medical diagnosis, ACCEL has diagnosed 50
real-world patient cases involving brain damage due to stroke
(previously addressed by set-covering methods). ACCEL also uses
abduction to accomplish model-based diagnosis of logic circuits (a full
adder) and continuous dynamic systems (a temperature controller and the
water balance system of the human kidney). The results indicate that
general purpose abduction is an effective and efficient mechanism for
solving problems in plan recognition and diagnosis.
Bradley L. Richards, Ina Kraan, and Benjamin J. Kuipers
Proceedings of the Tenth National Conference on Artificial
Intelligence, pp. 723-728, San Jose, CA, July 1992.
We describe a method of automatically abducing qualitative models from
descriptions of behaviors. We generate, from either quantitative or
qualitative data, models in the form of qualitative differential equations
suitable for use by QSIM. Constraints are generated and filtered both by
comparison with the input behaviors and by dimensional analysis. If the
user provides complete information on the input behaviors and the
dimensions of the input variables, the resulting model is unique,
maximally constrained, and guaranteed to reproduce the input behaviors.
If the user provides incomplete information, our method will still
generate a model which reproduces the input behaviors, but the model
may no longer be unique. Incompleteness can take several forms: missing
dimensions, values of variables, or entire variables.
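Dimensional filtering can be illustrated by treating dimensions as exponent
vectors: an addition constraint requires all three variables to share a
dimension, while a multiplication constraint requires the exponents of the
factors to sum to those of the product. The sketch below uses QSIM-like
constraint names purely for illustration; it is not the paper's
implementation.

    # Sketch of dimensional filtering of candidate QDE constraints.
    # Dimensions are exponent vectors over (length, time).

    def dims_ok(constraint, x, y, z):
        if constraint == "ADD":      # x + y = z: all dimensions equal
            return x == y == z
        if constraint == "MULT":     # x * y = z: exponents add
            return tuple(a + b for a, b in zip(x, y)) == z
        raise ValueError(constraint)

    velocity = (1, -1)   # length^1 time^-1
    time     = (0, 1)
    distance = (1, 0)
    print(dims_ok("MULT", velocity, time, distance))  # True
    print(dims_ok("ADD",  velocity, time, distance))  # False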
Bradley L. Richards and Raymond J. Mooney
Proceedings of the Tenth National Conference on Artificial
Intelligence, pp. 50-55, San Jose, CA, July 1992.
First-order learning systems (e.g., FOIL, FOCL, FORTE) generally rely on
hill-climbing heuristics in order to avoid the combinatorial explosion
inherent in learning first-order concepts. However, hill-climbing leaves
these systems vulnerable to local maxima and local plateaus. We present
a method, called relational pathfinding, which has proven highly effective
in escaping local maxima and crossing local plateaus. We present our
algorithm and provide learning results in two domains: family relationships
and qualitative model building.
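The core of the search can be sketched as breadth-first search for a chain
of known relations connecting two constants; the family facts below are
invented for illustration, and the real algorithm goes on to convert such
paths into clause bodies.

    # Sketch of the core of relational pathfinding: BFS for a chain of
    # known relations connecting two constants (toy facts).

    from collections import deque

    def find_path(facts, start, goal):
        """facts: iterable of (relation, a, b) ground atoms."""
        edges = {}
        for rel, a, b in facts:
            edges.setdefault(a, []).append((rel, b))
        queue, seen = deque([(start, [])]), {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for rel, nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(rel, node, nxt)]))
        return None

    facts = [("parent", "ann", "bob"), ("parent", "bob", "carol")]
    print(find_path(facts, "ann", "carol"))
    # -> [('parent', 'ann', 'bob'), ('parent', 'bob', 'carol')]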
John M. Zelle and Raymond J. Mooney
Proceedings of the 1992 Machine Learning Workshop on Knowledge
Compilation and Speedup Learning, Aberdeen, Scotland, July 1992.
This paper presents an algorithm that combines traditional EBL
techniques and recent developments in inductive logic programming to
learn effective clause selection rules for Prolog programs. When these
control rules are incorporated into the original program, significant
speed-up may be achieved. The algorithm produces not only EBL-like
speed up of problem solvers, but is capable of automatically
transforming some intractable algorithms into ones that run in
polynomial time.
J. Jeffrey Mahoney & Raymond J. Mooney
Proceedings of the 1992 Machine Learning Workshop on Integrated
Learning in Real Domains, Aberdeen, Scotland, July 1992.
This paper describes RAPTURE --- a system for revising probabilistic
theories that combines symbolic and neural-network learning methods.
RAPTURE uses a modified version of backpropagation to refine the certainty
factors of a Mycin-style rule-base and it uses ID3's information gain heuristic
to add new rules. Results on two real-world domains demonstrate that this
combined approach performs as well or better than previous methods.
Paul T. Baffes and John M. Zelle
Proceedings of the 1992 International Joint Conference on Neural
Networks, pp. 392-397, Baltimore, Maryland, June 1992.
The ideas presented here are based on two observations of perceptrons: (1) when
the perceptron learning algorithm cycles among hyperplanes, the hyperplanes may
be compared to select one that gives a best split of the examples, and
(2) it is always possible for the perceptron to build a hyperplane that
separates at least one example from all the rest. We describe the
Extentron, which grows multi-layer networks capable of distinguishing
non-linearly-separable data using the simple perceptron rule for linear
threshold units. The resulting algorithm is simple, very fast, scales well to
large problems, retains the convergence properties of the perceptron, and can
be completely specified using only two parameters. Results are presented
comparing the Extentron to other neural network paradigms and to symbolic
learning systems.
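Both observations build on the plain perceptron rule for a linear threshold
unit, sketched below on a linearly separable toy problem; the Extentron's
network-growing strategy around this rule is not shown.

    # Sketch of the simple perceptron rule for a linear threshold unit,
    # the only learning rule the Extentron needs.

    def train_perceptron(data, epochs=20, lr=1.0):
        n = len(data[0][0])
        w, b = [0.0] * n, 0.0
        for _ in range(epochs):
            for x, target in data:             # target is 0 or 1
                out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
                err = target - out
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        return w, b

    # Linearly separable toy data: logical OR.
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    print(train_perceptron(data))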
Paul T. Baffes and Raymond J. Mooney
Proceedings of the Fourteenth Annual Conference of the Cognitive
Science Society, pp. 617-622, Bloomington, IN, July 1992.
Student modeling has been identified as an important component in the long-term
development of Intelligent Computer-Aided Instruction (ICAI) systems. Two
basic approaches have evolved to model student misconceptions. One uses a
static, predefined library of user bugs which contains the misconceptions
modeled by the system. The other uses induction to learn student
misconceptions from scratch. Here, we present a third approach that uses a
machine learning technique called theory revision. Using theory revision
allows the system to automatically construct a bug library for use in modeling
while retaining the flexibility to address novel errors.
Raymond J. Mooney
Computational Learning Theory and Natural Learning Systems,
Vol. 3, T. Petsche, S. Judd, and S. Hanson, Eds., MIT Press, 1995, pp. 43-53.
This paper presents a preliminary analysis of the sample complexity of theory
revision within the framework of PAC (Probably Approximately Correct)
learnability theory. By formalizing the notion that the initial theory is
``close'' to the correct theory we show that the sample complexity of an
optimal propositional Horn-clause theory revision algorithm is $O( ( \ln 1 /
\delta + d \ln (s_0 + d + n) ) / \epsilon)$, where $d$ is the {\em syntactic
distance} between the initial and correct theories, $s_0$ is the size of
the initial theory, $n$ is the number of observable features, and $\epsilon$ and
$\delta$ are the standard PAC error and probability bounds. The paper also
discusses the problems raised by the computational complexity of theory
revision.
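To make the bound concrete, here is a worked instantiation under the purely
illustrative assumptions that the hidden constant is 1 and that $d = 3$,
$s_0 = 20$, $n = 50$, $\epsilon = 0.1$, and $\delta = 0.05$:
\[
m \;\approx\; \frac{\ln(1/\delta) + d\,\ln(s_0 + d + n)}{\epsilon}
  \;=\; \frac{\ln 20 + 3\ln 73}{0.1}
  \;\approx\; \frac{2.996 + 12.871}{0.1} \;\approx\; 159 .
\]
The dominant term is $d \ln(s_0 + d + n)$, reflecting that under this analysis
the cost of revision grows with the syntactic distance from the initial theory
rather than with the theory's absolute size.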
Raymond J. Mooney & Bradley L. Richards
Proceedings of the Second International Workshop on Inductive
Logic Programming, Tokyo, Japan, June 1992.
This paper presents results on using a theory revision system to automatically
debug logic programs. FORTE is a recently developed system for revising
function-free Horn-clause theories. Given a theory and a set of training
examples, it performs a hill-climbing search in an attempt to minimally modify
the theory to correctly classify all of the examples. FORTE makes use of
methods from propositional theory revision, Horn-clause induction (FOIL),
and inverse resolution. The system has been successfully used to debug
logic programs written by undergraduate students for a programming languages
course.
Raymond J. Mooney
Proceedings of AAAI Spring Symposium on Knowledge
Assimilation, Stanford, CA, March 1992.
Most existing theory refinement systems are not incremental. However, any
theory refinement system whose input and output theories are compatible can be
used to incrementally assimilate data into an evolving theory. This is done by
continually feeding its revised theory back in as its input theory. An
incremental batch approach, in which the system assimilates a batch of examples
at each step, seems most appropriate for existing theory revision systems.
Experimental results with the EITHER theory refinement system demonstrate
that this approach frequently increases efficiency without significantly
decreasing the accuracy or the simplicity of the resulting theory. However, if
the system produces bad initial changes to the theory based on only a small
amount of data, these bad revisions can ``snowball'' and result in an overall
decrease in performance.
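The incremental batch scheme itself is just a loop that feeds each revised
theory back in as the next input theory, as in this sketch; revise is a
placeholder for any compatible revision system such as EITHER, and the
demonstration call uses a dummy reviser purely to show the batching.

    # Sketch of incremental batch theory refinement: loop a compatible
    # revision system over successive batches of examples.

    def incremental_batch(theory, example_stream, revise, batch_size=25):
        batch = []
        for example in example_stream:
            batch.append(example)
            if len(batch) == batch_size:
                theory = revise(theory, batch)  # feed revised theory back in
                batch = []
        if batch:
            theory = revise(theory, batch)
        return theory

    # Dummy reviser that just records batch sizes, to show the control flow.
    print(incremental_batch("T0", range(60), lambda t, b: t + f"+{len(b)}"))
    # -> 'T0+25+25+10'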
Raymond J. Mooney & Dirk Ourston
Machine Learning: A Multistrategy Approach, Vol. IV, R.S. Michalski
& G. Tecuci (Eds.), pp. 141-164, Morgan Kaufmann, San Mateo, CA, 1994.
This chapter describes a multistrategy system that employs independent modules
for deductive, abductive, and inductive reasoning to revise an arbitrarily
incorrect propositional Horn-clause domain theory to fit a set of preclassified
training instances. By combining such diverse methods, EITHER is able
to handle a wider range of imperfect theories than other theory revision
systems while guaranteeing that the revised theory will be consistent with the
training data. EITHER has successfully revised two actual expert
theories, one in molecular biology and one in plant pathology. The results
confirm the hypothesis that using a multistrategy system to learn from both
theory and data gives better results than using either theory or data alone.
Raymond J. Mooney
Categorization by Humans and Machines: The Psychology of Learning
and Motivation, Vol. 29, G. Nakamura, R. Taraban, & D.L. Medin (Eds.),
pp. 189-218, Academic Press, Orlando, FL, 1993.
Recent results in both machine learning and cognitive psychology demonstrate
that effective category learning involves an integration of theory and data.
First, theories can bias induction, affecting what category definitions are
extracted from a set of examples. Second, conflicting data can cause theories
to be revised. Third, theories can alter the representation of data through
feature formation. This chapter reviews two machine learning systems
that attempt to integrate theory and data in one or more of these ways. IOU
uses a domain theory to acquire part of a concept definition and to focus
induction on the unexplained aspects of the data. EITHER uses data to
revise an imperfect theory and uses theory to add abstract features to the
data. Recent psychological experiments reveal that these machine learning
systems exhibit several important aspects of human category learning.
Specifically, IOU has been used to successfully model some recent experimental
results on the effect of functional knowledge on category learning.
Dirk Ourston and Raymond J. Mooney
Artificial Intelligence, 66 (1994), pp. 311-344.
This article describes a comprehensive approach to automatic theory revision.
Given an imperfect theory, the approach combines explanation attempts for
incorrectly classified examples in order to identify the failing portions of
the theory. For each theory fault, correlated subsets of the examples are used
to inductively generate a correction. Because the corrections are
focused, they tend to preserve the structure of the original theory. Because
the system starts with an approximate domain theory, in general fewer training
examples are required to attain a given level of performance (classification
accuracy) compared to a purely empirical system. The approach applies to
classification systems employing a propositional Horn-clause theory. The system
has been tested in a variety of application domains, and results are presented
for problems in the domains of molecular biology and plant disease diagnosis.
Raymond J. Mooney
Machine Learning, 10, 1 (1993), pp. 79-110.
This paper describes and evaluates an approach to combining empirical and
explanation-based learning called Induction Over the Unexplained (IOU).
IOU is intended for learning concepts that can be partially explained by an
overly-general domain theory. An eclectic evaluation of the method is presented
which includes results from all three major approaches: empirical, theoretical,
and psychological. Empirical results show that IOU is effective at refining
overly-general domain theories and that it learns more accurate concepts from
fewer examples than a purely empirical approach. The application of
theoretical results from PAC learnability theory explains why IOU requires
fewer examples. IOU is also shown to be able to model psychological data
demonstrating the effect of background knowledge on human learning.
Hwee Tou Ng and Raymond J. Mooney
Proceedings of the Ninth National Conference on
Artificial Intelligence, pp. 494-499, Anaheim, CA, July 1991.
This paper presents an algorithm for first-order Horn-clause abduction
that uses an ATMS to avoid redundant computation. This algorithm is
either more efficient or more general than any other previous
abduction algorithm. Since computing all minimal abductive
explanations is intractable, we also present a heuristic version of
the algorithm that uses beam search to compute a subset of the
simplest explanations. We present empirical results on a broad range
of abduction problems from text understanding, plan recognition, and
device diagnosis which demonstrate that our algorithm is at least an
order of magnitude faster than an alternative abduction algorithm that
does not use an ATMS.
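The beam-search variant keeps only the k simplest partial explanations after
each expansion step. The following propositional Python sketch shows the
idea; it assumes acyclic rules, and the actual algorithm is first-order and
ATMS-based.

    # Sketch of beam search over partial abductive explanations: after each
    # expansion only the k candidates with the fewest assumptions survive.

    def beam_abduce(goals, rules, assumable, k=2):
        beam = [(frozenset(), list(goals))]     # (assumptions, open goals)
        while any(open_goals for _, open_goals in beam):
            expanded = []
            for assumed, open_goals in beam:
                if not open_goals:
                    expanded.append((assumed, open_goals))
                    continue
                g, rest = open_goals[0], open_goals[1:]
                if g in assumable:
                    expanded.append((assumed | {g}, rest))
                for head, body in rules:
                    if head == g:
                        expanded.append((assumed, body + rest))
            beam = sorted(expanded, key=lambda s: len(s[0]))[:k]
        return [assumed for assumed, _ in beam]

    rules = [("wet", ["rain"]), ("wet", ["sprinkler", "on"])]
    print(beam_abduce(["wet"], rules, {"rain", "sprinkler", "on"}))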
Dirk Ourston and Raymond J. Mooney
Proceedings of the Eighth International Machine Learning
Workshop, pp. 534-538, Evanston, IL, June 1991.
This paper presents an approach to improving the classification performance of
a multiple category theory by correcting intermediate rules which are shared
among the categories. Using this technique, the performance of a theory in one
category can be improved through training in an entirely different category.
Examples of the technique are presented and experimental results are given.
Raymond J. Mooney and Dirk Ourston
Proceedings of the Eighth International Machine Learning
Workshop, pp. 178-182, Evanston, IL. June 1991.
This paper presents constructive induction techniques recently added to the
EITHER theory refinement system. These additions allow EITHER to handle
arbitrary gaps at the ``top,'' ``middle,'' and/or ``bottom'' of an incomplete
domain theory. Intermediate concept utilization employs existing rules
in the theory to derive higher-level features for use in induction.
Intermediate concept creation employs inverse resolution to introduce new
intermediate concepts in order to fill gaps in a theory that span multiple
levels. These revisions allow EITHER to make use of imperfect domain theories
in the ways typical of previous work in both constructive induction and theory
refinement. As a result, EITHER is able to handle a wider range of theory
imperfections than does any other existing theory refinement system.
Raymond J. Mooney and Dirk Ourston
Technical Report AI 91-153, Artificial Intelligence Lab, University of
Texas at Austin, March 1991.
This paper presents a method for revising an approximate domain theory based on
noisy data. The basic idea is to avoid making changes to the theory that
account for only a small amount of data. This method is implemented in the
EITHER propositional Horn-clause theory revision system. The paper
presents empirical results on artificially corrupted data to show that this
method successfully prevents over-fitting. In other words, when the data is
noisy, performance on novel test data is considerably better than revising the
theory to completely fit the data. When the data is not noisy, noise processing
causes no significant degradation in performance. Finally, noise processing
increases efficiency and decreases the complexity of the resulting theory.
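The underlying idea reduces to a coverage threshold: a candidate revision is
rejected when the examples supporting it are too small a fraction of the
training set. A minimal sketch, with an invented threshold value that is not
EITHER's actual criterion:

    # Sketch of the noise-handling idea: reject a revision supported by too
    # small a fraction of the training data (threshold is illustrative).

    def accept_revision(num_supporting, num_examples, min_fraction=0.05):
        return num_supporting / num_examples >= min_fraction

    print(accept_revision(2, 100))   # False: likely fitting noise
    print(accept_revision(20, 100))  # True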
Hwee Tou Ng and Raymond J. Mooney
Proceedings of the Eighth National Conference on
Artificial Intelligence, pp. 337-342, Boston, MA, 1990.
Abduction is an important inference process underlying much of human
intelligent activities, including text understanding, plan
recognition, disease diagnosis, and physical device diagnosis. In
this paper, we describe some problems encountered using abduction to
understand text, and present some solutions to overcome these
problems. The solutions we propose center around the use of a
different criterion, called explanatory coherence, as the
primary measure to evaluate the quality of an explanation. In
addition, explanatory coherence plays an important role in the
construction of explanations, both in determining the appropriate
level of specificity of a preferred explanation, and in guiding the
heuristic search to efficiently compute explanations of sufficiently
high quality.