Special Issue on Empirical Natural Language Processing
AI Magazine, Vol. 18, No. 4, Winter 1997


Abstracts for papers in the collection edited by Eric Brill and Raymond Mooney

  • An Overview of Empirical Natural Language Processing
    Eric Brill and Raymond J. Mooney
    18(4): Winter 1997, 13-24
    In recent years, there has been a resurgence in research on empirical methods in natural language processing. These methods employ learning techniques to automatically extract linguistic knowledge from natural language corpora rather than require the system developer to manually encode the requisite knowledge. The current special issue reviews recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction, and machine translation. This article presents an introduction to the series of specialized articles on these topics and attempts to describe and explain the growing interest in using learning methods to aid the development of natural language processing systems.

  • Linguistic Knowledge and Empirical Methods in Speech Recognition
    Andreas Stolcke
    18(4): Winter 1997, 25-32
    Automatic speech recognition is one of the fastest growing and commercially most promising applications of natural language technology. The technology has reached a point where carefully designed systems for suitably constrained applications are a reality. Commercial systems are available today for such tasks as large-vocabulary dictation and voice control of medical equipment. This article reviews how state-of-the-art speech-recognition systems combine statistical modeling, linguistic knowledge, and machine learning to achieve their performance and points out some of the research issues in the field.

  • Statistical Techniques for Natural Language Parsing
    Eugene Charniak
    18(4): Winter 1997, 33-44
    I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and also serves as a good warm-up for the main topic: statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions.

  • Corpus-Based Approaches to Semantic Interpretation in NLP
    Hwee Tou Ng and John Zelle
    18(4): Winter 1997, 45-64
    In recent years, there has been a flurry of research into empirical, corpus-based learning approaches to natural language processing (NLP). Most empirical NLP work to date has focused on relatively low-level language processing such as part-of-speech tagging, text segmentation, and syntactic parsing. The success of these approaches has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis--uncovering the meaning of an utterance. This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation. In particular, we focus on two important problems in semantic interpretation, namely, word-sense disambiguation and semantic parsing.

  • Empirical Methods in Information Extraction
    Claire Cardie
    18(4): Winter 1997, 65-80
    This article surveys the use of empirical, machine-learning methods for a particular natural language understanding task: information extraction. The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture.

  • Automating Knowledge Acquisition for Machine Translation
    Kevin Knight
    18(4): Winter 1997, 81-96
    Machine translation of human languages (for example, Japanese, English, Spanish) was one of the earliest goals of computer science research, and it remains an elusive one. Like many AI tasks, translation requires an immense amount of knowledge about language and the world. Recent approaches to machine translation frequently make use of text-based learning algorithms to fully or partially automate the acquisition of knowledge. This article illustrates these approaches.