Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Natural Language Processing

Natural Language Processing is a broad area that includes approaches to building computational systems that understand and generate language, the categorization and analysis of text documents, and cognitive models of human language processing.

  1. CAT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)] [Video]
    Yash Kumar Lal, Vanya Cohen, Nathanael Chambers, Niranjan Balasubramanian, Raymond Mooney
    Empirical Methods in Natural Language Processing (EMNLP), November 2024.
    Understanding the abilities of LLMs to reason about natural language plans, such as instructional text and recipes, is critical to reliably using them in decision-making systems. A fundamental aspect of plans is the temporal order in which their steps need to be executed, which reflects the underlying causal dependencies between them. We introduce CaT-Bench, a benchmark of Step Order Prediction questions that test whether a step must necessarily occur before or after another in cooking recipe plans. We use this to evaluate how well frontier LLMs understand causal and temporal dependencies. We find that SOTA LLMs are underwhelming (the best zero-shot F1 is only 0.59) and are biased towards predicting dependence more often, perhaps relying on the temporal order of steps as a heuristic. While prompting for explanations and using few-shot examples improve performance, the best F1 result is only 0.73. Further, human evaluation of explanations along with answer correctness shows that, on average, humans do not agree with model reasoning. Surprisingly, we also find that explaining after answering leads to better performance than normal chain-of-thought prompting, and LLM answers are not consistent across questions about the same step pairs. Overall, the results show that LLMs' ability to detect dependence between steps has significant room for improvement.
    ML ID: 433
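    The following is a minimal Python sketch of how a CaT-Bench-style step order question might be posed to an LLM and scored with F1; the prompt wording and the query_llm callable are illustrative assumptions, not the paper's actual evaluation harness.

      # Illustrative sketch, not the authors' evaluation code.
      from typing import Callable, List, Tuple

      def step_order_question(steps: List[str], i: int, j: int) -> str:
          """Ask whether step i must necessarily be executed before step j in the plan."""
          plan = "\n".join(f"{k + 1}. {s}" for k, s in enumerate(steps))
          return (f"Here is a recipe plan:\n{plan}\n\n"
                  f"Must step {i + 1} necessarily happen before step {j + 1}? Answer yes or no.")

      def f1_over_examples(query_llm: Callable[[str], str],
                           examples: List[Tuple[List[str], int, int, bool]]) -> float:
          """F1 for the 'dependent' class over (steps, i, j, gold_dependent) examples."""
          tp = fp = fn = 0
          for steps, i, j, gold in examples:
              pred = query_llm(step_order_question(steps, i, j)).strip().lower().startswith("yes")
              tp += pred and gold
              fp += pred and not gold
              fn += (not pred) and gold
          p = tp / (tp + fp) if tp + fp else 0.0
          r = tp / (tp + fn) if tp + fn else 0.0
          return 2 * p * r / (p + r) if p + r else 0.0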
  2. CONTRADOC: Understanding Self-Contradictions in Documents with Large Language Models
    [Details] [PDF] [Poster]
    Jierui Li, Vipul Raheja, Dhruv Kumar
    In North American Chapter of the Association for Computational Linguistics (NAACL), June 2024.
    In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering. However, research on understanding their capabilities on the task of detecting self-contradictions in long documents has been very limited. In this work, we introduce CONTRADOC, the first human-annotated dataset to study self-contradictions in long documents across multiple domains, varying document lengths, self-contradiction types, and appearance scope. We then analyze the current capabilities of four state-of-the-art open-source and commercially available LLMs: GPT3.5, GPT4, PaLM2, and LLaMAv2 on this dataset. While GPT4 performs the best and can outperform humans on this task, we find that it is still unreliable and struggles with self-contradictions that require more nuance and context. We release the dataset and all the code associated with the experiments.
    ML ID: 429
  3. When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
    [Details] [PDF] [Slides (PDF)] [Poster] [Video]
    Ziru Chen, Michael White, Raymond Mooney, Ali Payani, Yu Su, Huan Sun
    In Association for Computational Linguistics (ACL), August 2024.
    In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method. We investigate the practical utility of two advanced planning methods, iterative correction and tree search. We present a comprehensive analysis of how discrimination accuracy affects the overall performance of agents when using these two methods or a simpler method, re-ranking. Experiments on two tasks, text-to-SQL parsing and mathematical reasoning, show that: (1) advanced planning methods demand discriminators with at least 90 percent accuracy to achieve significant improvements over re-ranking; (2) current LLMs’ discrimination abilities have not met the needs of advanced planning methods to achieve such improvements; (3) with LLM-based discriminators, advanced planning methods may not adequately balance accuracy and efficiency. For example, compared to the other two methods, tree search is at least 10–20 times slower but leads to negligible performance gains, which hinders its real-world applicability.
    ML ID: 428
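    As a concrete illustration of the simplest of the three planning methods compared above, the Python sketch below re-ranks generator candidates with a discriminator score; the generate and discriminate callables are assumed interfaces for illustration, not the paper's implementation.

      # Illustrative sketch: discriminator-based re-ranking of generator candidates.
      from typing import Callable, List

      def rerank(problem: str,
                 generate: Callable[[str, int], List[str]],
                 discriminate: Callable[[str, str], float],
                 n_candidates: int = 8) -> str:
          """Sample candidate solutions, return the one the discriminator scores highest."""
          candidates = generate(problem, n_candidates)
          return max(candidates, key=lambda c: discriminate(problem, c))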
  4. Sparse Meets Dense: A Hybrid Approach to Enhance Scientific Document Retrieval
    [Details] [PDF] [Slides (PPT)]
    Priyanka Mandikal, Raymond Mooney
    In The 4th Workshop on Scientific Document Understanding, AAAI, February 2024.
    Traditional information retrieval is based on sparse bag-of-words vector representations of documents and queries. More recent deep-learning approaches have used dense embeddings learned using a transformer-based large language model. We show that, on a classic benchmark for scientific document retrieval in the medical domain of cystic fibrosis, both of these models perform roughly equivalently. Notably, dense vectors from the state-of-the-art SPECTER2 model do not significantly enhance performance. However, a hybrid model that we propose combining these methods yields significantly better results, underscoring the merits of integrating classical and contemporary deep learning techniques in information retrieval in the domain of specialized scientific documents.
    ML ID: 425
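    A minimal Python sketch of one common way to fuse sparse and dense retrieval scores (a weighted sum after min-max normalization); the fusion scheme here is an illustrative assumption, not necessarily the exact method used in the paper.

      # Illustrative sketch: weighted-sum fusion of sparse (e.g., BM25) and dense scores.
      from typing import Dict, List

      def min_max(scores: Dict[str, float]) -> Dict[str, float]:
          lo, hi = min(scores.values()), max(scores.values())
          span = (hi - lo) or 1.0
          return {doc: (s - lo) / span for doc, s in scores.items()}

      def hybrid_rank(sparse: Dict[str, float], dense: Dict[str, float],
                      alpha: float = 0.5) -> List[str]:
          """Rank document ids by alpha * sparse + (1 - alpha) * dense, both normalized."""
          s, d = min_max(sparse), min_max(dense)
          fused = {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
                   for doc in set(sparse) | set(dense)}
          return sorted(fused, key=fused.get, reverse=True)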
  5. “Female Astronaut: Because sandwiches won’t make themselves up there!”: Towards multi-modal misogyny detection in memes
    [Details] [PDF]
    Smriti Singh, Amritha Haridasan, Raymond Mooney
    Association for Computational Linguistics (ACL), Workshop on Online Abuse and Harms (WOAH), July 2023.
    A rise in the circulation of memes has led to the spread of a new form of multimodal hateful content. Unfortunately, the degree of hate women receive on the internet is disproportionately skewed against them. This, combined with the fact that multimodal misogyny is more challenging to detect than traditional text-based misogyny, makes identifying misogynistic memes online a task of utmost importance. To this end, the MAMI dataset was released, consisting of 12000 memes annotated for misogyny and four sub-classes of misogyny: shame, objectification, violence, and stereotype. While this balanced dataset is widely cited, we find that the task itself remains largely unsolved. Thus, in our work, we investigate the performance of multiple models, analyse why even state-of-the-art models find this task so challenging, and examine whether domain-specific pretraining can help. Our results show that pretraining BERT on hateful memes and leveraging an attention-based approach with ViT outperforms state-of-the-art models by more than 10 percent. Further, we provide insight into why these models may be struggling with this task with an extensive qualitative analysis of random samples from the test set.
  6. Using Commonsense Knowledge to Answer Why-Questions
    [Details] [PDF] [Video]
    Yash Kumar Lal, Niket Tandon, Tanvi Aggarwal, Horace Liu, Nathanael Chambers, Raymond Mooney, Niranjan Balasubramanian
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, December 2022.
    Answering questions in narratives about why events happened often requires commonsense knowledge external to the text. What aspects of this knowledge are available in large language models? What aspects can be made accessible via external commonsense resources? We study these questions in the context of answering questions in the TELLMEWHY dataset using COMET as a source of relevant commonsense relations. We analyze the effects of model size (T5 variants and GPT-3) along with methods of injecting knowledge (COMET) into these models. Results show that the largest models, as expected, yield substantial improvements over base models and injecting external knowledge helps models of all sizes. We also find that the format in which knowledge is provided is critical, and that smaller models benefit more from larger amounts of knowledge. Finally, we develop an ontology of knowledge types and analyze the relative coverage of the models across these categories.
    ML ID: 407
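    The sketch below shows one simple way such external commonsense knowledge can be injected: verbalizing COMET-style relations and prepending them to the question-answering input. The exact formatting is a hypothetical illustration (the paper reports that the format in which knowledge is provided is critical).

      # Illustrative sketch: prepending verbalized commonsense relations to the QA input.
      from typing import Dict, List

      def build_input(narrative: str, question: str,
                      comet_relations: Dict[str, List[str]]) -> str:
          """Prepend relations such as xIntent or xNeed, then the story and the why-question."""
          knowledge = " ".join(f"{rel}: {', '.join(tails)}."
                               for rel, tails in comet_relations.items())
          return f"Knowledge: {knowledge}\nStory: {narrative}\nQuestion: {question}\nAnswer:"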
  7. Updated Headline Generation: Creating Updated Summaries for Evolving News Stories
    [Details] [PDF] [Slides (PDF)] [Poster] [Video]
    Sheena Panthaplackel, Adrian Benton, Mark Dredze
    In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, May 2022.
    We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline. The system must identify the novel information in the article update, and modify the existing headline accordingly. We create data for this task using the NewsEdits corpus (Spangher and May, 2021) by automatically identifying contiguous article versions that are likely to require a substantive headline update. We find that models conditioned on the prior headline and body revisions produce headlines judged by humans to be as factual as gold headlines while making fewer unnecessary edits compared to a standard headline generation model. Our experiments establish benchmarks for this new contextual summarization task.
    ML ID: 405
  8. TellMeWhy: A Dataset for Answering Why-Questions in Narratives
    [Details] [PDF] [Slides (PDF)] [Video]
    Yash Kumar Lal, Nathanael Chambers, Raymond Mooney, Niranjan Balasubramanian
    In Findings of ACL 2021, August 2021.
    Answering questions about why characters perform certain actions is central to understanding and reasoning about narratives. Despite recent progress in QA, it is not clear if existing models have the ability to answer “why” questions that may require common-sense knowledge external to the input narrative. In this work, we introduce TellMeWhy, a new crowd-sourced dataset that consists of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described. For a third of this dataset, the answers are not present within the narrative. Given the limitations of automated evaluation for this task, we also present a systematized human evaluation interface for this dataset. Our evaluation of state-of-the-art models shows that they are far below human performance on answering such questions. They perform especially poorly on questions whose answers are external to the narrative, thus providing a challenge for future QA and narrative understanding research.
    ML ID: 406
  9. Facilitating Software Evolution through Natural Language Comments and Dialogue
    [Details] [PDF] [Slides (PDF)] [Video]
    Sheena Panthaplackel
    October 2021. Ph.D. Proposal.
    Software projects are continually evolving, as developers incorporate changes to refactor code, support new functionality, and fix bugs. To uphold software quality amidst constant changes and also facilitate the prompt implementation of critical changes, it is desirable to have automated tools for guiding developers in making methodical software changes. We explore tasks, data, and machine learning approaches that leverage natural language to serve this purpose. When developers make code changes, they sometimes fail to update the accompanying natural language comments documenting various aspects of the code, which can lead to confusion and vulnerability to bugs. We present our completed work on alerting developers of inconsistent comments upon code changes and suggesting updates by learning to correlate comments and code. When a bug is reported, developers engage in a dialogue to collaboratively understand it and ultimately resolve it. While the solution is likely formulated within the discussion, it is often buried in a large amount of text, making it difficult to comprehend, which delays its implementation through the necessary repository changes. To guide developers in more easily absorbing information relevant to making these changes and consequently expedite bug resolution, we investigate generating a concise natural language description of the solution by synthesizing relevant content as it emerges in the discussion. In completed work, we benchmark models for generating solution descriptions and design a classifier for determining when sufficient context for generating an informative description becomes available. We also investigate a pipelined approach for real-time generation, entailing separate classification and generation models. For future work, we propose an improved classifier and also a more intricate system that is jointly trained on generation and classification. Next, we intend to study a system that can interactively generate natural language descriptions that can drive code changes. Finally, we plan to investigate how we can leverage the discussion context to also suggest concrete code changes for bug resolution.
    ML ID: 399
  10. Using Natural Language to Aid Task Specification in Sequential Decision Making Problems
    [Details] [PDF] [Slides (PDF)] [Video]
    Prasoon Goyal
    October 2021. Ph.D. Proposal.
    Building intelligent agents that can help humans accomplish everyday tasks, such as a personal robot at home or a robot in a work environment, is a long-standing goal of artificial intelligence. One of the requirements for such general-purpose agents is the ability to teach them new tasks or skills relatively easily. Common approaches to teaching agents new skills include reinforcement learning (RL) and imitation learning (IL). However, specifying the task to the learning agent, i.e., designing effective reward functions for reinforcement learning and providing demonstrations for imitation learning, is often cumbersome and time-consuming. We aim to use natural language as an auxiliary signal to aid task specification, which reduces the burden on the end user. To make reward design easier, we propose a novel framework that is used to generate language-based rewards in addition to the extrinsic rewards from the environment for faster policy training using RL. To ameliorate the problem of providing demonstrations, we propose a new setting that enables an agent to learn a new task without demonstrations in an IL setting, given a demonstration from a related task and a natural language description of the difference between the desired task and the demonstrated task. The primary contributions of this dissertation will be new frameworks that enable incorporating natural language in RL and IL, which would enable non-expert users to specify new tasks to intelligent agents more conveniently.
    ML ID: 398
  11. Dialog as a Vehicle for Lifelong Learning
    [Details] [PDF] [Slides (PDF)] [Video]
    Aishwarya Padmakumar, Raymond J. Mooney
    In Position Paper Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDial 2.0), July 2020.
    Dialog systems research has primarily been focused around two main types of applications – task-oriented dialog systems that learn to use clarification to aid in understanding a goal, and open-ended dialog systems that are expected to carry out unconstrained “chit chat” conversations. However, dialog interactions can also be used to obtain various types of knowledge that can be used to improve an underlying language understanding system, or other machine learning systems that the dialog acts over. In this position paper, we present the problem of designing dialog systems that enable lifelong learning as an important challenge problem, in particular for applications involving physically situated robots. We include examples of prior work in this direction, and discuss challenges that remain to be addressed.
    ML ID: 386
  12. Evaluating the Robustness of Natural Language Reward Shaping Models to Spatial Relations
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Antony Yun
    May 2020. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
    As part of an effort to bridge the gap between using reinforcement learning in simulation and in the real world, we probe whether current reward shaping models are able to encode relational data between objects in the environment. We construct an augmented dataset for controlling a robotic arm in the Meta-World platform to test whether current models are able to discriminate between target objects based on their relations. We found that state-of-the-art models are indeed expressive enough to achieve performance comparable to the gold standard, so this specific experiment did not uncover any obvious shortcomings.
    ML ID: 384
  13. Using Natural Language for Reward Shaping in Reinforcement Learning
    [Details] [PDF] [Slides (PDF)] [Poster]
    Prasoon Goyal, Scott Niekum, Raymond J. Mooney
    In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.
    Recent reinforcement learning (RL) approaches have shown strong performance in complex domains such as Atari games, but are often highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. However, designing appropriate shaping rewards is known to be difficult as well as time-consuming. In this work, we address this problem by using natural language instructions to perform reward shaping. We propose the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent. These intermediate language-based rewards can seamlessly be integrated into any standard reinforcement learning algorithm. We experiment with Montezuma’s Revenge from the Arcade Learning Environment, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that, for the same number of interactions with the environment, language-based rewards lead to successful completion of the task 60% more often on average, compared to learning without language.
    ML ID: 376
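    The Python sketch below illustrates the general idea of language-based reward shaping inside an RL loop: a learned language-reward model adds an intermediate reward to the extrinsic environment reward. The Gymnasium-style env interface and the language_reward callable are assumptions for illustration, not the LEARN implementation.

      # Illustrative sketch: adding a language-based shaping reward to the extrinsic reward.
      from typing import Callable, List

      def run_episode(env, policy: Callable,
                      language_reward: Callable[[str, List[int]], float],
                      instruction: str, shaping_weight: float = 0.1) -> float:
          """One episode where the agent learns from extrinsic + weighted language reward."""
          obs, _ = env.reset()
          done, total, actions = False, 0.0, []
          while not done:
              action = policy(obs)
              obs, extrinsic, terminated, truncated, _ = env.step(action)
              done = terminated or truncated
              actions.append(action)
              # the shaped reward (not just the extrinsic one) is what the RL update would use
              total += extrinsic + shaping_weight * language_reward(instruction, actions)
          return total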
  14. Explainable Improved Ensembling for Natural Language and Vision
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Nazneen Rajani
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, July 2018.
    Ensemble methods are well-known in machine learning for improving prediction accuracy. However, they do not adequately discriminate among underlying component models. The measure of how good a model is can sometimes be estimated from “why” it made a specific prediction. We propose a novel approach called Stacking With Auxiliary Features (SWAF) that effectively leverages component models by integrating such relevant information from context to improve ensembling. Using auxiliary features, our algorithm learns to rely on systems that not only agree on an output prediction but also on the source or origin of that output. We demonstrate our approach on challenging structured prediction problems in Natural Language Processing and Vision, including Information Extraction, Object Detection, and Visual Question Answering. We also present a variant of SWAF for combining systems that do not have training data in an unsupervised ensemble with systems that do have training data. Our combined approach obtains a new state-of-the-art, beating our prior performance on Information Extraction. The state-of-the-art systems on many AI applications are ensembles of deep-learning models. These models are hard to interpret and can sometimes make odd mistakes. Explanations make AI systems more transparent and also justify their predictions. We propose a scalable approach to generate visual explanations for ensemble methods using the localization maps of the component systems. Crowdsourced human evaluation on two new metrics indicates that our ensemble’s explanations qualitatively outperform the individual systems’ explanations by a significant margin.
    ML ID: 364
  15. Distributional modeling on a diet: One-shot word learning from text only
    [Details] [PDF]
    Su Wang and Stephen Roller and Katrin Erk
    In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP-17), Taipei, Taiwan, November 2017.
    We test whether distributional models can do one-shot learning of definitional properties from text only. Using Bayesian models, we find that first learning overarching structure in the known data (regularities in textual contexts and in properties) helps one-shot learning, and that individual context items can be highly informative. Our experiments show that our model can learn properties from a single exposure when given an informative utterance.
    ML ID: 354
  16. Dialog for Language to Code
    [Details] [PDF] [Poster]
    Shobhit Chaurasia and Raymond J. Mooney
    In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP-17), 175-180, Taipei, Taiwan, November 2017.
    Generating computer code from natural language descriptions has been a long-standing problem. Prior work in this domain has restricted itself to generating code in one shot from a single description. To overcome this limitation, we propose a system that can engage users in a dialog to clarify their intent until it has all the information to produce correct code. To evaluate the efficacy of dialog in code generation, we focus on synthesizing conditional statements in the form of IFTTT recipes.
    ML ID: 353
  17. Leveraging Discourse Information Effectively for Authorship Attribution
    [Details] [PDF] [Slides (PDF)] [Video]
    Elisa Ferracane and Su Wang and Raymond J. Mooney
    In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP-17), 584–593, Taipei, Taiwan, November 2017.
    We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a significant margin. We empirically investigate several featurization methods to understand the conditions under which discourse features contribute non-trivial performance gains, and analyze discourse embeddings.
    ML ID: 352
  18. Natural-Language Video Description with Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, August 2017.
    For most people, watching a brief video and describing what happened (in words) is an easy task. For machines, extracting meaning from video pixels and generating a sentence description is a very complex problem. The goal of this thesis is to develop models that can automatically generate natural language descriptions for events in videos. It presents several approaches to automatic video description by building on recent advances in “deep” machine learning. The techniques presented in this thesis view the task of video description as akin to machine translation, treating the video domain as a source “language” and using deep neural net architectures to “translate” videos to text. Specifically, I develop video captioning techniques using a unified deep neural network with both convolutional and recurrent structure, modeling the temporal elements in videos and language with deep recurrent neural networks. In my initial approach, I adapt a model that can learn from paired images and captions to transfer knowledge from this auxiliary task to generate descriptions for short video clips. Next, I present an end-to-end deep network that can jointly model a sequence of video frames and a sequence of words. To further improve grammaticality and descriptive quality, I also propose methods to integrate linguistic knowledge from plain text corpora. Additionally, I show that such linguistic knowledge can help describe novel objects unseen in paired image/video-caption data. Finally, moving beyond short video clips, I present methods to process longer multi-activity videos, specifically to jointly segment and describe coherent event sequences in full-length movies.
    ML ID: 349
  19. Advances in Statistical Script Learning
    [Details] [PDF] [Slides (PPT)]
    Karl Pichotta
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, August 2017.
    When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based on world knowledge. For example, given the sentence “Mrs. Dalloway said she would buy the flowers herself,” one can make a number of probable inferences based on event co-occurrences: she bought flowers, she went to a store, she took the flowers home, and so on.

    Observing this, it is clear that many different useful natural language end-tasks could benefit from models of events as they typically co-occur (so-called script models). Robust question-answering systems must be able to infer highly-probable implicit events from what is explicitly stated in a text, as must robust information-extraction systems that map from unstructured text to formal assertions about relations expressed in the text. Coreference resolution systems, semantic role labeling, and even syntactic parsing systems could, in principle, benefit from event co-occurrence models.

    To this end, we present a number of contributions related to statistical event co-occurrence models. First, we investigate a method of incorporating multiple entities into events in a count-based co-occurrence model. We find that modeling multiple entities interacting across events allows for improved empirical performance on the task of modeling sequences of events in documents.

    Second, we give a method of applying Recurrent Neural Network sequence models to the task of predicting held-out predicate-argument structures from documents. This model allows us to easily incorporate entity noun information, and can allow for more complex, higher-arity events than a count-based co-occurrence model. We find the neural model improves performance considerably over the count-based co-occurrence model.

    Third, we investigate the performance of a sequence-to-sequence encoder-decoder neural model on the task of predicting held-out predicate-argument events from text. This model does not explicitly model any external syntactic information, and does not require a parser. We find the text-level model to be competitive in predictive performance with an event level model directly mediated by an external syntactic analysis.

    Finally, motivated by this result, we investigate incorporating features derived from these models into a baseline noun coreference resolution system. We find that, while our additional features do not appreciably improve top-level performance, we can nonetheless provide empirical improvement on a number of restricted classes of difficult coreference decisions.

    ML ID: 348
  20. Dialog for Natural Language to Code
    [Details] [PDF]
    Shobhit Chaurasia
    2017. Masters Thesis, Computer Science Department, University of Texas at Austin.
    Generating computer code from natural language descriptions has been a longstanding problem in computational linguistics. Prior work in this domain has restricted itself to generating code in one shot from a single description. To overcome this limitation, we propose a system that can engage users in a dialog to clarify their intent until it is confident that it has all the information to produce correct and complete code. Further, we demonstrate how the dialog conversations can be leveraged for continuous improvement of the dialog system. To evaluate the efficacy of dialog in code generation, we focus on synthesizing conditional statements in the form of IFTTT recipes. IFTTT (if-this-then-that) is a web-service that provides event-driven automation, enabling control of smart devices and web-applications based on user-defined events.
    ML ID: 347
  21. Guiding Interaction Behaviors for Multi-modal Grounded Language Learning
    [Details] [PDF]
    Jesse Thomason and Jivko Sinapov and Raymond J. Mooney
    In Proceedings of the Workshop on Language Grounding for Robotics at ACL 2017 (RoboNLP-17), Vancouver, Canada, August 2017.
    Multi-modal grounded language learning connects language predicates to physical properties of objects in the world. Sensing with multiple modalities, such as audio, haptics, and visual colors and shapes while performing interaction behaviors like lifting, dropping, and looking on objects enables a robot to ground non-visual predicates like "empty" as well as visual predicates like "red". Previous work has established that grounding in multi-modal space improves performance on object retrieval from human descriptions. In this work, we gather behavior annotations from humans and demonstrate that these improve language grounding performance by allowing a system to focus on relevant behaviors for words like "white" or "half-full" that can be understood by looking or lifting, respectively. We also explore adding modality annotations (whether to focus on audio or haptics when performing a behavior), which improves performance, and sharing information between linguistically related predicates (if "green" is a color, "white" is a color), which improves grounding recall but at the cost of precision.
    ML ID: 345
  22. Multi-Modal Word Synset Induction
    [Details] [PDF]
    Jesse Thomason and Raymond J. Mooney
    In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 4116--4122, Melbourne, Australia, 2017.
    A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multi-modal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multi-modally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets' approval.
    ML ID: 344
  23. Stacking With Auxiliary Features
    [Details] [PDF] [Slides (PDF)] [Poster]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 2634-2640, Melbourne, Australia, 2017.
    Ensembling methods are well known for improving prediction accuracy. However, they are limited in the sense that they cannot effectively discriminate among component models. In this paper, we propose stacking with auxiliary features that learns to fuse additional relevant information from multiple component systems as well as input instances to improve performance. We use two types of auxiliary features -- instance features and provenance features. The instance features enable the stacker to discriminate across input instances and the provenance features enable the stacker to discriminate across component systems. When combined together, our algorithm learns to rely on systems that not just agree on an output but also the provenance of this output in conjunction with the properties of the input instance. We demonstrate the success of our approach on three very different and challenging natural language and vision problems: Slot Filling, Entity Discovery and Linking, and ImageNet Object Detection. We obtain new state-of-the-art results on the first two tasks and significant improvements on the ImageNet task, thus verifying the power and generality of our approach.
    ML ID: 343
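    A toy Python sketch of the stacking idea: each candidate output is represented by the component systems' confidences plus auxiliary instance and provenance features, and a meta-classifier learns to accept or reject it. The feature choices and toy data below are illustrative assumptions, not the authors' system.

      # Illustrative sketch: a stacker over system confidences plus auxiliary features.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def build_features(confidences, instance_feats, provenance_feats):
          """Concatenate per-system confidences with instance and provenance features."""
          return np.concatenate([confidences, instance_feats, provenance_feats])

      # toy training data: each row is a candidate output, label 1 means it is correct
      X = np.array([build_features([0.9, 0.2], [1.0], [0.8]),
                    build_features([0.1, 0.3], [0.0], [0.2])])
      y = np.array([1, 0])
      stacker = LogisticRegression().fit(X, y)
      print(stacker.predict_proba([build_features([0.7, 0.6], [1.0], [0.5])])[0, 1])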
  24. Integrated Learning of Dialog Strategies and Semantic Parsing
    [Details] [PDF]
    Aishwarya Padmakumar and Jesse Thomason and Raymond J. Mooney
    In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), 547--557, Valencia, Spain, April 2017.
    Natural language understanding and dialog management are two integral components of interactive dialog systems. Previous research has used machine learning techniques to individually optimize these components, with different forms of direct and indirect supervision. We present an approach to integrate the learning of both a dialog strategy using reinforcement learning, and a semantic parser for robust natural language understanding, using only natural dialog interaction for supervision. Experimental results on a simulated task of robot instruction demonstrate that joint learning of both components improves dialog performance over learning either of these components alone.
    ML ID: 342
  25. Captioning Images with Diverse Objects
    [Details] [PDF] [Slides (PDF)] [Poster]
    Subhashini Venugopalan and Lisa Anne Hendricks and Marcus Rohrbach and Raymond Mooney and Trevor Darrell and Kate Saenko
    In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR-17), 5753--5761, 2017.
    Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources -- labeled images from object recognition datasets, and semantic knowledge extracted from unannotated text. We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets. We demonstrate that our model exploits semantic information to generate captions for hundreds of object categories in the ImageNet object recognition dataset that are not observed in MSCOCO image-caption training data, as well as many categories that are observed very rarely. Both automatic evaluations and human judgements show that our model considerably outperforms prior work in being able to describe many more categories of objects.
    ML ID: 341
  26. Stacking With Auxiliary Features for Combining Supervised and Unsupervised Ensembles
    [Details] [PDF]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the Ninth Text Analysis Conference (TAC 2016), 2016.
    We propose stacking with auxiliary features (SWAF) that combines supervised and unsupervised methods to ensemble multiple systems for the Tri-lingual Entity Discovery and Linking (TEDL) 2016 evaluation. We use the TEDL 2015 systems for training and the EDL 2016 systems for evaluating our algorithm. We perform a post-processing step on the outputs obtained from the classifier so as to aggregate them into one final system.
    ML ID: 356
  27. Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision
    [Details] [PDF] [Slides (PDF)]
    Nazneen Fatema Rajani
    November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    Ensembling methods are well known in machine learning for improving prediction accuracy. However, they are limited in the sense that they cannot effectively discriminate among underlying component models. Some models perform better at certain types of input instances than other models. The measure of how good a model is can sometimes be gauged from "where" it extracted the output and "why" it made the prediction. This information can be exploited to leverage the component models in an ensemble. In this proposal, we present stacking with auxiliary features that integrates relevant information from multiple sources to improve ensembling. We use two types of auxiliary features - instance features and provenance features. The instance features enable the stacker to discriminate across input instances while the provenance features enable the stacker to discriminate across component systems. When combined together, our algorithm learns to rely on systems that not just agree on an output but also the provenance of this output in conjunction with the input instance type.

    We demonstrate our approach on three very different and difficult problems: Cold Start Slot Filling, Tri-lingual Entity Discovery and Linking, and ImageNet Object Detection. The first two problems are well known tasks in Natural Language Processing, and the third one is in the domain of Computer Vision. Our algorithm obtains state-of-the-art results on the first two tasks and significant improvements on the ImageNet task, thus verifying the power and generality of our approach. We also present a novel approach using stacking for combining systems that do not have training data in an unsupervised ensemble with systems that do have training data. Our combined approach achieves state-of-the-art on the Cold Start Slot Filling and Tri-lingual Entity Discovery and Linking tasks, beating our own prior performance on ensembling just the supervised systems.

    We propose several short-term and long-term extensions to our work. In the short-term, we focus our work on using more semantic instance-level features for all the three tasks, and use non-lexical features that are language independent for the two NLP tasks. In the long-term we propose to demonstrate our ensembling algorithm on the Visual Question Answering task and use textual/visual explanations as auxiliary features to stacking.

    ML ID: 340
  28. An Analysis of Using Semantic Parsing for Speech Recognition
    [Details] [PDF] [Slides (PPT)]
    Rodolfo Corona
    2016. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
    This thesis explores the use of semantic parsing for improving speech recognition performance. Specifically, it explores how a semantic parser may be used in order to re-rank the n-best hypothesis list generated by an automatic speech recognition system. We also explore how system performance is affected when retraining the system's acoustic model using a portion of the re-ranked data.
    ML ID: 339
  29. Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception
    [Details] [PDF]
    Jesse Thomason
    November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    Robotic systems that interact with untrained human users must be able to understand and respond to natural language commands and questions. If a person requests “take me to Alice's office”, the system and person must know that Alice is a person who owns some unique office. Similarly, if a person requests “bring me the heavy, green mug”, the system and person must both know “heavy”, “green”, and “mug” are properties that describe an object in the environment, and have similar ideas about to what objects those properties apply. To facilitate deployment, methods to achieve these goals should require little initial in-domain data.

    We present completed work on understanding human language commands using sparse initial resources for semantic parsing. Clarification dialog with humans simultaneously resolves misunderstandings and generates more training data for better downstream parser performance. We introduce multi-modal grounding classifiers to give the robotic system perceptual contexts to understand object properties like “green” and “heavy”. Additionally, we introduce and explore the task of word sense synonym set induction, which aims to discover polysemy and synonymy, which is helpful in the presence of sparse data and ambiguous properties such as “light” (light-colored versus lightweight).

    We propose to combine these orthogonal components into an integrated robotic system that understands human commands involving both static domain knowledge (such as who owns what office) and perceptual grounding (such as object retrieval). Additionally, we propose to strengthen the perceptual grounding component by performing word sense synonym set induction on object property words. We offer several long-term proposals to improve such an integrated system: exploring novel objects using only the context-necessary set of behaviors, a more natural learning paradigm for perception, and leveraging linguistic accommodation to improve parsing.

    ML ID: 338
  30. Natural Language Semantics Using Probabilistic Logic
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    I. Beltagy
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, December 2016.
    With better natural language semantic representations, computers can support more applications more effectively as a result of a better understanding of natural text. However, no single semantic representation at this time fulfills all requirements needed for a satisfactory representation. Logic-based representations like first-order logic capture many of the linguistic phenomena using logical constructs, and they come with standardized inference mechanisms, but standard first-order logic fails to capture the "graded" aspect of meaning in languages. Other approaches for semantics, like distributional models, focus on capturing "graded" semantic similarity of words and phrases but do not capture sentence structure in the same detail as logic-based approaches. However, both aspects of semantics, structure and gradedness, are important for an accurate language semantics representation.

    In this work, we propose a natural language semantics representation that uses probabilistic logic (PL) to integrate logical with weighted uncertain knowledge. It combines the expressivity and the automated inference of logic with the ability to reason with uncertainty. To demonstrate the effectiveness of our semantic representation, we implement and evaluate it on three tasks, recognizing textual entailment (RTE), semantic textual similarity (STS) and open-domain question answering (QA). These tasks can utilize the strengths of our representation and the integration of logical representation and uncertain knowledge. Our semantic representation has three components, Logical Form, Knowledge Base and Inference, all of which present interesting challenges and we make new contributions in each of them.

    The first component is the Logical Form, which is the primary meaning representation. We address two points, how to translate input sentences to logical form, and how to adapt the resulting logical form to PL. First, we use Boxer, a CCG-based semantic analysis tool to translate sentences to logical form. We also explore translating dependency trees to logical form. Then, we adapt the logical forms to ensure that universal quantifiers and negations work as expected.

    The second component is the Knowledge Base which contains "uncertain" background knowledge required for a given problem. We collect the "relevant" lexical information from different linguistic resources, encode them as weighted logical rules, and add them to the knowledge base. We add rules from existing databases, in particular WordNet and the Paraphrase Database (PPDB). Since these are incomplete, we generate additional on-the-fly rules that could be useful. We use alignment techniques to propose rules that are relevant to a particular problem, and explore two alignment methods, one based on Robinson's resolution and the other based on graph matching. We automatically annotate the proposed rules and use them to learn weights for unseen rules.

    The third component is Inference. This component is implemented for each task separately. We use the logical form and the knowledge base constructed in the previous two steps to formulate the task as a PL inference problem then develop a PL inference algorithm that is optimized for this particular task. We explore the use of two PL frameworks, Markov Logic Networks (MLNs) and Probabilistic Soft Logic (PSL). We discuss which framework works best for a particular task, and present new inference algorithms for each framework.

    ML ID: 337
  31. Statistical Script Learning with Recurrent Neural Networks
    [Details] [PDF] [Poster]
    Karl Pichotta and Raymond J. Mooney
    In Proceedings of the Workshop on Uphill Battles in Language Processing (UBLP) at EMNLP 2016, Austin, TX, November 2016.
    We describe some of our recent efforts in learning statistical models of co-occurring events from large text corpora using Recurrent Neural Networks.
    ML ID: 336
  32. PIC a Different Word: A Simple Model for Lexical Substitution in Context
    [Details] [PDF]
    Stephen Roller and Katrin Erk
    In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-16), 1121-1126, San Diego, California, 2016.
    The Lexical Substitution task involves selecting and ranking lexical paraphrases for a target word in a given sentential context. We present PIC, a simple measure for estimating the appropriateness of substitutes in a given context. PIC outperforms another simple, comparable model proposed in recent work, especially when selecting substitutes from the entire vocabulary. Analysis shows that PIC improves over baselines by incorporating frequency biases into predictions.
    ML ID: 335
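    For intuition, the Python sketch below ranks substitutes with a generic embedding-based score that combines similarity to the target word with fit to the context words; this is not the exact PIC measure, just an illustration of the kind of scoring involved.

      # Illustrative sketch: generic embedding-based substitute ranking (not the PIC formula).
      from typing import Dict, List
      import numpy as np

      def cosine(a: np.ndarray, b: np.ndarray) -> float:
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

      def rank_substitutes(target: str, context: List[str], candidates: List[str],
                           emb: Dict[str, np.ndarray]) -> List[str]:
          """Score = similarity to the target word + average similarity to context words."""
          def score(cand: str) -> float:
              ctx = [cosine(emb[cand], emb[w]) for w in context if w in emb]
              return cosine(emb[cand], emb[target]) + (sum(ctx) / len(ctx) if ctx else 0.0)
          return sorted(candidates, key=score, reverse=True)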
  33. MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification
    [Details] [PDF]
    Ye Zhang and Stephen Roller and Byron Wallace.
    In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-16), 1522--1527, San Diego, California, 2016.
    We introduce a novel, simple convolution neural network (CNN) architecture -- multi-group norm constraint CNN (MGNC-CNN) -- that capitalizes on multiple sets of word embeddings for sentence classification. MGNC-CNN extracts features from input embedding sets independently and then joins these at the penultimate layer in the network to form a final feature vector. We then adopt a group regularization strategy that differentially penalizes weights associated with the subcomponents generated from the respective embedding sets. This model is much simpler than comparable alternative architectures and requires substantially less training time. Furthermore, it is flexible in that it does not require input word embeddings to be of the same dimensionality. We show that MGNC-CNN consistently outperforms baseline models.
    ML ID: 334
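    The PyTorch sketch below captures the core architectural idea: one convolutional group per embedding set, with the pooled features concatenated at the penultimate layer. It is a simplified illustration with assumed hyperparameters and omits the group norm-constraint regularization that gives the model its name.

      # Illustrative sketch: a multi-group CNN sentence classifier (regularization omitted).
      import torch
      import torch.nn as nn

      class MultiGroupCNN(nn.Module):
          # embedding_matrices: list of (vocab_size, emb_dim) numpy arrays, one per embedding set
          def __init__(self, embedding_matrices, n_filters=100, kernel_size=3, n_classes=2):
              super().__init__()
              self.embeddings = nn.ModuleList(
                  [nn.Embedding.from_pretrained(torch.tensor(m, dtype=torch.float))
                   for m in embedding_matrices])
              self.convs = nn.ModuleList(
                  [nn.Conv1d(m.shape[1], n_filters, kernel_size) for m in embedding_matrices])
              self.classifier = nn.Linear(n_filters * len(embedding_matrices), n_classes)

          def forward(self, token_ids):                          # (batch, seq_len)
              pooled = []
              for emb, conv in zip(self.embeddings, self.convs):
                  x = emb(token_ids).transpose(1, 2)             # (batch, emb_dim, seq_len)
                  pooled.append(torch.relu(conv(x)).max(dim=2).values)  # max-over-time pooling
              return self.classifier(torch.cat(pooled, dim=1))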
  34. Improved Semantic Parsers For If-Then Statements
    [Details] [PDF]
    I. Beltagy and Chris Quirk
    In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-16), Berlin, Germany, 2016.
    Digital personal assistants are becoming both more common and more useful. The major NLP challenge for personal assistants is machine understanding: translating natural language user commands into an executable representation. This paper focuses on understanding rules written as If-Then statements, though the techniques should be portable to other semantic parsing tasks. We view understanding as structure prediction and show improved models using both conventional techniques and neural network models. We also discuss various ways to improve generalization and reduce overfitting: synthetic training data from paraphrase, grammar combinations, feature selection and ensembles of multiple systems. An ensemble of these techniques achieves a new state-of-the-art result with an 8% accuracy improvement.
    ML ID: 332
  35. Combining Supervised and Unsupervised Ensembles for Knowledge Base Population
    [Details] [PDF]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 2016.
    We propose an algorithm that combines supervised and unsupervised methods to ensemble multiple systems for two popular Knowledge Base Population (KBP) tasks, Cold Start Slot Filling (CSSF) and Tri-lingual Entity Discovery and Linking (TEDL). We demonstrate that it outperforms the best system for both tasks in the 2015 competition, several ensembling baselines, as well as a state-of-the-art stacking approach. The success of our technique on two different and challenging problems demonstrates the power and generality of our combined approach to ensembling.
    ML ID: 331
  36. Using Sentence-Level LSTM Language Models for Script Inference
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Karl Pichotta and Raymond J. Mooney
    In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-16), 279--289, Berlin, Germany, 2016.
    There is a small but growing body of research on statistical scripts, models of event sequences that allow probabilistic inference of implicit events from documents. These systems operate on structured verb-argument events produced by an NLP pipeline. We compare these systems with recent Recurrent Neural Net models that directly operate on raw tokens to predict sentences, finding the latter to be roughly comparable to the former in terms of predicting missing events in documents.
    ML ID: 330
  37. Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
    [Details] [PDF] [Poster]
    Subhashini Venugopalan and Lisa Anne Hendricks and Raymond Mooney and Kate Saenko
    In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 1961--1966, Austin, Texas, 2016.
    This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos. Specifically, we integrate both a neural language model and distributional semantics trained on large text corpora into a recent LSTM-based architecture for video description. We evaluate our approach on a collection of Youtube videos as well as two large movie description datasets showing significant improvements in grammaticality while modestly improving descriptive quality.
    ML ID: 328
  38. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
    [Details] [PDF]
    Lisa Anne Hendricks and Subhashini Venugopalan and Marcus Rohrbach and Raymond Mooney and Kate Saenko and Trevor Darrell
    In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR-16), 1--10, 2016.
    While recent deep neural network models have achieved promising results on the image captioning task, they rely largely on the availability of corpora with paired image and sentence captions to describe objects in context. In this work, we propose the Deep Compositional Captioner (DCC) to address the task of generating descriptions of novel objects which are not present in paired image-sentence datasets. Our method achieves this by leveraging large object recognition datasets and external text corpora and by transferring knowledge between semantically similar concepts. Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet. In contrast, our model can compose sentences that describe novel objects and their interactions with other objects. We demonstrate our model’s ability to describe novel concepts by empirically evaluating its performance on MSCOCO and show qualitative results on ImageNet images of objects for which no paired image-sentence data exist. Further, we extend our approach to generate descriptions of objects in video clips. Our results show that DCC has distinct advantages over existing image and video captioning approaches for generating descriptions of new objects in context.
    ML ID: 327
  39. Learning Statistical Scripts with LSTM Recurrent Neural Networks
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Karl Pichotta and Raymond J. Mooney
    In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, February 2016.
    Scripts encode knowledge of prototypical sequences of events. We describe a Recurrent Neural Network model for statistical script learning using Long Short-Term Memory, an architecture which has been demonstrated to work well on a range of Artificial Intelligence tasks. We evaluate our system on two tasks, inferring held-out events from text and inferring novel events from text, substantially outperforming prior approaches on both tasks.
    ML ID: 325
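    A minimal PyTorch sketch of the general approach: an LSTM language model over a sequence of event tokens, trained to predict the next token. The event encoding and hyperparameters here are illustrative assumptions, not the paper's exact model.

      # Illustrative sketch: LSTM next-event-token prediction for script learning.
      import torch
      import torch.nn as nn

      class ScriptLSTM(nn.Module):
          def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, embed_dim)
              self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
              self.out = nn.Linear(hidden_dim, vocab_size)

          def forward(self, event_tokens):                       # (batch, seq_len)
              hidden, _ = self.lstm(self.embed(event_tokens))
              return self.out(hidden)                            # logits for the next token

      model = ScriptLSTM(vocab_size=5000)
      tokens = torch.randint(0, 5000, (4, 12))                   # toy batch of event-token sequences
      logits = model(tokens[:, :-1])
      loss = nn.CrossEntropyLoss()(logits.reshape(-1, 5000), tokens[:, 1:].reshape(-1))
      loss.backward()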
  40. Representing Meaning with a Combination of Logical and Distributional Models
    [Details] [PDF]
    I. Beltagy and Stephen Roller and Pengxiang Cheng and Katrin Erk and Raymond J. Mooney
    The special issue of Computational Linguistics on Formal Distributional Semantics, 42(4), 2016.
    NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence structure in the same detail as logic-based approaches. So it has been argued that the two are complementary. We adopt a hybrid approach that combines logical and distributional semantics using probabilistic logic, specifically Markov Logic Networks (MLNs). In this paper, we focus on the three components of a practical system: 1) Logical representation focuses on representing the input problems in probabilistic logic. 2) Knowledge base construction creates weighted inference rules by integrating distributional information with other sources. 3) Probabilistic inference involves solving the resulting MLN inference problems efficiently. To evaluate our approach, we use the task of textual entailment (RTE), which can utilize the strengths of both logic-based and distributional representations. In particular we focus on the SICK dataset, where we achieve state-of-the-art results. We also release a lexical entailment dataset of 10,213 rules extracted from the SICK dataset, which is a valuable resource for evaluating lexical entailment systems.
    ML ID: 316
  41. Stacked Ensembles of Information Extractors for Knowledge-Base Population by Combining Supervised and Unsupervised Approaches
    [Details] [PDF] [Slides (PDF)]
    Nazneen Fatema Rajani and Raymond J Mooney
    In Proceedings of the Eighth Text Analysis Conference (TAC 2015), November 2015.
    The UTAustin team participated in two main tasks this year - the Cold Start Slot Filling (CSSF) task and the Slot-Filler Validation/Ensembling task, which was divided into the filtering and ensembling subtasks. Our system uses stacking to ensemble multiple systems for the KBP slot filling task, as described in our ACL 2015 paper. We expand the stacking approach by allowing the classifier to also utilize additional features that are relevant to making a final decision. Stacking relies on supervised training and hence requires common systems from the 2014 data to be used for training. However, that approach has limitations on performance and therefore we propose a novel approach of combining the supervised approach with an unsupervised approach on the remaining systems. We believe this combination approach gives our best run for the ensembling task. In this paper, we also discuss strategies to handle Cold Start data which comes from multiple hops.
    ML ID: 355
  42. Statistical Script Learning with Recurrent Neural Nets
    [Details] [PDF] [Slides (PDF)]
    Karl Pichotta
    December 2015. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    Statistical Scripts are probabilistic models of sequences of events. For example, a script model might encode the information that the event "Smith met with the President" should strongly predict the event "Smith spoke to the President." We present a number of results improving the state of the art of learning statistical scripts for inferring implicit events. First, we demonstrate that incorporating multiple arguments into events, yielding a more complex event representation than is used in previous work, helps to improve a co-occurrence-based script system's predictive power. Second, we improve on these results with a Recurrent Neural Network script sequence model which uses a Long Short-Term Memory component. We evaluate in two ways: first, we evaluate systems' ability to infer held-out events from documents (the "Narrative Cloze" evaluation); second, we evaluate novel event inferences by collecting human judgments.

    We propose a number of further extensions to this work. First, we propose a number of new probabilistic script models leveraging recent advances in Neural Network training. These include recurrent sequence models with different hidden unit structure and Convolutional Neural Network models. Second, we propose integrating more lexical and linguistic information into events. Third, we propose incorporating discourse relations between spans of text into event co-occurrence models, either as output by an off-the-shelf discourse parser or learned automatically. Finally, we propose investigating the interface between models of event co-occurrence and coreference resolution, in particular by integrating script information into general coreference systems.

    ML ID: 326
  43. Natural Language Video Description using Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan
    November 2015. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    For most people, watching a brief video and describing what happened (in words) is an easy task. For machines, extracting the meaning from video pixels and generating a sentence description is a very complex problem. The goal of my research is to develop models that can automatically generate natural language (NL) descriptions for events in videos. As a first step, this proposal presents deep recurrent neural network models for video to text generation. I build on recent "deep" machine learning approaches to develop video description models using a unified deep neural network with both convolutional and recurrent structure. This technique treats the video domain as another "language" and takes a machine translation approach using the deep network to translate videos to text. In my initial approach, I adapt a model that can learn on images and captions to transfer knowledge from this auxiliary task to generate descriptions for short video clips. Next, I present an end-to-end deep network that can jointly model a sequence of video frames and a sequence of words. The second part of the proposal outlines a set of models to significantly extend work in this area. Specifically, I propose techniques to integrate linguistic knowledge from plain text corpora; and attention methods to focus on objects and track their interactions to generate more diverse and accurate descriptions. To move beyond short video clips, I also outline models to process multi-activity movie videos, learning to jointly segment and describe coherent event sequences. I propose further extensions to take advantage of movie scripts and subtitle information to generate richer descriptions.
    ML ID: 324
  44. Inducing Grammars from Linguistic Universals and Realistic Amounts of Supervision
    [Details] [PDF]
    Dan Garrette
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, 2015.
    The best performing NLP models to date are learned from large volumes of manually-annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains.

    This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input — even that which can be collected in just a few hours — can provide enormous advantages if we have learning algorithms that can appropriately exploit it.

    This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation.

    Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.

    ML ID: 321
  45. A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette and Chris Dyer and Jason Baldridge and Noah A. Smith
    In Proceedings of the 2015 Conference on Computational Natural Language Learning (CoNLL-2015), 22--31, Beijing, China, 2015.
    Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that specify the syntactic configurations in which they may occur. We present a novel parsing model with the capacity to capture the associative adjacent-category relationships intrinsic to CCG by parameterizing the relationships between each constituent label and the preterminal categories directly to its left and right, biasing the model toward constituent categories that can combine with their contexts. This builds on the intuitions of Klein and Manning's (2002) "constituent-context" model, which demonstrated the value of modeling context, but has the advantage of being able to exploit the properties of CCG. Our experiments show that our model outperforms a baseline in which this context information is not captured.
    ML ID: 320
  46. Sequence to Sequence -- Video to Text
    [Details] [PDF]
    Subhashini Venugopalan and Marcus Rohrbach and Jeff Donahue and Raymond J. Mooney and Trevor Darrell and Kate Saenko
    In Proceedings of the 2015 International Conference on Computer Vision (ICCV-15), Santiago, Chile, December 2015.
    Real-world videos often have complex dynamics; and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip. Our model naturally is able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
    ML ID: 319
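    The published S2VT model stacks two LSTMs that first read the frame sequence and then emit words; the sketch below is only a loose, hypothetical illustration of the general encoder-decoder idea, with placeholder feature sizes and vocabulary rather than the released implementation:

    ```python
    # Rough sketch of the sequence-to-sequence idea: encode frames, then decode words.
    import torch
    import torch.nn as nn

    class Seq2SeqCaptioner(nn.Module):
        def __init__(self, feat_dim=4096, vocab_size=5000, hidden_dim=256):
            super().__init__()
            self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.embed = nn.Embedding(vocab_size, hidden_dim)
            self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, frame_feats, word_ids):
            # frame_feats: (batch, n_frames, feat_dim); word_ids: (batch, n_words)
            _, state = self.encoder(frame_feats)          # summarize the frame sequence
            dec_out, _ = self.decoder(self.embed(word_ids), state)
            return self.out(dec_out)                      # next-word scores at each step

    model = Seq2SeqCaptioner()
    scores = model(torch.randn(1, 10, 4096), torch.randint(0, 5000, (1, 6)))
    print(scores.shape)   # torch.Size([1, 6, 5000])
    ```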
  47. Stacked Ensembles of Information Extractors for Knowledge-Base Population
    [Details] [PDF] [Slides (PPT)]
    Vidhoon Viswanathan and Nazneen Fatema Rajani and Yinon Bentor and Raymond J. Mooney
    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 177-187, Beijing, China, July 2015.
    We present results on using stacking to ensemble multiple systems for the Knowledge Base Population English Slot Filling (KBP-ESF) task. In addition to using the output and confidence of each system as input to the stacked classifier, we also use features capturing how well the systems agree about the provenance of the information they extract. We demonstrate that our stacking approach outperforms the best system from the 2014 KBP-ESF competition as well as alternative ensembling methods employed in the 2014 KBP Slot Filler Validation task and several other ensembling baselines. Additionally, we demonstrate that including provenance information further increases the performance of stacking.
    ML ID: 318
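    Purely as a schematic illustration of stacking (the feature set, classifier choice, and numbers below are invented, not the system described above), each base extractor's confidence for a candidate slot fill, plus a provenance-agreement feature, can feed a meta-classifier trained on assessor judgments:

    ```python
    # Schematic stacked ensemble: per-system confidences -> meta-classifier decision.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row: [conf_system_A, conf_system_B, conf_system_C, provenance_agreement],
    # where provenance_agreement is a toy feature for "do the systems cite the same document?"
    X_train = np.array([
        [0.9, 0.8, 0.0, 1.0],
        [0.2, 0.0, 0.1, 0.0],
        [0.7, 0.6, 0.9, 1.0],
        [0.1, 0.3, 0.0, 0.0],
    ])
    y_train = np.array([1, 0, 1, 0])   # assessor judgments: correct / incorrect fill

    stacker = LogisticRegression().fit(X_train, y_train)

    # A new candidate extraction proposed by two of the three systems:
    candidate = np.array([[0.85, 0.0, 0.75, 1.0]])
    print(stacker.predict_proba(candidate)[0, 1])  # probability the fill is correct
    ```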
  48. Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes
    [Details] [PDF] [Poster]
    Chris Quirk and Raymond Mooney and Michel Galley
    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 878--888, Beijing, China, July 2015.
    Using natural language to write programs is a touchstone problem for computational linguistics. We present an approach that learns to map natural-language descriptions of simple "if-then" rules to executable code. By training and testing on a large corpus of naturally-occurring programs (called "recipes") and their natural language descriptions, we demonstrate the ability to effectively map language to code. We compare a number of semantic parsing approaches on the highly noisy training data collected from ordinary users, and find that loosely synchronous systems perform best.
    ML ID: 317
  49. Knowledge Base Population using Stacked Ensembles of Information Extractors
    [Details] [PDF]
    Vidhoon Viswanathan
    Masters Thesis, Department of Computer Science, The University of Texas at Austin, May 2015.
    The performance of relation extractors plays a significant role in the automatic creation of knowledge bases from web corpora. Using automated systems to create knowledge bases from the web is known as Knowledge Base Population. The Text Analysis Conference conducts English Slot Filling (ESF) and Slot Filler Validation (SFV) tasks as part of its KBP track to promote research in this area. Slot Filling systems are developed to do relation extraction for specific relation and entity types. Several participating universities have built Slot Filling systems addressing different aspects and employing different algorithms and techniques for these tasks.

    In this thesis, we investigate the use of ensemble learning to combine the output of existing individual Slot Filling systems. We are the first to employ Stacking, a type of ensemble learning algorithm, for the task of ensembling Slot Filling systems for the KBP ESF and SFV tasks. Our approach builds an ensemble classifier that learns to meaningfully combine output from different Slot Filling systems and predict the correctness of extractions. Our experimental evaluation proves that Stacking is useful for ensembling SF systems. We demonstrate new state-of-the-art results for the KBP ESF task. Our proposed system achieves an F1 score of 47.

    Given the complexity of developing Slot Filling systems from scratch, our promising results indicate that performance on Slot Filling tasks can be increased by ensembling existing systems in a shorter timeframe. Our work promotes research and investigation into other methods for ensembling Slot Filling systems.

    ML ID: 315
  50. Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan and Huijuan Xu and Jeff Donahue and Marcus Rohrbach and Raymond Mooney and Kate Saenko
    In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1494--1504, Denver, Colorado, June 2015.
    Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
    ML ID: 313
  51. Unsupervised Code-Switching for Multilingual Historical Document Transcription
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette and Hannah Alpert-Abrams and Taylor Berg-Kirkpatrick and Dan Klein
    In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1036--1041, Denver, Colorado, June 2015.
    Transcribing documents from the printing press era, a challenge in its own right, is more complicated when documents interleave multiple languages—a common feature of 16th century texts. Additionally, many of these documents precede consistent orthographic conventions, making the task even harder. We extend the state-of-the-art historical OCR model of Berg-Kirkpatrick et al. (2013) to handle word-level code-switching between multiple languages. Further, we enable our system to handle spelling variability, including now-obsolete shorthand systems used by printers. Our results show average relative character error reductions of 14% across a variety of historical texts.
    ML ID: 312
  52. On the Proper Treatment of Quantifiers in Probabilistic Logic Semantics
    [Details] [PDF] [Slides (PPT)]
    I. Beltagy and Katrin Erk
    In Proceedings of the 11th International Conference on Computational Semantics (IWCS-2015), London, UK, April 2015.
    As a format for describing the meaning of natural language sentences, probabilistic logic combines the expressivity of first-order logic with the ability to handle graded information in a principled fashion. But practical probabilistic logic frameworks usually assume a finite domain in which each entity corresponds to a constant in the logic (domain closure assumption). They also assume a closed world where everything has a very low prior probability. These assumptions lead to some problems in the inferences that these systems make. In this paper, we show how to formulate Textual Entailment (RTE) inference problems in probabilistic logic in a way that takes the domain closure and closed-world assumptions into account. We evaluate our proposed technique on three RTE datasets: a synthetic dataset focused on complex forms of quantification, FraCaS, and one more natural dataset. We show that our technique leads to improvements on the more natural dataset, and achieves 100% accuracy on the synthetic dataset and on the relevant part of FraCaS.
    ML ID: 311
  53. Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith
    In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, January 2015.
    Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Previous work has shown that learning sequence models for CCG tagging can be improved by using priors that are sensitive to the formal properties of CCG as well as cross-linguistic universals. We extend this approach to the task of learning a full CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
    ML ID: 310
  54. Natural Language Semantics using Probabilistic Logic
    [Details] [PDF] [Slides (PPT)]
    I. Beltagy
    October 2014. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    With better natural language semantic representations, computers can support more applications more effectively as a result of a better understanding of natural text. However, no single semantic representation at this time fulfills all requirements needed for a satisfactory representation. Logic-based representations like first-order logic capture many linguistic phenomena using logical constructs, and they come with standardized inference mechanisms, but standard first-order logic fails to capture the "graded" aspect of meaning in languages. Distributional models use contextual similarity to predict the "graded" semantic similarity of words and phrases, but they do not adequately capture logical structure. In addition, there are a few recent attempts to combine both representations, either on the logic side (still not a graded representation) or on the distributional side (not full logic).

    We propose using probabilistic logic to represent natural language semantics combining the expressivity and the automated inference of logic, and the gradedness of distributional representations. We evaluate this semantic representation on two tasks, Recognizing Textual Entailment (RTE) and Semantic Textual Similarity (STS). Doing RTE and STS better is an indication of a better semantic understanding.

    Our system has three main components: 1. parsing and task representation, 2. knowledge base construction, and 3. inference. The input natural sentences of the RTE/STS task are mapped to logical form using Boxer, a rule-based system built on top of a CCG parser, and are then used to formulate the RTE/STS problem in probabilistic logic. Next, a knowledge base is assembled as weighted inference rules collected from different sources such as WordNet and on-the-fly lexical rules from distributional semantics. An advantage of using probabilistic logic is that more rules from additional resources can easily be added by mapping them to logical rules and weighting them appropriately. The last component is inference, where we solve the probabilistic logic inference problem using an appropriate probabilistic logic tool such as Markov Logic Networks (MLNs) or Probabilistic Soft Logic (PSL). We show how to solve the inference problems in MLNs efficiently for RTE using a modified closed-world assumption and a new inference algorithm, and how to adapt MLNs and PSL for STS by relaxing conjunctions. Experiments show that our semantic representation can handle RTE and STS reasonably well.

    For future work, our short-term goals are 1. better RTE task representation and finite-domain handling, 2. adding more inference rules, both precompiled and on-the-fly, 3. generalizing the modified closed-world assumption, 4. enhancing our inference algorithm for MLNs, and 5. adding a weight learning step to better adapt the weights. In the longer term, we would like to apply our semantic representation to the question answering task, support generalized quantifiers, contextualize the WordNet rules we use, apply our semantic representation to languages other than English, and implement a probabilistic logic Inference Inspector that can visualize the proof structure.

    ML ID: 308
  55. Weakly-Supervised Bayesian Learning of a CCG Supertagger
    [Details] [PDF] [Slides (PDF)] [Poster]
    Dan Garrette and Chris Dyer and Jason Baldridge and Noah A. Smith
    In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014), 141--150, Baltimore, MD, June 2014.
    We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of cross-linguistically common category structures as well as transitions between tags that can combine locally according to CCG's combinators. Our prior is theoretically appealing since it is motivated by language-independent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.
    ML ID: 307
  56. Inclusive yet Selective: Supervised Distributional Hypernymy Detection
    [Details] [PDF]
    Stephen Roller and Katrin Erk and Gemma Boleda
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1025--1036, Dublin, Ireland, August 2014.
    We test the Distributional Inclusion Hypothesis, which states that hypernyms tend to occur in a superset of contexts in which their hyponyms are found. We find that this hypothesis only holds when it is applied to relevant dimensions. We propose a robust supervised approach that achieves accuracies of .84 and .85 on two existing datasets and that can be interpreted as selecting the dimensions that are relevant for distributional inclusion.
    ML ID: 306
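    As a toy illustration of the Distributional Inclusion Hypothesis itself (not the paper's supervised model or its dimension selection), one can measure how much of a hyponym's context mass is covered by a candidate hypernym's contexts; the vectors below are made up:

    ```python
    # Toy inclusion score: fraction of the narrower word's context weight that is
    # "covered" by the broader word's contexts.
    import numpy as np

    def inclusion_score(hyponym_vec, hypernym_vec):
        overlap = np.minimum(hyponym_vec, hypernym_vec).sum()
        return overlap / hyponym_vec.sum()

    # Context-count vectors over a shared list of context words (toy numbers).
    cat    = np.array([5.0, 3.0, 0.0, 2.0])   # e.g. contexts: purr, pet, bark, feed
    animal = np.array([4.0, 6.0, 5.0, 3.0])

    print(inclusion_score(cat, animal))   # high: 'animal' contexts largely cover 'cat' contexts
    print(inclusion_score(animal, cat))   # lower in the reverse direction
    ```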
  57. UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic
    [Details] [PDF]
    I. Beltagy and Stephen Roller and Gemma Boleda and Katrin Erk and Raymond J. Mooney
    In The 8th Workshop on Semantic Evaluation (SemEval-2014), 796--801, Dublin, Ireland, August 2014.
    We represent natural language semantics by combining logical and distributional information in probabilistic logic. We use Markov Logic Networks (MLN) for the RTE task, and Probabilistic Soft Logic (PSL) for the STS task. The system is evaluated on the SICK dataset. Our best system achieves 73% accuracy on the RTE task, and a Pearson's correlation of 0.71 on the STS task.
    ML ID: 305
  58. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
    [Details] [PDF]
    Jesse Thomason and Subhashini Venugopalan and Sergio Guadarrama and Kate Saenko and Raymond Mooney
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1218--1227, Dublin, Ireland, August 2014.
    This paper integrates techniques in natural language processing and computer vision to improve recognition and description of entities and activities in real-world videos. We propose a strategy for generating textual descriptions of videos by using a factor graph to combine visual detections with language statistics. We use state-of-the-art visual recognition systems to obtain confidences on entities, activities, and scenes present in the video. Our factor graph model combines these detection confidences with probabilistic knowledge mined from text corpora to estimate the most likely subject, verb, object, and place. Results on YouTube videos show that our approach improves both the joint detection of these latent, diverse sentence components and the detection of some individual components when compared to using the vision system alone, as well as over a previous n-gram language-modeling approach. The joint detection allows us to automatically generate more accurate, richer sentential descriptions of videos with a wide array of possible content.
    ML ID: 304
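    A highly simplified sketch of the underlying idea, combining visual detection confidences with text-mined plausibility to pick a subject-verb-object triple; the numbers and the simple sum-of-log-scores combination are illustrative assumptions, not the paper's factor graph:

    ```python
    # Toy combination of visual detection confidences with text-mined plausibility.
    import itertools, math

    vision_conf = {                      # per-word detector confidences (made-up numbers)
        "person": 0.9, "dog": 0.6,       # candidate subjects (and objects)
        "ride": 0.5, "walk": 0.7,        # candidate verbs
        "bicycle": 0.8, "leash": 0.4,    # candidate objects
    }
    corpus_logp = {                      # log-plausibility of triples mined from text (toy)
        ("person", "ride", "bicycle"): -2.0,
        ("person", "walk", "dog"): -1.5,
        ("dog", "walk", "leash"): -6.0,
    }

    def score(s, v, o):
        lm = corpus_logp.get((s, v, o), -10.0)      # unseen triples get a low back-off score
        return math.log(vision_conf[s]) + math.log(vision_conf[v]) + math.log(vision_conf[o]) + lm

    best = max(itertools.product(["person", "dog"], ["ride", "walk"], ["bicycle", "dog", "leash"]),
               key=lambda svo: score(*svo))
    print(best)   # most plausible triple given both vision and language evidence
    ```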
  59. Efficient Markov Logic Inference for Natural Language Semantics
    [Details] [PDF] [Poster]
    I. Beltagy and Raymond J. Mooney
    In Proceedings of the Fourth International Workshop on Statistical Relational AI at AAAI (StarAI-2014), 9--14, Quebec City, Canada, July 2014.
    Using Markov logic to integrate logical and distributional information in natural-language semantics results in complex inference problems involving long, complicated formulae. Current inference methods for Markov logic are ineffective on such problems. To address this problem, we propose a new inference algorithm based on SampleSearch that computes probabilities of complete formulae rather than ground atoms. We also introduce a modified closed-world assumption that significantly reduces the size of the ground network, thereby making inference feasible. Our approach is evaluated on the recognizing textual entailment task, and experiments demonstrate its dramatic impact on the efficiency of inference.
    ML ID: 303
  60. Integrating Visual and Linguistic Information to Describe Properties of Objects
    [Details] [PDF]
    Calvin MacKenzie
    2014. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
    Generating sentences from images has historically been performed with standalone Computer Vision systems. The idea of combining visual and linguistic information has been gaining traction in the Computer Vision and Natural Language Processing communities over the past several years. The motivation for a combined system is to generate richer linguistic descriptions of images. Standalone vision systems are typically unable to generate linguistically rich descriptions. Our approach uses abundant available language data to clean up noisy results from standalone vision systems.

    This thesis investigates the performance of several models which integrate information from language and vision systems in order to describe certain attributes of objects. The attributes used were split into two categories: color attributes and other attributes. Our proposed model was found to be statistically significantly more accurate than the vision system alone for both sets of attributes.

    ML ID: 302
  61. Semantic Parsing using Distributional Semantics and Probabilistic Logic
    [Details] [PDF] [Poster]
    I. Beltagy and Katrin Erk and Raymond Mooney
    In Proceedings of ACL 2014 Workshop on Semantic Parsing (SP-2014), 7--11, Baltimore, MD, June 2014.
    We propose a new approach to semantic parsing that is not constrained by a fixed formal ontology and purely logical inference. Instead, we use distributional semantics to generate only the relevant part of an on-the-fly ontology. Sentences and the on-the-fly ontology are represented in probabilistic logic. For inference, we use probabilistic logic frameworks like Markov Logic Networks (MLN) and Probabilistic Soft Logic (PSL). This semantic parsing approach is evaluated on two tasks, Textual Entailment (RTE) and Textual Similarity (STS), both accomplished using inference in probabilistic logic. Experiments show the potential of the approach.
    ML ID: 301
  62. Probabilistic Soft Logic for Semantic Textual Similarity
    [Details] [PDF] [Poster]
    I. Beltagy and Katrin Erk and Raymond J. Mooney
    In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL-14), 1210--1219, Baltimore, MD, 2014.
    Probabilistic Soft Logic (PSL) is a recently developed framework for probabilistic logic. We use PSL to combine logical and distributional representations of natural-language meaning, where distributional information is represented in the form of weighted inference rules. We apply this framework to the task of Semantic Textual Similarity (STS) (i.e. judging the semantic similarity of natural-language sentences), and show that PSL gives improved results compared to a previous approach based on Markov Logic Networks (MLNs) and a purely distributional approach.
    ML ID: 300
  63. Statistical Script Learning with Multi-Argument Events
    [Details] [PDF] [Poster]
    Karl Pichotta and Raymond J. Mooney
    In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), 220--229, Gothenburg, Sweden, April 2014.
    Scripts represent knowledge of stereotypical event sequences that can aid text understanding. Initial statistical methods have been developed to learn probabilistic scripts from raw text corpora; however, they utilize a very impoverished representation of events, consisting of a verb and one dependent argument. We present a script learning approach that employs events with multiple arguments. Unlike previous work, we model the interactions between multiple entities in a script. Experiments on a large corpus using the task of inferring held-out events (the "narrative cloze evaluation") demonstrate that modeling multi-argument events improves predictive accuracy.
    ML ID: 296
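    For readers unfamiliar with the "narrative cloze" evaluation mentioned above, the following hedged sketch shows the protocol (hold out one event, rank candidates, report recall at k); the toy frequency-based "model" and data are invented for illustration:

    ```python
    # Toy narrative cloze: hide one event from a document's chain and ask a model to
    # rank candidate events; report how often the held-out event appears in the top k.
    from collections import Counter

    def recall_at_k(documents, rank_events, k=10):
        hits, total = 0, 0
        for chain in documents:
            for i, held_out in enumerate(chain):
                context = chain[:i] + chain[i + 1:]
                ranked = rank_events(context)          # best-first list of candidate events
                hits += held_out in ranked[:k]
                total += 1
        return hits / total

    # Trivial baseline "model": rank events by their global frequency in training chains.
    train = [["order(food)", "eat(food)", "pay(bill)"], ["board(bus)", "pay(fare)"]]
    freq = Counter(e for chain in train for e in chain)
    rank_by_frequency = lambda context: [e for e, _ in freq.most_common()]

    test = [["order(food)", "eat(food)", "pay(bill)"]]
    print(recall_at_k(test, rank_by_frequency, k=2))
    ```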
  64. University of Texas at Austin KBP 2013 Slot Filling System: Bayesian Logic Programs for Textual Inference
    [Details] [PDF]
    Yinon Bentor and Amelia Harrison and Shruti Bhosale and Raymond Mooney
    In Proceedings of the Sixth Text Analysis Conference (TAC 2013), 2013.
    This document describes the University of Texas at Austin 2013 system for the Knowledge Base Population (KBP) English Slot Filling (SF) task. The UT Austin system builds upon the output of an existing relation extractor by augmenting relations that are explicitly stated in the text with ones that are inferred from the stated relations using probabilistic rules that encode commonsense world knowledge. Such rules are learned from linked open data and are encoded in the form of Bayesian Logic Programs (BLPs), a statistical relational learning framework based on directed graphical models. In this document, we describe our methods for learning these rules, estimating their associated weights, and performing probabilistic and logical inference to infer unseen relations. In the KBP SF task, our system was able to infer several unextracted relations, but its performance was limited by the base level extractor.
    ML ID: 299
  65. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition
    [Details] [PDF] [Poster]
    Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko
    In Proceedings of the 14th International Conference on Computer Vision (ICCV-2013), 2712--2719, Sydney, Australia, December 2013.
    Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities "in-the-wild". We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action, and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help to choose an appropriate level of generalization, and priors learned from web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects; we also use a web-scale language model to "fill in" novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate it is able to generate short sentence descriptions of video clips better than baseline approaches.
    ML ID: 295
  66. A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities
    [Details] [PDF]
    Stephen Roller and Sabine Schulte im Walde
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1146--1157, Seattle, WA, October 2013.
    Recent investigations into grounded models of language have shown that holistic views of language and perception can provide higher performance than independent views. In this work, we improve a two-dimensional multimodal version of Latent Dirichlet Allocation (Andrews et al., 2009) in various ways. (1) We outperform text-only models in two different evaluations, and demonstrate that low-level visual features are directly compatible with the existing model. (2) We present a novel way to integrate visual features into the LDA model using unsupervised clusters of images. The clusters are directly interpretable and improve on our evaluation tasks. (3) We provide two novel ways to extend the bimodal models to support three or more modalities. We find that the three-, four-, and five-dimensional models significantly outperform models using only one or two modalities, and that nontextual modalities each provide separate, disjoint knowledge that cannot be forced into a shared, latent structure.
    ML ID: 294
  67. Identifying Phrasal Verbs Using Many Bilingual Corpora
    [Details] [PDF] [Poster]
    Karl Pichotta and John DeNero
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 636--646, Seattle, WA, October 2013.
    We address the problem of identifying multiword expressions in a language, focusing on English phrasal verbs. Our polyglot ranking approach integrates frequency statistics from translated corpora in 50 different languages. Our experimental evaluation demonstrates that combining statistical evidence from many parallel corpora using a novel ranking-oriented boosting algorithm produces a comprehensive set of English phrasal verbs, achieving performance comparable to a human-curated set.
    ML ID: 293
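    As a toy, hypothetical illustration of the intuition behind aggregating evidence from many bilingual corpora (the paper's actual ranking-oriented boosting method is more involved), a candidate phrasal verb can be scored by how consistently its tokens align to a single foreign word across languages; the statistics below are made up:

    ```python
    # Toy cross-lingual evidence aggregation for phrasal verb candidates.
    # For each language we assume a precomputed statistic: the fraction of times the
    # candidate's tokens were aligned to a single foreign word (made-up numbers).
    single_word_alignment_rate = {
        "give up":   {"de": 0.71, "es": 0.65, "fr": 0.65, "ja": 0.80},
        "give back": {"de": 0.20, "es": 0.15, "fr": 0.22, "ja": 0.30},
    }

    def polyglot_score(candidate):
        rates = single_word_alignment_rate[candidate].values()
        return sum(rates) / len(rates)          # simple average over languages

    ranked = sorted(single_word_alignment_rate, key=polyglot_score, reverse=True)
    print(ranked)   # phrasal-verb-like candidates should rank above compositional ones
    ```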
  68. Detecting Promotional Content in Wikipedia
    [Details] [PDF] [Slides (PPT)]
    Shruti Bhosale and Heath Vinicombe and Raymond J. Mooney
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1851--1857, Seattle, WA, October 2013.
    This paper presents an approach for detecting promotional content in Wikipedia. By incorporating stylometric features, including features based on n-gram and PCFG language models, we demonstrate improved accuracy at identifying promotional articles, compared to using only lexical information and meta-features.
    ML ID: 292
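    As an informal, assumed example of the general approach (lexical n-gram features feeding a standard classifier; the paper additionally uses PCFG-based stylometric features and real Wikipedia data, neither of which is reproduced here):

    ```python
    # Toy promotional-content detector: word n-gram features + linear classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_texts = [
        "XYZ Corp is the world's leading, most innovative provider of premium solutions.",
        "The company was founded in 1912 and manufactures industrial pumps.",
        "Our award-winning team delivers unparalleled excellence to valued customers.",
        "The river flows 120 km from the highlands to the coastal delta.",
    ]
    train_labels = [1, 0, 1, 0]   # 1 = promotional tone, 0 = neutral encyclopedic tone

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(train_texts, train_labels)

    print(clf.predict(["We proudly offer best-in-class, cutting-edge services worldwide."]))
    ```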
  69. Grounded Language Learning Models for Ambiguous Supervision
    [Details] [PDF] [Slides (PPT)]
    Joo Hyun Kim
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2013.
    Communicating through natural language interfaces is a long-standing, ultimate goal for artificial intelligence (AI) agents. One core issue toward this goal is "grounded" language learning, the process of learning the semantics of natural language with respect to relevant perceptual inputs. In order to ground the meanings of language in a real-world situation, computational systems are trained with data in the form of natural language sentences paired with relevant but ambiguous perceptual contexts. With such ambiguous supervision, the learner must resolve the ambiguity between a natural language (NL) sentence and a corresponding set of possible logical meaning representations (MR).

    In this thesis, we focus on devising effective models for simultaneously disambiguating such supervision and learning the underlying semantics of language to map NL sentences into proper logical MRs. We present probabilistic generative models for learning such correspondences along with a reranking model to improve the performance further.

    First, we present a probabilistic generative model that learns the mappings from NL sentences into logical forms where the true meaning of each NL sentence is one of a handful of candidate logical MRs. It simultaneously disambiguates the meaning of each sentence in the training data and learns to probabilistically map an NL sentence to its corresponding MR form depicted in a single tree structure. We perform evaluations on the RoboCup sportscasting corpus, proving that our model is more effective than those proposed by previous researchers.

    Next, we describe two PCFG induction models for grounded language learning that extend the previous grounded language learning model of Borschinger, Jones, and Johnson (2011). Borschinger et al.'s approach works well in situations of limited ambiguity, such as in the sportscasting task. However, it does not scale well to highly ambiguous situations when there are large sets of potential meaning possibilities for each sentence, such as in the navigation instruction following task first studied by Chen and Mooney (2011). The two models we present overcome such limitations by employing a learned semantic lexicon as a basic correspondence unit between NL and MR for PCFG rule generation.

    Finally, we present a method of adapting discriminative reranking to grounded language learning in order to improve the performance of our proposed generative models. Although such generative models are easy to implement and are intuitive, it is not always the case that generative models perform best, since they are maximizing the joint probability of data and model, rather than directly maximizing conditional probability. Because we do not have gold-standard references for training a secondary conditional reranker, we incorporate weak supervision of evaluations against the perceptual world during the process of improving model performance.

    All these approaches are evaluated on the two publicly available domains that have been actively used in many other grounded language learning studies. Our methods demonstrate consistently improved performance over those of previous studies in the domains with different languages; this proves that our methods are language-independent and can be generally applied to other grounded learning problems as well. Further possible applications of the presented approaches include summarized machine translation tasks and learning from real perception data assisted by computer vision and robotics.

    ML ID: 291
  70. Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages
    [Details] [PDF]
    Dan Garrette and Jason Mielens and Jason Baldridge
    In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 583--592, Sofia, Bulgaria, August 2013.
    Developing natural language processing tools for low-resource languages often requires creating resources from scratch. While a variety of semi-supervised methods exist for training from incomplete data, there are open questions regarding what types of training data should be used and how much is necessary. We discuss a series of experiments designed to shed light on such questions in the context of part-of-speech tagging. We obtain timed annotations from linguists for the low-resource languages Kinyarwanda and Malagasy (as well as English) and evaluate how the amounts of various kinds of data affect performance of a trained POS-tagger. Our results show that annotation of word types is the most important, provided a sufficiently capable semi-supervised learning infrastructure is in place to project type information onto a raw corpus. We also show that finite-state morphological analyzers are effective sources of type information when few labeled examples are available.
    ML ID: 288
  71. Online Inference-Rule Learning from Natural-Language Extractions
    [Details] [PDF] [Poster]
    Sindhu Raghavan and Raymond J. Mooney
    In Proceedings of the 3rd Statistical Relational AI (StaRAI-13) workshop at AAAI '13, July 2013.
    In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accounts for this phenomenon by learning rules in which the body of the rule contains relations that are usually explicitly stated, while the head employs a less-frequently mentioned relation that is easily inferred. The rule learner processes training examples in an online manner to allow it to scale to large text corpora. Furthermore, we propose a novel approach to weighting rules using a curated lexical ontology like WordNet. The learned rules along with their parameters are then used to infer implicit information using a Bayesian Logic Program. Experimental evaluation on a machine reading testbed demonstrates the efficacy of the proposed methods.
    ML ID: 287
  72. Adapting Discriminative Reranking to Grounded Language Learning
    [Details] [PDF] [Slides (PPT)]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 218--227, Sofia, Bulgaria, August 2013.
    We adapt discriminative reranking to improve the performance of grounded language acquisition, specifically the task of learning to follow navigation instructions from observation. Unlike conventional reranking used in syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. Instead, we show how the weak supervision of response feedback (e.g. successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees.
    ML ID: 286
  73. Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
    [Details] [PDF] [Slides (PPT)]
    I. Beltagy, Cuong Chau, Gemma Boleda, Dan Garrette, Katrin Erk, Raymond Mooney
    In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*Sem-2013), 11--21, Atlanta, GA, June 2013.
    We combine logical and distributional representations of natural language meaning by transforming distributional similarity judgments into weighted inference rules using Markov Logic Networks (MLNs). We show that this framework supports both judging sentence similarity and recognizing textual entailment by appropriately adapting the MLN implementation of logical connectives. We also show that distributional phrase similarity, used as textual inference rules created on the fly, improves its performance.
    ML ID: 285
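    A minimal sketch of the core transformation described above, turning a distributional similarity judgment into a weighted lexical inference rule; the log-odds weight mapping and the toy vectors are assumptions for illustration, not the paper's exact scheme:

    ```python
    # Toy conversion of distributional similarity into a weighted inference rule.
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    vectors = {"man": np.array([1.0, 0.2, 0.0]), "guy": np.array([0.9, 0.3, 0.1])}
    sim = cosine(vectors["man"], vectors["guy"])
    weight = np.log(sim / (1 - sim))           # one possible similarity-to-weight mapping
    print(f"{weight:.2f}  man(x) => guy(x)")   # a soft, weighted lexical inference rule
    ```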
  74. A Formal Approach to Linking Logical Form and Vector-Space Lexical Semantics
    [Details] [PDF]
    Dan Garrette, Katrin Erk, Raymond J. Mooney
    In Harry Bunt, Johan Bos, and Stephen Pulman, editors, Computing Meaning, 27--48, Berlin, 2013. Springer.
    First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, weighted knowledge, for example regarding word meaning. This paper describes a mapping between predicates of logical form and points in a vector space. This mapping is then used to project distributional inferences to inference rules in logical form. We then describe first steps of an approach that uses this mapping to recast first-order semantics into the probabilistic models that are part of Statistical Relational AI. Specifically, we show how Discourse Representation Structures can be combined with distributional models for word meaning inside a Markov Logic Network and used to successfully perform inferences that take advantage of logical concepts such as negation and factivity as well as weighted information on word meaning in context.
    ML ID: 284
  75. Learning a Part-of-Speech Tagger from Two Hours of Annotation
    [Details] [PDF] [Slides (PDF)] [Video]
    Dan Garrette, Jason Baldridge
    In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-13), 138--147, Atlanta, GA, June 2013.
    Most work on weakly-supervised learning for part-of-speech taggers has been based on unrealistic assumptions about the amount and quality of training data. For this paper, we attempt to create true low-resource scenarios by allowing a linguist just two hours to annotate data and evaluating on the languages Kinyarwanda and Malagasy. Given these severely limited amounts of either type supervision (tag dictionaries) or token supervision (labeled sentences), we are able to dramatically improve the learning of a hidden Markov model through our method of automatically generalizing the annotations, reducing noise, and inducing word-tag frequency information.
    ML ID: 283
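    A minimal, assumed sketch of one ingredient mentioned above, initializing an HMM tagger's emission counts so that each word is only attributed to the tags its (possibly generalized) tag dictionary allows; the tiny dictionary, tag set, and uniform count-splitting are invented for illustration:

    ```python
    # Toy emission initialization for an HMM POS-tagger from a partial tag dictionary:
    # a word may only be emitted by tags the dictionary allows (unknown words by any tag).
    from collections import defaultdict

    tag_dictionary = {"the": {"DET"}, "dog": {"NOUN"}, "runs": {"VERB", "NOUN"}}
    tags = ["DET", "NOUN", "VERB"]
    raw_corpus = ["the dog runs", "the runs"]

    counts = defaultdict(lambda: defaultdict(float))
    for sentence in raw_corpus:
        for word in sentence.split():
            allowed = tag_dictionary.get(word, set(tags))   # unconstrained if unknown
            for tag in allowed:
                counts[tag][word] += 1.0 / len(allowed)     # split the count uniformly

    # Normalize the counts into per-tag emission distributions.
    emissions = {tag: {w: c / sum(ws.values()) for w, c in ws.items()}
                 for tag, ws in counts.items() if ws}
    print(emissions["NOUN"])
    ```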
  76. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides (PPT)]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-2013), 541--547, July 2013.
    We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with "real-world" knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive collection and annotation of a similar training video corpus. We evaluate our technique against a baseline that does not use text-mined knowledge and show that humans prefer our descriptions 61 percent of the time.
    ML ID: 282
  77. Latent Variable Models of Distributional Lexical Semantics
    [Details] [PDF]
    Joseph Reisinger
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012.
    In order to respond to increasing demand for natural language interfaces—and provide meaningful insight into user query intent—fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich phenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from unannotated corpora does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this thesis I outline a family of probabilistic models capable of capturing important aspects of the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for cross-cutting structure of concept organization—i.e. selective attention, or the notion that humans make use of different categorization systems for different kinds of generalization tasks—and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer a more comprehensive set of semantic relations, which in turn may yield improved systems for question answering, text classification, machine translation, and information retrieval.
    ML ID: 309
  78. Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides (PPT)]
    Sindhu Raghavan
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2012. 170 pages.
    Several real world tasks involve data that is uncertain and relational in nature. Traditional approaches like first-order logic and probabilistic models either deal with structured data or uncertainty, but not both. To address these limitations, statistical relational learning (SRL), a new area in machine learning integrating both first-order logic and probabilistic graphical models, has emerged in the recent past. The advantage of SRL models is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks are a powerful SRL formalism developed in the recent past. In this dissertation, we develop approaches using BLPs to solve two real world tasks -- plan recognition and machine reading.

    Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the dissertation, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for abductive plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs).

    In the second part of the dissertation, we apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Most information extraction (IE) systems identify facts that are explicitly stated in text. However, much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. Human readers naturally use common sense knowledge and "read between the lines" to infer such implicit information from the explicitly stated facts. Since IE systems do not have access to common sense knowledge, they cannot perform deeper reasoning to infer implicitly stated facts. Here, we first develop an approach using BLPs to infer implicitly stated facts from natural language text. It involves learning uncertain common sense knowledge in the form of probabilistic first-order rules by mining a large corpus of automatically extracted facts using an existing rule learner. These rules are then used to derive additional facts from extracted information using BLP inference. We then develop an online rule learner that handles the concise, incomplete nature of natural-language text and learns first-order rules from noisy IE extractions. Finally, we develop a novel approach to calculate the weights of the rules using a curated lexical ontology like WordNet.

    Both tasks described above involve inference and learning from partially observed or incomplete data. In plan recognition, the underlying cause or the top-level plan that resulted in the observed actions is not known or observed. Further, only a subset of the executed actions can be observed by the plan recognition system, resulting in partially observed data. Similarly, in machine reading, since some information is only implicitly stated, it is rarely observed in the data. In this dissertation, we demonstrate the efficacy of BLPs for inference and learning from incomplete data. Experimental comparisons on various benchmark data sets for both tasks demonstrate the superior performance of BLPs over state-of-the-art methods.

    ML ID: 280
  79. Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
    [Details] [PDF]
    Dan Garrette and Jason Baldridge
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 821--831, Jeju, Korea, July 2012.
    Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.
    ML ID: 279
  80. Improving Video Activity Recognition using Object Recognition and Text Mining
    [Details] [PDF] [Slides (PPT)]
    Tanvi S. Motwani and Raymond J. Mooney
    In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), 600--605, August 2012.
    Recognizing activities in real-world videos is a challenging AI problem. We present a novel combination of standard activity classification, object recognition, and text mining to learn effective activity recognizers without ever explicitly labeling training videos. We cluster verbs used to describe videos to automatically discover classes of activities and produce a labeled training set. This labeled data is then used to train an activity classifier based on spatio-temporal features. Next, text mining is employed to learn the correlations between these verbs and related objects. This knowledge is then used together with the outputs of an off-the-shelf object recognizer and the trained activity classifier to produce an improved activity recognizer. Experiments on a corpus of YouTube videos demonstrate the effectiveness of the overall approach.
    ML ID: 274
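    As a loose sketch of one step in the pipeline above, clustering description verbs to induce activity classes; real systems would derive verb features from parsed descriptions rather than the made-up co-occurrence counts used here:

    ```python
    # Toy verb clustering to discover activity classes from video descriptions.
    import numpy as np
    from sklearn.cluster import KMeans

    verbs = ["slice", "chop", "cut", "ride", "drive", "bike"]
    # Rows: made-up verb feature vectors (e.g. co-occurrence with a few object words).
    features = np.array([
        [5, 4, 0, 0],   # slice: knife, vegetable, ...
        [4, 5, 0, 0],   # chop
        [5, 5, 1, 0],   # cut
        [0, 0, 5, 4],   # ride: bicycle, road, ...
        [0, 1, 4, 5],   # drive
        [0, 0, 5, 5],   # bike
    ])

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    for cluster in set(labels):
        print(cluster, [v for v, c in zip(verbs, labels) if c == cluster])
    ```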
  81. Generative Models of Grounded Language Learning with Ambiguous Supervision
    [Details] [PDF] [Slides (PPT)]
    Joohyun Kim
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, June 2012.

    "Grounded" language learning is the process of learning the semantics of natural language with respect to relevant perceptual inputs. Toward this goal, computational systems are trained with data in the form of natural language sentences paired with relevant but ambiguous perceptual contexts. With such ambiguous supervision, it is required to resolve the ambiguity between a natural language (NL) sentence and a corresponding set of possible logical meaning representations (MR). My research focuses on devising effective models for simultaneously disambiguating such supervision and learning the underlying semantics of language to map NL sentences into proper logical forms. Specifically, I will present two probabilistic generative models for learning such correspondences. The models are applied to two publicly available datasets in two different domains, sportscasting and navigation, and compared with previous work on the same data.

    I will first present a probabilistic generative model that learns the mappings from NL sentences into logical forms where the true meaning of each NL sentence is one of a handful of candidate logical MRs. It simultaneously disambiguates the meaning of each sentence in the training data and learns to probabilistically map an NL sentence to its MR form depicted in a single tree structure. Evaluations performed on the RoboCup sportscasting corpus show that it outperforms previous methods.

    Next, I present a PCFG induction model for grounded language learning that extends the model of Borschinger, Jones, and Johnson (2011) by utilizing a semantic lexicon. Borschinger et al.'s approach works well when there is limited ambiguity such as in the sportscasting task, but it does not scale well to highly ambiguous situations when there are large sets of potential meaning possibilities for each sentence, such as in the navigation instruction following task studied by Chen and Mooney (2011). Our model overcomes such limitations by employing a semantic lexicon as the basic building block for PCFG rule generation. Our model also allows for novel combination of MR outputs when parsing novel test sentences.

    For future work, I propose to extend our PCFG induction model in several ways: improving the lexicon learning algorithm, discriminative re-ranking of top-k parses, and integrating the meaning representation language (MRL) grammar for extra structural information. The longer-term agenda includes applying our approach to summarized machine translation, using real perception data such as robot sensorimotor data and images/videos, and joint learning with other natural language processing tasks.

    ML ID: 273
  82. Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL '12), 433--444, Jeju Island, Korea, July 2012.
    "Grounded" language learning employs training data in the form of sentences paired with relevant but ambiguous perceptual contexts. Borschinger et al. (2011) introduced an approach to grounded language learning based on unsupervised PCFG induction. Their approach works well when each sentence potentially refers to one of a small set of possible meanings, such as in the sportscasting task. However, it does not scale to problems with a large set of potential meanings for each sentence, such as the navigation instruction following task studied by Chen and Mooney (2011). This paper presents an enhancement of the PCFG approach that scales to such problems with highly-ambiguous supervision. Experimental results on the navigation task demonstrates the effectiveness of our approach.
    ML ID: 272
  83. Fast Online Lexicon Learning for Grounded Language Acquisition
    [Details] [PDF] [Slides (PPT)]
    David L. Chen
    In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 430--439, July 2012.
    Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. It is especially important in language grounding where the training data usually consist of language paired with an ambiguous perceptual context. Recent work by Chen and Mooney (2011) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. In this paper we introduce a new online algorithm that is an order of magnitude faster and surpasses the state-of-the-art results. We show that by changing the grammar of the formal meaning representation language and training on additional data collected from Amazon's Mechanical Turk we can further improve the results. We also include experimental results on a Chinese translation of the training data to demonstrate the generality of our approach.
    ML ID: 271
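    The sketch below conveys the intersection idea behind the lexicon learner in a much-simplified form: the published algorithm intersects relational graphs, while here each ambiguous context is reduced to a flat set of facts so that "intersection" is plain set intersection. The data, threshold, and function name are toy assumptions.

      from collections import defaultdict

      def learn_lexicon(pairs, min_count=2):
          """pairs: list of (sentence words, set of context facts) with ambiguous contexts."""
          word_contexts = defaultdict(list)
          for words, facts in pairs:
              for w in set(words):
                  word_contexts[w].append(set(facts))
          lexicon = {}
          for w, contexts in word_contexts.items():
              if len(contexts) >= min_count:
                  meaning = set.intersection(*contexts)  # facts shared by every context of w
                  if meaning:
                      lexicon[w] = meaning
          return lexicon

      pairs = [
          (["turn", "left"],          {("turn", "LEFT"), ("at", "SOFA")}),
          (["turn", "left", "again"], {("turn", "LEFT"), ("at", "CHAIR")}),
          (["go", "forward"],         {("travel", "FORWARD"), ("at", "SOFA")}),
      ]
      print(learn_lexicon(pairs))  # both "turn" and "left" keep only the shared fact ("turn", "LEFT")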
  84. Learning Language from Ambiguous Perceptual Context
    [Details] [PDF] [Slides (PPT)]
    David L. Chen
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012. 196 pages.

    Building a computer system that can understand human languages has been one of the long-standing goals of artificial intelligence. Currently, most state-of-the-art natural language processing (NLP) systems use statistical machine learning methods to extract linguistic knowledge from large, annotated corpora. However, constructing such corpora can be expensive and time-consuming due to the expertise it requires to annotate such data. In this thesis, we explore alternative ways of learning which do not rely on direct human supervision. In particular, we draw our inspirations from the fact that humans are able to learn language through exposure to linguistic inputs in the context of a rich, relevant, perceptual environment.

    We first present a system that learned to sportscast for RoboCup simulation games by observing how humans commentate a game. Using the simple assumption that people generally talk about events that have just occurred, we pair each textual comment with a set of events that it could be referring to. By applying an EM-like algorithm, the system simultaneously learns a grounded language model and aligns each description to the corresponding event. The system does not use any prior language knowledge and was able to learn to sportscast in both English and Korean. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans.

    For the sportscasting task, while each comment could be aligned to one of several events, the level of ambiguity was low enough that we could enumerate all the possible alignments. However, it is not always possible to restrict the set of possible alignments to such limited numbers. Thus, we present another system that allows each sentence to be aligned to one of exponentially many connected subgraphs without explicitly enumerating them. The system first learns a lexicon and uses it to prune the nodes in the graph that are unrelated to the words in the sentence. By only observing how humans follow navigation instructions, the system was able to infer the corresponding hidden navigation plans and parse previously unseen instructions in new environments for both English and Chinese data. With the rise in popularity of crowdsourcing, we also present results on collecting additional training data using Amazon’s Mechanical Turk. Since our system only needs supervision in the form of language being used in relevant contexts, it is easy for virtually anyone to contribute to the training data.

    ML ID: 269
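    A minimal sketch of the EM-like alignment idea from the sportscasting work above, under strong simplifying assumptions: comments are bags of words, candidate events are atomic labels, and the model is just a smoothed P(word | event). The real system aligns descriptions to structured meaning representations and learns a full grounded language model; the data below are invented.

      from collections import defaultdict

      data = [  # (comment words, candidate event types)
          (["purple", "team", "scores"], ["goal", "pass"]),
          (["nice", "pass"],             ["pass", "kick"]),
          (["what", "a", "goal"],        ["goal", "kick"]),
      ]

      vocab = {w for words, _ in data for w in words}
      p_word = defaultdict(lambda: 1.0 / len(vocab))  # start from a uniform P(word | event)

      for _ in range(10):
          counts, event_totals = defaultdict(float), defaultdict(float)
          for words, events in data:
              # E-step: posterior over which candidate event this comment describes.
              scores = []
              for e in events:
                  s = 1.0
                  for w in words:
                      s *= p_word[(w, e)]
                  scores.append(s)
              z = sum(scores) or 1.0
              for e, s in zip(events, scores):
                  for w in words:
                      counts[(w, e)] += s / z
                  event_totals[e] += (s / z) * len(words)
          # M-step: re-estimate P(word | event) with add-one smoothing.
          p_word = defaultdict(lambda: 1.0 / len(vocab))
          for (w, e), c in counts.items():
              p_word[(w, e)] = (c + 1.0) / (event_totals[e] + len(vocab))

      # Inspect what the model learned about the word "goal" under each candidate event.
      print({k: round(v, 3) for k, v in p_word.items() if k[0] == "goal"})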
  85. Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection
    [Details] [PDF] [Slides (PPT)]
    David L. Chen and William B. Dolan
    In Proceedings of The 3rd Human Computation Workshop (HCOMP 2011), August 2011.
    Traditional methods of collecting translation and paraphrase data are prohibitively expensive, making the construction of large, new corpora difficult. While crowdsourcing offers a cheap alternative, quality control and scalability can become problematic. We discuss a novel annotation task that uses videos as the stimulus, which discourages cheating. In addition, our approach requires only monolingual speakers, thus making it easier to scale since more workers are qualified to contribute. Finally, we employ a multi-tiered payment system that helps retain good workers over the long term, resulting in a persistent, high-quality workforce. We present the results of one of the largest linguistic data collection efforts to date using Mechanical Turk, yielding 85K English sentences and more than 1K sentences for each of a dozen more languages.
    ML ID: 265
  86. Panning for Gold: Finding Relevant Semantic Content for Grounded Language Learning
    [Details] [PDF] [Slides (PDF)]
    David L. Chen and Raymond J. Mooney
    In Proceedings of Symposium on Machine Learning in Speech and Language Processing (MLSLP 2011), June 2011.
    One of the key challenges in grounded language acquisition is resolving the intentions of the expressions. Typically the task involves identifying a subset of records from a list of candidates as the correct meaning of a sentence. While most current work assumes complete or partial independence between the records, we examine a scenario in which they are strongly related. By representing the set of potential meanings as a graph, we explicitly encode the relationships between the candidate meanings. We introduce a refinement algorithm that first learns a lexicon which is then used to remove parts of the graphs that are irrelevant. Experiments in a navigation domain show that the algorithm successfully recovered over three quarters of the correct semantic content.
    ML ID: 261
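    A minimal sketch, under toy assumptions, of the pruning step described above: a learned lexicon licenses a subset of node labels, and only licensed nodes (and the edges among them) are kept from the graph of candidate meanings. The actual refinement algorithm handles connectivity and iterative refinement more carefully; all names and data here are illustrative.

      def prune_graph(nodes, edges, sentence_words, lexicon):
          """nodes: {node_id: label}; edges: set of (node_id, node_id) pairs."""
          licensed = set()
          for w in sentence_words:
              licensed |= lexicon.get(w, set())
          kept = {n for n, label in nodes.items() if label in licensed}
          kept_edges = {(a, b) for a, b in edges if a in kept and b in kept}
          return kept, kept_edges

      nodes = {1: "turn_left", 2: "walk_forward", 3: "at_chair", 4: "at_sofa"}
      edges = {(1, 3), (2, 4)}
      lexicon = {"turn": {"turn_left"}, "left": {"turn_left"}, "chair": {"at_chair"}}
      print(prune_graph(nodes, edges, ["turn", "left", "at", "the", "chair"], lexicon))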
  87. Fine-Grained Class Label Markup of Search Queries
    [Details] [PDF]
    Joseph Reisinger and Marius Pasca
    In Proceedings of The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), 1200-1209, June 2011.
    We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore, search queries lack the explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
    ML ID: 260
  88. Collecting Highly Parallel Data for Paraphrase Evaluation
    [Details] [PDF] [Slides (PPT)]
    David L. Chen and William B. Dolan
    In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 190-200, Portland, Oregon, USA, June 2011.
    A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.
    ML ID: 259
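    The sketch below illustrates the two kinds of n-gram comparison the paper motivates, in simplified unigram form: overlap with the pool of parallel references as a proxy for semantic adequacy, and non-overlap with the source sentence as a proxy for lexical dissimilarity. The published metrics are BLEU-based; the sentences and function name here are invented.

      def ngram_overlap(candidate, reference_pool, n=1):
          """Fraction of the candidate's n-grams that appear anywhere in the reference pool."""
          cand = {tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)}
          refs = {tuple(r[i:i + n]) for r in reference_pool for i in range(len(r) - n + 1)}
          return len(cand & refs) / max(len(cand), 1)

      source = "a man is slicing a cucumber".split()
      candidate = "someone is cutting a cucumber".split()
      references = ["a person slices a cucumber".split(), "someone is cutting vegetables".split()]

      adequacy = ngram_overlap(candidate, references)           # overlap with parallel references
      dissimilarity = 1.0 - ngram_overlap(candidate, [source])  # wording differs from the source
      print(round(adequacy, 2), round(dissimilarity, 2))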
  89. Integrating Logical Representations with Probabilistic Information using Markov Logic
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette, Katrin Erk, Raymond Mooney
    In Proceedings of the International Conference on Computational Semantics, 105--114, Oxford, England, January 2011.
    First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question how best to integrate it with uncertain, probabilistic knowledge, for example regarding word meaning. This paper describes the first steps of an approach to recasting first-order semantics into the probabilistic models that are part of Statistical Relational AI. Specifically, we show how Discourse Representation Structures can be combined with distributional models for word meaning inside a Markov Logic Network and used to successfully perform inferences that take advantage of logical concepts such as factivity as well as probabilistic information on word meaning in context.
    ML ID: 253
  90. Learning to Predict Readability using Diverse Linguistic Features
    [Details] [PDF] [Slides (PPT)]
    Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos and Chris Welty
    In 23rd International Conference on Computational Linguistics (COLING 2010), 2010.
    In this paper we consider the problem of building a system to predict the readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned system are more accurate than the predictions of naive human judges when compared against the predictions of linguistically-trained expert human judges. The experiments also compare the performance of different learning algorithms and different types of feature sets when used for predicting readability.
    ML ID: 250
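    A minimal sketch of the overall setup, not the paper's feature set: a couple of generic surface features per document are fed to a regressor fit against expert readability judgments. The features, toy documents, labels, and the use of scikit-learn are all illustrative assumptions; the actual system uses richer syntactic and language-model features.

      from sklearn.linear_model import LinearRegression

      def features(doc):
          """Two generic surface features: average sentence length and average word length."""
          sentences = [s for s in doc.split(".") if s.strip()]
          words = doc.split()
          avg_sent_len = len(words) / max(len(sentences), 1)
          avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
          return [avg_sent_len, avg_word_len]

      docs = [
          "The cat sat. The dog ran. It was fun.",
          "Notwithstanding considerable methodological heterogeneity, the meta-analysis indicates robust effects.",
      ]
      expert_scores = [4.5, 2.0]  # toy labels: higher means more readable

      model = LinearRegression().fit([features(d) for d in docs], expert_scores)
      print(model.predict([features("Short words help. Readers like them.")]))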
  91. Authorship Attribution Using Probabilistic Context-Free Grammars
    [Details] [PDF] [Slides (PPT)]
    Sindhu Raghavan, Adriana Kovashka and Raymond Mooney
    In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-2010), 38--42, 2010.
    In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.
    ML ID: 243
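    A toy sketch of the classification idea: estimate production probabilities for each author from that author's parse trees, then attribute a test parse to the author whose grammar assigns it the higher probability. The trees are hand-written nested tuples and the smoothing floor is arbitrary; the real experiments induce PCFGs from parsed training text and use them as language models.

      import math
      from collections import Counter

      def productions(tree):
          """Yield (parent, (child labels...)) rules from a nested-tuple parse tree."""
          if isinstance(tree, str):          # a bare word contributes no rule
              return
          label, *children = tree
          yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
          for c in children:
              yield from productions(c)

      def train_pcfg(trees):
          """Relative-frequency estimate of P(rule | parent nonterminal)."""
          rule_counts, parent_counts = Counter(), Counter()
          for t in trees:
              for parent, rhs in productions(t):
                  rule_counts[(parent, rhs)] += 1
                  parent_counts[parent] += 1
          return {rule: c / parent_counts[rule[0]] for rule, c in rule_counts.items()}

      def log_prob(tree, pcfg, floor=1e-6):
          return sum(math.log(pcfg.get(rule, floor)) for rule in productions(tree))

      author_a = [("S", ("NP", "she"), ("VP", "runs"))]  # toy "training parses"
      author_b = [("S", ("VP", "run"), ("NP", "you"))]
      pcfg_a, pcfg_b = train_pcfg(author_a), train_pcfg(author_b)

      test = ("S", ("NP", "he"), ("VP", "walks"))
      print("A" if log_prob(test, pcfg_a) > log_prob(test, pcfg_b) else "B")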
  92. Associative Anaphora Resolution: A Web-Based Approach
    [Details] [PDF]
    Razvan Bunescu
    In Proceedings of the EACL-2003 Workshop on the Computational Treatment of Anaphora, 47-52, Budapest, Hungary, 2003.
    We present a novel approach to resolving definite descriptions in unrestricted text based on searching the web for a particular type of lexico-syntactic pattern. Using statistics on these patterns, we intend to recover the antecedents for a predefined subset of definite descriptions occurring in two types of anaphoric relations: identity anaphora and associative anaphora. Preliminary results obtained with this method are promising and compare well with other methods.
    ML ID: 120
  93. Machine Learning
    [Details] [PDF]
    Raymond J. Mooney
    New York, NY, 2003. McGraw-Hill.
    This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.
    ML ID: 119
  94. Learning Parse and Translation Decisions From Examples With Rich Context
    [Details] [PDF]
    Ulf Hermjakob and Raymond J. Mooney
    In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL'97/EACL'97), 482-489, July 1997.
    This paper presents a knowledge and context-based system for parsing and translating natural language and evaluates it on sentences from the Wall Street Journal. Applying machine learning techniques, the system uses parse action examples acquired under supervision to generate a deterministic shift-reduce parser in the form of a decision structure. It relies heavily on context, as encoded in features which describe the morphological, syntactical, semantical and other aspects of a given parse state.
    ML ID: 73
  95. Learning Parse and Translation Decisions From Examples With Rich Context
    [Details] [PDF]
    Ulf Hermjakob
    PhD Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, May 1997. 175 pages. Technical Report UT-AI97-261.
    The parsing of unrestricted text, with its enormous lexical and structural ambiguity, still poses a great challenge in natural language processing. The difficulties with traditional approaches, which try to master the complexity of parse grammars with hand-crafted rules, have led to a trend towards more empirical techniques.

    We therefore propose a system for parsing and translating natural language that learns from examples and uses some background knowledge.
    As our parsing model we choose a deterministic shift-reduce type parser that integrates part-of-speech tagging and syntactic and semantic processing, which not only makes parsing very efficient, but also assures transparency during the supervised example acquisition.
    Applying machine learning techniques, the system uses parse action examples to generate a parser in the form of a decision structure, a generalization of decision trees.
    To learn good parsing and translation decisions, our system relies heavily on context, as encoded in currently 205 features describing the morphological, syntactical and semantical aspects of a given parse state. Compared with recent probabilistic systems that were trained on 40,000 sentences, our system relies on more background knowledge and a deeper analysis, but radically fewer examples, currently 256 sentences.

    We test our parser on lexically limited sentences from the Wall Street Journal and achieve accuracy rates of 89.8% for labeled precision, 98.4% for part of speech tagging and 56.3% of test sentences without any crossing brackets. Machine translations of 32 Wall Street Journal sentences to German have been evaluated by 10 bilingual volunteers and been graded as 2.4 on a 1.0 (best) to 6.0 (worst) scale for both grammatical correctness and meaning preservation. The translation quality was only minimally better (2.2) when starting each translation with the correct parse tree, which indicates that the parser is quite robust and that its errors have only a moderate impact on final translation quality. These parsing and translation results already compare well with other systems and, given the relatively small training set and amount of overall knowledge used so far, the results suggest that our system Contex can break previous accuracy ceilings when scaled up further.

    ML ID: 72
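    A toy sketch of the parser's control loop described above: a deterministic shift-reduce parser whose action at each state is selected by a learned decision structure over context features. Here that decision structure is replaced by a hand-written stub keyed on the top of the stack, whereas the actual system consults roughly 200 features; the tag set and example sentence are illustrative.

      def choose_action(stack, buffer):
          """Stand-in for the learned decision structure over parse-state features."""
          top_labels = [item[0] for item in stack[-2:]]
          if top_labels == ["NP", "VP"]:
              return ("REDUCE", "S", 2)
          if top_labels == ["DT", "NN"]:
              return ("REDUCE", "NP", 2)
          if stack and stack[-1][0] == "VBZ" and not buffer:
              return ("REDUCE", "VP", 1)
          return ("SHIFT", None, 0)

      def parse(tagged_words):
          stack, buffer = [], list(tagged_words)
          while buffer or len(stack) > 1:
              action, label, span = choose_action(stack, buffer)
              if action == "SHIFT":
                  word, tag = buffer.pop(0)
                  stack.append((tag, word))
              else:  # REDUCE: fold the top `span` items into a new constituent.
                  children, stack = stack[-span:], stack[:-span]
                  stack.append((label, children))
          return stack[0]

      print(parse([("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]))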
  96. Comparative Results on Using Inductive Logic Programming for Corpus-based Parser Construction
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Stefan Wermter and Ellen Riloff and Gabriela Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 355-369, Berlin, 1996. Springer.
    This paper presents results from recent experiments with CHILL, a corpus-based parser acquisition system. CHILL treats language acquisition as the learning of search-control rules within a logic program. Unlike many current corpus-based approaches that use statistical learning algorithms, CHILL uses techniques from inductive logic programming (ILP) to learn relational representations. CHILL is a very flexible system and has been used to learn parsers that produce syntactic parse trees, case-role analyses, and executable database queries. The reported experiments compare CHILL's performance to that of a more naive application of ILP to parser acquisition. The results show that ILP techniques, as employed in CHILL, are a viable alternative to statistical methods and that the control-rule framework is fundamental to CHILL's success.
    ML ID: 54
  97. Learning the Past Tense of English Verbs Using Inductive Logic Programming
    [Details] [PDF]
    Raymond J. Mooney and Mary Elaine Califf
    In S. Wermter, E. Riloff, and G. Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 370-384, Berlin, 1996. Springer.
    This paper presents results on using a new inductive logic programming method called FOIDL to learn the past tense of English verbs. The past tense task has been widely studied in the context of the symbolic/connectionist debate. Previous papers have presented results using various neural-network and decision-tree learning methods. We have developed a technique for learning a special type of Prolog program called a first-order decision list, defined as an ordered list of clauses each ending in a cut. FOIDL is based on FOIL (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as the past-tense task. We present results showing that FOIDL learns a more accurate past-tense generator from significantly fewer examples than all other previous methods.
    ML ID: 53
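    The sketch below mimics the shape of a first-order decision list in ordinary Python: an ordered list of rules where the first matching clause fires, like a Prolog clause ending in a cut, so specific exceptions precede the general default. The rules themselves are hand-written stand-ins for what FOIDL would induce from examples.

      RULES = [
          (lambda v: v == "go",                                lambda v: "went"),          # specific exception
          (lambda v: v.endswith("e"),                          lambda v: v + "d"),         # "like" -> "liked"
          (lambda v: v.endswith("y") and v[-2] not in "aeiou", lambda v: v[:-1] + "ied"),  # "try" -> "tried"
          (lambda v: True,                                     lambda v: v + "ed"),        # general default rule
      ]

      def past_tense(verb):
          for condition, transform in RULES:
              if condition(verb):   # first matching clause fires, like a clause ending in cut
                  return transform(verb)

      for v in ["go", "like", "try", "walk"]:
          print(v, "->", past_tense(v))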
  98. A General Explanation-Based Learning Mechanism and its Application to Narrative Understanding
    [Details] [PDF]
    Raymond J. Mooney
    PhD Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1988.
    Explanation-based learning (EBL) is a learning method which uses existing knowledge of the domain to construct an explanation for why a specific example is a member of a concept or why a specific combination of actions achieves a goal. This explanation is then generalized in an analytical manner in order to produce a general concept description or plan schema. Although a number of exploratory EBL systems which operate in particular domains have previously been constructed, recent research in this area has led to the development of general mechanisms which can perform explanation-based learning in a wide variety of domains.

    This thesis describes a general EBL mechanism, EGGS, which can make use of declarative knowledge stored in the form of Horn clauses, rewrite rules, or STRIPS operators. Numerous examples are presented illustrating its application to a wide variety of domains, including "blocks world" planning, logic circuit design, artifact recognition, and various forms of mathematical problem solving. The system is shown to improve its performance in each of these domains.

    EGGS has been most thoroughly tested as a component of a narrative understanding system, GENESIS, which improves its own performance through learning. GENESIS processes short English narratives and constructs explanations for characters' intentional behavior. When the system detects that a character has achieved an important goal by combining actions in an unfamiliar way, EGGS is used to generalize the specific explanation for how the goal was achieved into a general plan schema. The resulting schema is then retained by the system and indexed into its existing knowledge-base. This schema can then be used to process narratives which were previously beyond the system's capabilities. The thesis also discusses GENESIS' ability to learn meanings for words related to its learned schemata and reviews several recent psychological experiments which demonstrate that GENESIS can be productively interpreted as a cognitive model of certain types of human learning.

  99. Generalizing Explanations of Narratives into Schemata
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Third International Machine Learning Workshop, 126--128, New Brunswick, New Jersey, 1985.
    This paper describes a natural language system which improves its performance through learning. The system processes short English narratives and from a single narrative acquires a new schema for a stereotypical set of actions. During the understanding process, the system constructs explanations for characters' actions in terms of the goals they were meant to achieve. If a character achieves a common goal in a novel way, it generalizes the set of actions used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the narrative's causal structure which removes unnecessary details while maintaining the validity of the explanation. The resulting generalized set of actions is then stored as a new schema and used by the system to process narratives which were previously beyond its capabilities.
    ML ID: 276
  100. Learning Schemata for Natural Language Processing
    [Details] [PDF]
    Raymond J. Mooney and Gerald F. DeJong
    In Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85), 681-687, Los Angeles, CA, August 1985.
    This paper describes a natural language system which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized set of actions is then stored as a new schema and used by the system to correctly process narratives which were previously beyond its capabilities.
    ML ID: 205
  101. Generalizing Explanations of Narratives into Schemata
    [Details] [PDF]
    Raymond J. Mooney
    Masters Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1985.
    This thesis describes a natural language system called GENESIS which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character in a narrative has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized combination of actions is then stored as a new schema in the system's knowledge base. This new schema can then be used by the system to correctly process narratives which were previously beyond its capabilities.