Better natural language semantic representations allow computers to understand natural text more accurately, and thereby to support a wider range of applications more effectively.
However, no single semantic representation currently fulfills all the requirements of a satisfactory representation.
Logic-based representations such as first-order logic capture many linguistic phenomena
using logical constructs, and they come with standardized inference mechanisms,
but standard first-order logic fails to capture the "graded" aspect of meaning in language.
Other approaches to semantics, such as distributional models,
focus on capturing the "graded" semantic similarity of words and phrases
but do not capture sentence structure in the same
detail as logic-based approaches.
However, both aspects of semantics, structure and gradedness, are important
for an accurate representation of natural language meaning.
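To illustrate the contrast with a constructed example (ours, not drawn from any particular dataset): first-order logic can represent "Every dog barks" as
\[
\forall x\,\big(\mathit{dog}(x) \rightarrow \mathit{bark}(x)\big),
\]
yet it treats \(\mathit{dog}\) and \(\mathit{puppy}\) as unrelated symbols, while a distributional model would assign the two words a high similarity score but offers no direct account of the quantifier "every".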
In this work, we propose a natural language semantics representation
that uses probabilistic logic (PL) to integrate logical representations with weighted, uncertain knowledge.
It combines the expressivity and automated inference of logic with
the ability to reason with uncertainty.
To demonstrate the effectiveness of our semantic representation, we implement and evaluate it
on three tasks: recognizing textual entailment (RTE), semantic textual similarity (STS), and open-domain question answering (QA).
These tasks can exploit the strengths of our representation, namely the integration of logical representation
and uncertain knowledge.
Our semantic representation
has three components: Logical Form, Knowledge Base, and Inference.
Each presents interesting challenges, and we make new contributions to each of them.
The first component is the Logical Form, which is the primary meaning representation.
We address two points: how to translate input sentences into logical form, and how to
adapt the resulting logical form to PL.
First, we use Boxer, a CCG-based semantic analysis tool, to translate sentences into logical form. We also
explore translating dependency trees into logical form.
Then, we adapt the logical forms to ensure that
universal quantifiers and negations behave as expected.
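As an illustrative sketch (simplified; the exact predicates and role labels depend on Boxer's output), a sentence such as "A dog is barking" maps to a neo-Davidsonian logical form along the lines of
\[
\exists x\,\exists e\,\big(\mathit{dog}(x) \wedge \mathit{bark}(e) \wedge \mathit{agent}(e, x)\big).
\]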
The second component is the Knowledge Base, which contains the "uncertain"
background knowledge required for a given problem.
We collect the "relevant" lexical information
from different linguistic resources, encode it as weighted logical rules, and add these rules to the knowledge base.
We add rules from existing databases, in particular WordNet and the Paraphrase Database (PPDB).
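For example (a constructed illustration; the actual weights are learned rather than hand-set), a WordNet hypernym relation can be encoded as a weighted rule such as
\[
\forall x\,\big(\mathit{car}(x) \rightarrow \mathit{vehicle}(x)\big) \;\mid\; w,
\]
where the weight \(w\) determines how strongly the rule influences inference.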
Since these resources are incomplete, we generate additional on-the-fly rules
that could be useful. We use alignment techniques to propose rules
that are relevant to a particular problem, and explore two alignment methods, one based on Robinson's
resolution and the other based on graph matching.
We automatically annotate the proposed rules and use the annotated rules to learn weights for unseen rules.
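As a constructed illustration of such on-the-fly rules, aligning a premise that mentions "slicing a cucumber" with a hypothesis that mentions "cutting a pickle" might propose lexical rules of the form
\[
\forall x\,\big(\mathit{slice}(x) \rightarrow \mathit{cut}(x)\big) \;\mid\; w_1,
\qquad
\forall x\,\big(\mathit{cucumber}(x) \rightarrow \mathit{pickle}(x)\big) \;\mid\; w_2,
\]
whose weights are then estimated as described above.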
The third component is Inference, which is implemented separately for each task.
We use the logical form and the knowledge base constructed in the previous two steps to
formulate the task as a PL inference problem, then develop a PL inference algorithm
that is optimized for that particular task.
We explore the use of two PL frameworks, Markov Logic Networks
(MLNs) and Probabilistic Soft Logic (PSL).
We discuss which framework is best suited to each task,
and present new inference algorithms for each framework.
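To give a sense of the formulation (a sketch of one natural setup, not a precise statement of the objective used for every task), RTE can be cast as computing the probability of the hypothesis \(H\) given the text \(T\) and the knowledge base,
\[
P(H \mid T, \mathit{KB}),
\]
and comparing it to a threshold; STS can be handled analogously by measuring the degree to which each sentence probabilistically entails the other.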
PhD Thesis, Department of Computer Science, The University of Texas at Austin.