The field of transfer learning, which has witnessed a great increase in popularity in recent years, addresses the problem of how to leverage previously acquired knowledge in order to improve the efficiency and accuracy of learning in a new domain that is in some way related to the original one. In particular, our current research is focused on developing transfer learning techniques for Markov Logic Networks (MLNs), a recently developed approach to statistical relational learning.
Our research in the area is currently sponsored by the Defense Advanced Research Projects Agency (DARPA) and managed by the Air Force Research Laboratory (AFRL) under contract FA8750-05-2-0283.
For the majority of the machine learning community, the expensive nature of collecting high-quality human-annotated data and the inability to efficiently finetune very large state-of-the-art pretrained models on limited compute are major bottlenecks for building models for new tasks. We propose a zero-shot simple approach for one such task, Video Moment Retrieval (VMR), that does not perform any additional finetuning and simply repurposes off-the-shelf models trained on other tasks. Our three-step approach consists of moment proposal, moment-query matching and postprocessing, all using only off-the-shelf models. On the QVHighlights (Lei et al., 2021) benchmark for VMR, we vastly improve performance of previous zero-shot approaches by at least 2.5x on all metrics and reduce the gap between zero-shot and state-of-the-art supervised by over 74%. Further, we also show that our zero-shot approach beats non-pretrained supervised models on the Recall metrics and comes very close on mAP metrics; and that it also performs better than the best pretrained supervised model on shorter moments. Finally, we ablate and analyze our results and propose interesting future directions.
ML ID: 412
In several applications, scarcity of labeled data is a challenging problem that hinders the predictive capabilities of machine learning algorithms. Additionally, the distribution of the data changes over time, rendering models trained with older data less capable of discovering useful structure from the newly available data. Transfer learning is a convenient framework to overcome such problems where the learning of a model specific to a domain can benefit the learning of other models in other domains through either simultaneous training of domains or sequential transfer of knowledge from one domain to the others. This thesis explores the opportunities of knowledge transfer in the context of a few applications pertaining to object recognition from images, text analysis, network modeling and recommender systems, using probabilistic latent variable models as building blocks. Both simultaneous and sequential knowledge transfer are achieved through the latent variables, either by sharing these across multiple related domains (for simultaneous learning) or by adapting their distributions to fit data from a new domain (for sequential learning).
ML ID: 322
Multitask learning (MTL) via a shared representation has been adopted to alleviate problems with sparsity of labeled data across different learning tasks. Active learning, on the other hand, reduces the cost of labeling examples by making informative queries over an unlabeled pool of data. Therefore, a unification of both of these approaches can potentially be useful in settings where labeled information is expensive to obtain but the learning tasks or domains have some common characteristics. This paper introduces two such models -- Active Doubly Supervised Latent Dirichlet Allocation (Act-DSLDA) and its non-parametric variation (Act-NPDSLDA) that integrate MTL and active learning in the same framework. These models make use of both latent and supervised shared topics to accomplish multitask learning. Experimental results on both document and image classification show that integrating MTL and active learning along with shared latent and supervised topics is superior to other methods which do not employ all of these components.
ML ID: 297
This paper introduces two new frameworks, Doubly Supervised Latent Dirichlet Allocation (DSLDA) and its non-parametric variation (NP-DSLDA), that integrate two different types of supervision: topic labels and category labels. This approach is particularly useful for multitask learning, in which both latent and supervised topics are shared between multiple categories. Experimental results on both document and image classification show that both types of supervision improve the performance of both DSLDA and NP-DSLDA and that sharing both latent and supervised topics allows for better multitask learning.
ML ID: 289
Traditionally, machine learning algorithms assume that training data is provided as a set of independent instances, each of which can be described as a feature vector. In contrast, many domains of interest are inherently multi-relational, consisting of entities connected by a rich set of relations. For example, the participants in a social network are linked by friendships, collaborations, and shared interests. Likewise, the users of a search engine are related by searches for similar items and clicks to shared sites. The ability to model and reason about such relations is essential not only because better predictive accuracy is achieved by exploiting this additional information, but also because frequently the goal is to predict whether a set of entities are related in a particular way. This thesis falls within the area of Statistical Relational Learning (SRL), which combines ideas from two traditions within artificial intelligence, first-order logic and probabilistic graphical models, to address the challenge of learning from multi-relational data. We build on one particular SRL model, Markov logic networks (MLNs), which consist of a set of weighted first-order-logic formulae and provide a principled way of defining a probability distribution over possible worlds. We develop algorithms for learning of MLN structure both from scratch and by transferring a previously learned model, as well as an application of MLNs to the problem of Web query disambiguation. The ideas we present are unified by two main themes: the need to deal with limited training data and the use of bottom-up learning techniques.Structure learning, the task of automatically acquiring a set of dependencies among the relations in the domain, is a central problem in SRL. We introduce BUSL, an algorithm for learning MLN structure from scratch that proceeds in a more bottom-up fashion, breaking away from the tradition of top-down learning typical in SRL. Our approach first constructs a novel data structure called a Markov network template that is used to restrict the search space for clauses. Our experiments in three relational domains demonstrate that BUSL dramatically reduces the search space for clauses and attains a significantly higher accuracy than a structure learner that follows a top-down approach.
Accurate and efficient structure learning can also be achieved by transferring a model obtained in a source domain related to the current target domain of interest. We view transfer as a revision task and present an algorithm that diagnoses a source MLN to determine which of its parts transfer directly to the target domain and which need to be updated. This analysis focuses the search for revisions on the incorrect portions of the source structure, thus speeding up learning. Transfer learning is particularly important when target-domain data is limited, such as when data on only a few individuals is available from domains with hundreds of entities connected by a variety of relations. We also address this challenging case and develop a general transfer learning approach that makes effective use of such limited target data in several social network domains.
Finally, we develop an application of MLNs to the problem of Web query disambiguation in a more privacy-aware setting where the only information available about a user is that captured in a short search session of 5--6 previous queries on average. This setting contrasts with previous work that typically assumes the availability of long user-specific search histories. To compensate for the scarcity of user-specific information, our approach exploits the relations between users, search terms, and URLs. We demonstrate the effectiveness of our approach in the presence of noise and show that it outperforms several natural baselines on a large data set collected from the MSN search engine.
ML ID: 235
A central goal of transfer learning is to enable learning when training data from the domain of interest is limited. Yet, work on transfer across relational domains has so far focused on the case where there is a significant amount of target data. This paper bridges this gap by studying transfer when the amount of target data is minimal and consists of information about just a handful of entities. In the extreme case, only a single entity is known. We present the SR2LR algorithm that finds an effective mapping of predicates from a source model to the target domain in this setting and thus renders pre-existing knowledge useful to the target task. We demonstrate SR2LR's effectiveness in three benchmark relational domains on social interactions and study its behavior as information about an increasing number of entities becomes available.
ML ID: 227
This paper introduces the single-entity-centered setting for transfer across two relational domains. In this setting, target domain data contains information about only a single entity. We present the SR2LR algorithm that finds an effective mapping of the source model to the target domain in this setting and demonstsrate its effectiveness in three relational domains. Our experiments additionally show that the most accurate model for the source domain is not always the best model to use for transfer.
ML ID: 218
Statistical relational learning (SRL) algorithms combine ideas from rich knowledge representations, such as first-order logic, with those from probabilistic graphical models, such as Markov networks, to address the problem of learning from multi-relational data. One challenge posed by such data is that individual instances are frequently very large and include complex relationships among the entities. Moreover, because separate instances do not follow the same structure and contain varying numbers of entities, they cannot be effectively represented as a feature-vector. SRL models and algorithms have been successfully applied to a wide variety of domains such as social network analysis, biological data analysis, and planning, among others. Markov logic networks (MLNs) are a recently-developed SRL model that consists of weighted first-order clauses. MLNs can be viewed as templates that define Markov networks when provided with the set of constants present in a domain. MLNs are therefore very powerful because they inherit the expressivity of first-order logic. At the same time, MLNs can flexibly deal with noisy or uncertain data to produce probabilistic predictions for a set of propositions. MLNs have also been shown to subsume several other popular SRL models.The expressive power of MLNs comes at a cost: structure learning, or learning the first-order clauses of the model, is a very computationally intensive process that needs to sift through a large hypothesis space with many local maxima and plateaus. It is therefore an important research problem to develop learning algorithms that improve the speed and accuracy of this process. The main contribution of this proposal are two algorithms for learning the structure of MLNs that proceed in a more data-driven fashion, in contrast to most existing SRL algorithms. The first algorithm we present, R-TAMAR, improves learning by transferring the structure of an MLN learned in a domain related to the current one. It first diagnoses the transferred structure and then focuses its efforts only on the regions it determines to be incorrect. Our second algorithm, BUSL improves structure learning from scratch by approaching the problem in a more bottom-up fashion and first constructing a variablized Markov network template that significantly constrains the space of viable clause candidates. We demonstrate the effectiveness of our methods in three social domains.
Our proposed future work directions include testing BUSL in additional domains and extending it so that it can be used not only to learn from scratch, but also to revise a provided MLN structure. Our most ambitious long-term goal is to develop a system that transfers knowledge from multiple potential sources. An important prerequisite to such a system is a method for measuring the similarity between domains. We would also like to extend BUSL to learn other SRL models and to handle functions.
ML ID: 217
Transfer learning addresses the problem of how to leverage knowledge acquired in a source domain to improve the accuracy and speed of learning in a related target domain. This paper considers transfer learning with Markov logic networks (MLNs), a powerful formalism for learning in relational domains. We present a complete MLN transfer system that first autonomously maps the predicates in the source MLN to the target domain and then revises the mapped structure to further improve its accuracy. Our results in several real-world domains demonstrate that our approach successfully reduces the amount of time and training data needed to learn an accurate model of a target domain over learning from scratch.
ML ID: 203
We propose a new algorithm for transfer learning of Markov Logic Network (MLN) structure. An important aspect of our approach is that it first diagnoses the provided source MLN and then focuses on re-learning only the incorrect portions. Experiments in a pair of synthetic domains demonstrate that this strategy significantly decreases the search space and speeds up learning while maintaining a level of accuracy comparable to that of the current best algorithm.
ML ID: 189