This website is the archive for past Forum for Artificial Intelligence talks. Please click this link to navigate to the list of current talks. FAI meets every other week (or so) to discuss scientific, philosophical, and cultural issues in artificial intelligence. Both technical research topics and broader inter-disciplinary aspects of AI are covered, and all are welcome to attend! If you would like to be added to the FAI mailing list, subscribe here. If you have any questions or comments, please send email to Catherine Andersson. |
Friday, August 30, 2019, 1:00PM
|
Scalable and Autonomous Reinforcement Learning
Simone Parisi [homepage]
Over the course of the last decade, reinforcement learning has developed into a promising tool for learning a large variety of tasks. A lot of effort has been directed towards scaling reinforcement learning to solve high-dimensional problems, such as robotic tasks with many degrees of freedom or videogames. These advances, however, generally depend on hand-crafted state descriptions or pre-structured parameterized policies, or require large amounts of data or human interaction. This pre-structuring is arguably in stark contrast to the goal of autonomous learning.
In this talk, I discuss the need for systematic methods to increase the autonomy of traditional learning systems, and focus on the problems of stability when little data is available, the presence of multiple conflicting objectives and high-dimensional input, and the need for novel exploration strategies in reinforcement learning.
About the speaker: Simone Parisi joined the Intelligent Autonomous Systems lab on October 1, 2014 as a PhD student. His research interests include, amongst others, reinforcement learning, robotics, multi-objective optimization, and intrinsic motivation. During his PhD, Simone is working on Scalable Autonomous Reinforcement Learning (ScARL), developing and evaluating new methods in the field of robotics to guarantee both a high degree of autonomy and the ability to solve complex tasks. Before his PhD, Simone completed his MSc in Computer Science Engineering at the Politecnico di Milano, Italy, and at the University of Queensland, Australia. His thesis, entitled “Study and analysis of policy gradient approaches for multi-objective decision problems”, was written under the supervision of Marcello Restelli and Matteo Pirotta. |
Friday, October 4, 2019, 11:00AM
|
Embodied Visual Recognition with Implicit 3D Feature Representations
Katerina Fragkiadaki [homepage]
Abstract: Current state-of-the-art CNNs localize rare object categories in internet photos, yet they miss basic facts that a two-year-old has mastered: that objects have 3D extent, that they persist over time despite changes in the camera view, and that they do not intersect in 3D, among others. We will discuss neural architectures that, given video streams, learn to disentangle scene appearance from camera and object motion, and distill the former into world-centric 3D feature maps. We will show that the proposed architectures learn object permanence, can generate RGB views from novel viewpoints in truly novel scenes, have objects emerge in 3D without human annotations, support grounding of language in 3D visual simulations, and learn intuitive physics in a persistent 3D feature space. In this way, they overcome many limitations of 2D CNNs for video perception, model learning and language grounding.
About the speaker: Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her Ph.D. from the University of Pennsylvania in 2013 and was a postdoctoral fellow at UC Berkeley and Google Research (2013-2016). She has done a lot of work on video segmentation, motion dynamics learning and in the area of injecting geometry into deep visual learning. Her group develops algorithms for mobile computer vision and learning of physics and common sense for agents that move around and interact with the world. She received a best Ph.D. thesis award in 2013 and served as an area chair for CVPR 2018, ICML 2019, ICLR 2019, and CVPR 2020. |
Friday, October 25, 2019, 11:00AM
|
Leveraging Explanations for Performance and Generalization in NLP and RL
Nazneen Rajani [homepage]
Abstract: Deep learning models perform poorly on tasks that require commonsense reasoning, which often necessitates some form of world knowledge or reasoning over information not immediately present in the input. In the first part of the talk, I will discuss how language models can be leveraged to generate natural language explanations that are not just interpretable but can also be used to improve performance on a downstream task such as CommonsenseQA, and I will show empirically that explanations are a way to incorporate commonsense reasoning in neural networks. Further, I will discuss how explanations can be transferred to other tasks without fine-tuning.
In the second part of the talk, I will talk about Sherlock, a framework for probing generalization in RL. Although deep reinforcement learning (RL) has seen great success in training agents for complex simulated environments, RL agents are often neither generalizable nor interpretable. Sherlock quantifies the impact of human-interpretable features by comparing generalization performance with the distance between MDPs. Our approach is based on the intuition that, unlike RL agents, humans can adapt quickly to changes in their environment because they base their policy on robust features that are human-interpretable. As such, RL agents may generalize well if they make decisions based on such human-interpretable features.
About the speaker: Nazneen Rajani is a research scientist at Salesforce, where she leads the efforts on Explainable AI (XAI), specifically focusing on leveraging explanations not just for interpretability but also for generalization. Before joining Salesforce, she graduated with a Ph.D. from UT, working with Ray Mooney at the intersection of language and vision. She has published at and served as a reviewer for top conferences including ACL, EMNLP, NAACL, and IJCAI. More details about her publications can be found at http://www.nazneenrajani.com |
Friday, November 1, 2019, 11:00AM
|
Semantic link prediction for drug discovery
Ying Ding [homepage]
Abstract: A critical barrier in current drug discovery is the inability to utilize public datasets in an integrated fashion to fully understand the actions of drugs and chemical compounds on biological systems. There is a need to intelligently integrate the heterogeneous datasets now available on compounds, drugs, targets, genes, diseases, and drug side effects, so that effective network data mining algorithms can extract important biological relationships. In this talk, we demonstrate the semantic integration of 25 different databases and develop various mining and prediction methods to identify hidden associations that could provide valuable directions for further exploration at the experimental level.
About the speaker: Dr. Ying Ding is Bill & Lewis Suit Professor at the School of Information, University of Texas at Austin. Before that, she was a professor and director of graduate studies for the data science program at the School of Informatics, Computing, and Engineering at Indiana University. She has led the effort to develop the online data science graduate program for Indiana University. She also worked as a senior researcher at the Department of Computer Science, University of Innsbruck (Austria) and the Free University of Amsterdam (the Netherlands). She has been involved in various NIH, NSF and European Union funded projects. She has published 240+ papers in journals, conferences, and workshops, and served as a program committee member for 200+ international conferences. She is the co-editor of a book series called Semantic Web Synthesis published by Morgan & Claypool, the co-editor-in-chief of Data Intelligence, published by MIT Press and the Chinese Academy of Sciences, and serves as an editorial board member for several top journals in Information Science and Semantic Web. She is the co-founder of Data2Discovery, a company advancing cutting-edge AI technologies in drug discovery and healthcare. Her current research interests include data-driven science of science, AI in healthcare, Semantic Web, knowledge graphs, data science, scholarly communication, and the application of Web technologies. |
Friday, November 15, 2019, 11:00AM
|
Robot Control and Collaboration in Situated Instruction Following
Yoav Artzi [homepage]
ABSTRACT: I will present two projects studying the problem of learning to follow natural language instructions. I will present new datasets, a class of interpretable models for instruction following, learning methods that combine the benefits of supervised and reinforcement learning, and new evaluation protocols. In the first part, I will discuss the task of executing natural language instructions with a robotic agent. In contrast to existing work, we do not engineer formal representations of language meaning or the robot environment. Instead, we learn to directly map raw observations and language to low-level continuous control of a quadcopter drone. In the second part, I will propose the task of learning to follow sequences of instructions in a collaborative scenario, where both the user and the system execute actions in the environment and the user controls the system using natural language. To study this problem, we build CerealBar, a multi-player 3D game where a leader instructs a follower, and both act in the environment together to accomplish complex goals. The two projects were led by Valts Blukis, Alane Suhr, and collaborators. Additional information about both projects is available here:
https://github.com/lil-lab/drif; http://lil.nlp.cornell.edu/cerealbar/
About the speaker: Yoav Artzi is an Assistant Professor in the Department of Computer Science and Cornell Tech at Cornell University. His research focuses on learning expressive models for natural language understanding, most recently in situated interactive scenarios. He received an NSF CAREER award, paper awards at EMNLP 2015, ACL 2017, and NAACL 2018, a Google Focused Research Award, and faculty awards from Google, Facebook, and Workday. Yoav holds a B.Sc. summa cum laude from Tel Aviv University and a Ph.D. from the University of Washington. |
Friday, November 22, 2019, 11:00AM
|
Language as a scaffold for learning
Jacob Andreas [homepage]
Abstract: Research on constructing and evaluating machine learning models is driven
almost exclusively by examples. We specify the behavior of sentiment classifiers
with labeled documents, guide learning of robot policies by assigning scores to
rollouts, and interpret learned image representations by retrieving salient
training images. Humans are able to learn from richer sources of supervision,
and in the real world this supervision often takes the form of natural language:
we learn word meanings from dictionaries and policies from cookbooks; we show
understanding by explaining rather than demonstrating.
This talk will explore three ways of leveraging language data to train and
interpret machine learning models: using linguistic supervision instead of
rewards to guide policy search, latent language to structure few-shot learning,
and representation translation to generate textual explanations of learned
models.
About the speaker: Jacob Andreas is an assistant professor at MIT and a researcher at Microsoft Semantic Machines. His group's research is aimed at building natural language interfaces to intelligent systems and understanding the prediction problems that shape language and other representations. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. He has been the recipient of an NSF graduate fellowship, a Facebook fellowship, and paper awards at NAACL and ICML. |
Friday, December 6, 2019, 11:00AM
|
It’s Time for Reasoning
Dan Roth [homepage]
Abstract: The fundamental issue underlying natural language understanding is that of semantics – there is a need to move toward understanding natural language at an appropriate level of abstraction in order to support natural language understanding and communication.
Machine Learning has become ubiquitous in our attempt to induce semantic representations of natural language and support decisions that depend on it; however, while we have made significant progress over the last few years, it has focused on classification tasks for which we have large amounts of annotated data. Supporting high-level decisions that depend on natural language understanding is still beyond our capabilities, partly since most of these tasks are very sparse and generating supervision signals for them does not scale.
I will discuss some of the challenges underlying reasoning (making natural language understanding decisions that depend on multiple interdependent models) and exemplify them using the domain of Reasoning about Time, as it is expressed in natural language. If time suffices, I will touch upon other inference problems that challenge our ability to understand natural language, addressing issues in Information Pollution.
About the speaker: Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, and a Fellow of the AAAS, the ACM, AAAI, and the ACL. In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.” Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR), and the program co-chair of ACL’03, AAAI’08, and CoNLL’02. Roth is a co-founder and CTO of NexLP, Inc., a startup that leverages the latest advances in Natural Language Processing, Cognitive Analytics, and Machine Learning in the legal and compliance domains. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.
Watch Online |
Friday, January 17, 2020, 11:00AM
|
Overcoming Mode Collapse and the Curse of Dimensionality
Ke Li [homepage]
Abstract:
In this talk, I will present our work on overcoming two long-standing problems in machine learning and computer vision:
1. Mode collapse in generative adversarial nets (GANs)
Generative adversarial nets (GANs) are perhaps the most popular class of generative models in use today. Unfortunately, they suffer from the well-documented problem of mode collapse, which the many successive variants of GANs have failed to overcome. I will illustrate why mode collapse happens fundamentally and show a simple way to overcome it, which is the basis of a new method known as Implicit Maximum Likelihood Estimation (IMLE). Whereas conditional GANs can only generate identical images from the same input, conditional IMLE can generate arbitrarily many diverse images from the same input.
2. Curse of dimensionality in exact nearest neighbour search
Efficient algorithms for exact nearest neighbour search developed over the past 40 years do not work in high (intrinsic) dimensions, due to the curse of dimensionality. It turns out that this problem is not insurmountable - I will explain how the curse of dimensionality arises and show a simple way to overcome it, which gives rise to a new family of algorithms known as Dynamic Continuous Indexing (DCI).
About the speaker: Ke Li is a recent Ph.D. graduate from UC Berkeley, where he was advised by Prof. Jitendra Malik, and is currently a Research Scientist at Google and a Member of the Institute for Advanced Study (IAS). He is interested in a broad range of topics in machine learning and computer vision and has worked on nearest neighbour search, generative modelling and Learning to Optimize. He is particularly passionate about tackling long-standing fundamental problems that cannot be tackled with a straightforward application of conventional techniques. He received his Hon. B.Sc. in Computer Science from the University of Toronto in 2014.
Watch Online |
Friday, January 24, 2020, 11:00AM
|
Advancing Textual Question Answering
Danqi Chen [homepage]
In this talk, I will discuss my recent work on advancing textual question answering: enabling machines to answer questions based on a passage of text, and more realistically, on a very large collection of documents (a.k.a. “machine reading at scale”). In the first part, I will examine the importance of pre-trained language representations (e.g., BERT, RoBERTa) for state-of-the-art QA systems. In particular, I will introduce a span-based pre-training method which is designed to better represent and predict spans of text and demonstrates superior performance on a wide range of QA tasks. Although these models have already matched or surpassed human performance on some standard benchmarks, there still remains a huge gap when they are scaled up to the open-domain setting. In the second part, I will present two new directions: one is to replace the traditional keyword-based retrieval component with fully dense embeddings for passage retrieval, and the other is to answer questions based on a structured graph of text passages. Both approaches show promise for future textual QA systems.
About the speaker: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Danqi’s research focuses on deep learning for natural language processing, with an emphasis on the intersection between text understanding and knowledge representation/reasoning and applications such as question answering and information extraction. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research in Seattle. She received her Ph.D. from Stanford University (2018) and B.E. from Tsinghua University (2012), both in Computer Science. In the past, she was a recipient of Outstanding Paper Awards at ACL’16 and EMNLP’17, a Facebook Fellowship, and a Microsoft Research Women’s Fellowship.
Watch Online |
Wednesday, January 29, 2020, 10:30AM
|
Recent Research in Machine Perception at Google
Rahul Sukthankar [homepage]
In this talk I will present some recent efforts in machine perception at Google Research, both on fundamental research problems and on the tech powering several popular Google products. On the applied side, this will include topics such as weakly- and self-supervised learning for content-based search in YouTube and Photos, multimodal interfaces for the Google Assistant, and advances in computational photography that enable novel features for Pixel smartphones. I will also describe some work on longer-term research problems, such as connectome reconstruction for computational neuroscience and neural network based automated theorem proving. Finally, I will also offer a brief perspective on current challenges in video understanding.
About the speaker: Rahul Sukthankar is a Distinguished Scientist and Director of Machine Perception at Google Research. He is also an adjunct research professor at the Robotics Institute at Carnegie Mellon and courtesy faculty at the University of Central Florida. He received his Ph.D. in Robotics from Carnegie Mellon in 1997 and his B.S.E. in Computer Science from Princeton in 1991. Dr. Sukthankar has organized several computer vision conferences (e.g., General Chair, CVPR'21), serves as Editor in Chief of Machine Vision and Applications, and is a Fellow of the IEEE. |
Friday, January 31, 2020, 11:00AM
|
Machine Learning Algorithms for Network and Functional Genomics
Jian Peng [homepage]
Abstract:
Recent advances in network and functional genomics have enabled large-scale measurements of molecular interactions, functional behavior and consequences of genetic perturbations. Identifying connections, patterns and deeper functional annotations among such heterogeneous measurements will enhance our capability to predict proteins' functions, discover their roles in biological processes underlying diseases, and develop novel therapeutics. In this talk, I will describe machine learning algorithms that interrogate molecular interactions and perturbation screens to understand protein functions. First, I will introduce Mashup, a graph-based learning algorithm that integrates multiple heterogeneous networks into compact topological features for protein functional inference. I will also briefly talk about applications of Mashup to discovering new disease factors and subnetworks from genetic perturbations and variations. Finally, I will present our recent work on using deep learning for modeling protein sequence-to-function mapping from large-scale mutagenesis and its application to protein design and engineering.
About the speaker: Jian Peng has been an assistant professor of computer science at UIUC since 2015. Before joining Illinois, Jian was a postdoc at CSAIL at MIT and a visiting scientist at the Whitehead Institute for Biomedical Research. He obtained his Ph.D. in Computer Science from Toyota Technological Institute at Chicago in 2013. His research interests include bioinformatics, cheminformatics and machine learning. Algorithms developed by Jian and his co-workers were successful in several scientific challenges, including the Critical Assessment of Protein Structure Prediction (CASP) competitions and a few DREAM challenges on translational medicine and pharmacogenomics. Recently, Jian has received an NSF CAREER Award, a PhRMA Foundation Award, and an Alfred P. Sloan Research Fellowship.
Watch Online |
Friday, February 7, 2020, 11:00AM
|
Toward robust manipulation in complex environments
Dieter Fox [homepage]
Over the last few years, advances in deep learning and GPU-based computing have enabled significant progress in several areas of robotics, including visual recognition, real-time tracking, object manipulation, and learning-based control. This progress has turned applications such as autonomous driving and delivery tasks in warehouses, hospitals, or hotels into realistic application scenarios. However, robust manipulation in complex settings is still an open research problem. Various research efforts show promising results on individual pieces of the manipulation puzzle, including manipulator control, touch sensing, object pose detection, task and motion planning, and object pickup. In this talk, I will present our recent work in integrating such components into a complete manipulation system. Specifically, I will describe a mobile robot manipulator that moves through a kitchen, can open and close cabinet doors and drawers, detect and pick up objects, and move these objects to desired locations. Our baseline system is designed to be applicable in a wide variety of environments, relying only on 3D articulated models of the kitchen and the relevant objects. I will discuss the design choices behind our approach, the lessons we have learned so far, and various research directions toward enabling more robust and general manipulation systems.
About the speaker: Dieter Fox is Senior Director of Robotics Research at NVIDIA. He is also a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he heads the UW Robotics and State Estimation Lab. Dieter obtained his Ph.D. from the University of Bonn, Germany. His research is in robotics and artificial intelligence, with a focus on state estimation and perception applied to problems such as mapping, object detection and tracking, manipulation, and activity recognition. He has published more than 200 technical papers and is the co-author of the textbook "Probabilistic Robotics". He is a Fellow of the IEEE and the AAAI, and he has received several best paper awards at major robotics, AI, and computer vision conferences. He was an editor of the IEEE Transactions on Robotics, program co-chair of the 2008 AAAI Conference on Artificial Intelligence, and program chair of the 2013 Robotics: Science and Systems conference.
Watch Online |
Friday, February 14, 2020, 11:00AM
|
Robust Logic Reasoning with Graph Neural Networks
Le Song [homepage]
Logic reasoning can make predictions and transfer to new cases with small data, but it is very brittle to noise and inconsistency in data. Graph neural networks, which embed message passing algorithms in function spaces, can learn representations for noisy graph data. In this talk, I will show that graph neural networks can make logic reasoning more robust to noise, in tasks such as fraud detection in networks, knowledge graph analysis, and visual scene graph understanding.
About the speaker: Le Song is an Associate Professor in the Department of Computational Science and Engineering, College of Computing, an Associate Director of the Center for Machine Learning, Georgia Institute of Technology, and also a Principal Engineer of Ant Financial, Alibaba. His principal research direction is machine learning, especially kernel methods and deep learning, and probabilistic graphical models for large and complex problems arising from artificial intelligence and interdisciplinary domains. He is the recipient of the NIPS'17 Materials Science Workshop Best Paper Award, the Recsys'16 Deep Learning Workshop Best Paper Award, AISTATS'16 Best Student Paper Award, IPDPS'15 Best Paper Award, NSF CAREER Award'14, NIPS'13 Outstanding Paper Award, and ICML'10 Best Paper Award. He has also served as an area chair or senior program committee member for many leading machine learning and AI conferences such as ICML, NeurIPS, AISTATS, AAAI and IJCAI, and as an action editor for JMLR and IEEE TPAMI.
Watch Online |
Wednesday, February 19, 2020, 2:00PM
|
Learning Visual Organization with Minimal Human Supervision
Stella Yu [homepage]
Abstract: Computer vision has advanced rapidly with deep learning, achieving super-human performance on a few recognition benchmarks. At the core of the state-of-the-art approaches for image classification, object detection, and semantic/instance segmentation is sliding-window classification, engineered for computational efficiency. Such piecemeal analysis of visual perception often has trouble getting details right and fails miserably with occlusion. Human vision, on the other hand, thrives on occlusion, excels at seeing wholes and parts, and can recognize objects with very little supervision. I will describe several works that build upon concepts of perceptual organization, integrate multiscale and figure-ground cues, and learn to develop pixel and image relationships in a data-driven fashion, with no annotations at all or with fewer annotations, in order to deliver more accurate and generalizable performance beyond recognition in a closed world. Our recent works can not only capture apparent visual similarity without perceptual organization priors or any feature engineering, but also provide powerful exploratory data analysis tools that can seamlessly integrate external domain knowledge into a data-driven machine learning framework.
About the speaker: Stella Yu received her Ph.D. from Carnegie Mellon University, where she studied robotics at the Robotics Institute and vision science at the Center for the Neural Basis of Cognition. Dr. Yu is currently the Director of the Vision Group at the International Computer Science Institute (ICSI) and a Senior Fellow at the Berkeley Institute for Data Science (BIDS) at UC Berkeley. Dr. Yu is interested not only in understanding visual perception from multiple perspectives, including art and vision for which she received an NSF CAREER award, but also in using computer vision and machine learning to capture and exceed human expertise in practical applications.
Watch Online |
Friday, February 28, 2020, 11:00AM
|
Smart Algorithms or Hardware Acceleration for Extreme-Scale Deep Learning
Anshumali Shrivastava [homepage]
Current Deep Learning (DL) architectures are growing larger
to learn from complex datasets. The trends show that the only
sure-shot way of surpassing prior accuracy is to increase the model
size, supplement it with more data, followed by aggressive
fine-tuning. However, training and tuning astronomically sized models
is time-consuming and stalls progress. As a result, industries are
increasingly investing in specialized hardware and deep learning
accelerators like GPUs to expedite the process of training. On the
orthogonal side, progress on developing efficient algorithms for
training the neural network, so far, has failed to surpass the
advantages of parallelism over the naive backpropagation algorithm. It
is taken for granted that traditional CPUs are incapable of
outperforming powerful accelerators such as V100 GPUs in a
head-to-head comparison on training large DL models.
In this talk, I will demonstrate the first progress on the algorithmic
front where we will show how a smart algorithm on a traditional CPU is
4x faster (1.3 hours vs. 5.5 hours) than the most optimized
implementations of TensorFlow on the best available V100 GPUs in a
head-to-head comparison. The algorithm leverages some of the recent
and surprising findings that LSH can be used as a constant time
(amortized) sampler and estimator. Locality Sensitive Hashing (LSH) is
a hugely popular algorithm for near neighbor search, but the algorithm
is too slow for any real speedup. Instead, the sampling view of LSH
allows us to design a several orders of magnitude more efficient and
embarrassingly parallel adaptive dropout scheme for training large
neural networks. Our observation bridges data structures
(probabilistic hash tables) with efficient unbiased statistical
estimations.
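The sampling view of LSH described above can be illustrated with a small sketch. This is not the speaker's implementation: it assumes signed random projections (SimHash) as the hash family, and the function names, table counts, and bit widths are hypothetical choices made for illustration. Each neuron's weight vector is hashed into several tables once; for an input, only the neurons that share a bucket with it are treated as active, so the lookup cost does not grow with the number of neurons.

```python
import random


def simhash(vec, planes):
    """Signed random projection hash: one bit per hyperplane."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_tables(weights, dim, num_tables=8, bits_per_table=4, seed=0):
    """Hash every neuron's weight vector into num_tables hash tables.

    Illustrative parameters only; they are assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        planes = [[rng.gauss(0, 1) for _ in range(dim)]
                  for _ in range(bits_per_table)]
        buckets = {}
        for neuron_id, w in enumerate(weights):
            buckets.setdefault(simhash(w, planes), []).append(neuron_id)
        tables.append((planes, buckets))
    return tables


def sample_active_neurons(x, tables):
    """Return the neurons that collide with input x in at least one table.

    Neurons whose weights point in a direction similar to x collide with
    higher probability, so this acts as an input-dependent dropout mask.
    """
    active = set()
    for planes, buckets in tables:
        active.update(buckets.get(simhash(x, planes), []))
    return active
```

In a training loop built on this idea, only the neurons returned by `sample_active_neurons` would be evaluated and updated for a given input, and the tables would need to be refreshed as the weights change.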
In the second part, I will provide a case study of Amazon search data
with 50 million products. At this scale, a natural DL model requires
more than 100 billion parameters, which will need 400GB to store the
model. The best GPU only has 32GB memory. It turns out that training a
100 billion parameter model is nearly infeasible for any company.
Hardware acceleration, by itself, is currently not capable of taming
this scale. I will then present Merged-Average Classifiers via Hashing
(MACH), a generic K-classification algorithm where memory provably
scales at O(log K) without any strong assumption on the classes. MACH
is subtly a count-min sketch structure in disguise, which uses
universal hashing to reduce classification with a large number of
classes to a few embarrassingly parallel and independent classification
tasks with a small (constant) number of classes. MACH naturally
provides a technique for zero communication model parallelism. MACH
outperforms, by a significant margin, the state-of-the-art extreme
classification models deployed on commercial search engines: Parabel
and dense embedding models. Our largest model has 6.4 billion
parameters and trains in less than 35 hours on a single p3.16x
machine. The training times are 7-10x faster, and the memory
footprints are 2-4x smaller than the best baselines. This training
time is also significantly lower than the one reported by Google's
mixture of experts (MoE) language model on comparable model size and
hardware.
Based on two papers (MLSys 2020, NeurIPS 2019), one in collaboration
with Intel and another one with Amazon.
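The reduction described for MACH can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the hash family `((a*k + b) mod p) mod B`, the helper names, and the averaging rule used to merge scores are all assumptions chosen to match the abstract's description of a count-min-sketch-like structure. Each of R independent hash functions maps the K original classes into B buckets, one small B-class classifier is trained per hash function (training is omitted here), and a class is scored by averaging the probabilities of the buckets it hashes to. The 400GB figure quoted above corresponds to 100 billion parameters at 4 bytes each, assuming 32-bit floats.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1, larger than any class id used here


def make_hash(rng, num_buckets):
    """Draw one function from the family ((a*k + b) mod p) mod B."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)

    def h(class_id):
        return ((a * class_id + b) % PRIME) % num_buckets

    return h


def bucket_labels(class_id, hashes):
    """Training targets: the bucket of the true class under each hash."""
    return [h(class_id) for h in hashes]


def merge_scores(bucket_probs, hashes, num_classes):
    """Score each original class by averaging its buckets' probabilities.

    bucket_probs[r] is the probability vector over buckets produced by the
    r-th small classifier (hypothetical models, not shown in this sketch).
    """
    repetitions = len(hashes)
    return [
        sum(probs[h(k)] for probs, h in zip(bucket_probs, hashes)) / repetitions
        for k in range(num_classes)
    ]
```

Because the R small classifiers never exchange information, each one can be trained on a separate device, which is one way to read the abstract's "zero communication model parallelism".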
About the speaker: Anshumali Shrivastava is an assistant professor in the computer science department at Rice University. His broad research interests include randomized algorithms for large-scale machine learning. In 2018, Science News named him one of the Top-10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, and a machine learning research award from Amazon. He has won numerous paper awards, including Best Paper Award at NIPS 2014 and Most Reproducible Paper Award at SIGMOD 2019. His work has been featured in several media outlets including the New York Times, IEEE Spectrum, Science News, and Ars Technica.
Watch Online |
Friday, March 6, 2020, 11:00AM
|
Vision-and-Language Navigation
Peter Anderson [homepage]
Abstract:
The growing availability of high-quality 3D reconstructions of indoor and outdoor scenes creates exciting opportunities for grounded language learning. This talk will focus on the challenging problem of vision-and-language navigation (VLN), in which an agent is placed in a photo-realistic simulation environment and given a natural language navigation instruction to follow. Several recent datasets for this task will be introduced, including R2R, (RE)TOUCHDOWN and REVERIE. On the modeling side, I will discuss a formulation of instruction-following in terms of Bayesian state tracking, which computes explicit probabilities for different trajectories in a map constructed on-the-fly. Finally, I will discuss our ongoing efforts to integrate trained VLN agents with standard ROS components on a physical robot to assess the sim-to-real performance gap.
About the speaker: Peter Anderson is a Research Scientist in the Language team at Google Research. Prior to joining Google he was a Research Scientist in the School of Interactive Computing at Georgia Tech. His research interests include computer vision, natural language processing, and in particular problems at the intersection of these fields. His recent work has focused on grounded language learning, particularly in large-scale visually-realistic 3D environments, as well as image captioning and visual question answering (VQA). He completed his PhD in Computer Science at the Australian National University in 2018.
Watch Online |