This website is the archive of past Forum for Artificial Intelligence talks. Please click this link to navigate to the list of current talks. FAI meets every other week (or so) to discuss scientific, philosophical, and cultural issues in artificial intelligence. Both technical research topics and broader interdisciplinary aspects of AI are covered, and all are welcome to attend! If you would like to be added to the FAI mailing list, subscribe here. If you have any questions or comments, please send email to Catherine Andersson.
Friday, October 2, 2020, 11:00AM
ALFRED -- A Simulated Playground for Connecting Language, Action, and Perception
Yonatan Bisk [homepage]
Vision-and-Language Navigation has become a popular task in the grounding literature, but the real world includes interaction, state changes, and long-horizon planning. (Actually, the real world requires motors and torques, but let's ignore that for the moment.) We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark dataset whose goal is to facilitate more complex embodied language understanding. In this talk, I'll discuss the benchmark itself and subsequent pieces of work enabled by the environment and annotations. Our goal is to provide a playground for moving embodied language+vision research closer to robotics, enabling the community to work on uncovering abstractions and interactions between planning, reasoning, and action taking.
About the speaker: Yonatan Bisk is an Assistant Professor in the Language Technologies Institute at Carnegie Mellon University. He received his PhD from The University of Illinois at Urbana-Champaign, where he worked on CCG with Julia Hockenmaier. Prior to that, he received his BS from the Turing Scholars program at The University of Texas at Austin. This means he has two of the same affiliations as Ray Mooney. His primary research question is -- What knowledge can't be learned from text? Watch Online
Friday, October 9, 2020, 11:00AM
Multimodal AI: Self-supervised Learning, Adversarial Training, and Vision+Language Inference
Jingjing Liu [homepage]
We live in a multimodal world - watching a cloud unfolding, listening to a bird chirping, smelling a rose, tasting a drop of dew, touching a finch feather. To perceive the physical world, a computer system needs to process these multimodal signals and fuse diverse knowledge across modalities to reach a holistic understanding of its complex surroundings. Take visual and lingual signals as an example. Multimodal embedding has been the bedrock for almost all Vision+Language (V+L) tasks, where multimodal inputs are simultaneously processed for joint visual and textual understanding. In this talk, I will introduce UNITER (UNiversal Image-TExt Representation) and HERO (Hierarchical EncodeR for videO+language representation learning), and explain their key ingredients: pre-training task design, pre-training data selection, and large-scale pre-training techniques. Another project I will present is VILLA (VIsion-and-Language Large-scale Adversarial training), the first known effort on adversarial training for V+L, with a task-agnostic adversarial pre-training stage followed by task-specific adversarial finetuning. UNITER, HERO, and VILLA have achieved a new state of the art across a wide range of V+L tasks, such as Visual Question Answering, Visual Commonsense Reasoning, Image-Text Retrieval, Visual Entailment, Video Moment Retrieval, and Video Captioning. If time allows, I will also introduce a new task, VIOLIN (VIdeO-and-Language INference), which requires a model to learn sophisticated reasoning skills, from shallow grounding (e.g., identifying objects and characters in the video) to in-depth commonsense reasoning (e.g., inferring causal relations of events), in order to understand the complex temporal dynamics in the rich visual content of videos.
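To make the adversarial-training idea concrete, here is a minimal, illustrative sketch of perturbing continuous input embeddings in the direction that increases the task loss and then training on the perturbed inputs. This is not the VILLA implementation: the `encoder` and `classifier` callables, the single-step perturbation, and the norm budget `eps` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def adversarial_embedding_loss(encoder, classifier, txt_emb, img_emb, labels, eps=1e-3):
    """One illustrative step of adversarial training in embedding space:
    perturb the continuous input embeddings in the direction that increases
    the task loss, then compute the loss on the perturbed inputs."""
    # Clean forward pass with a zero perturbation attached to the graph.
    delta = torch.zeros_like(txt_emb, requires_grad=True)
    logits = classifier(encoder(txt_emb + delta, img_emb))
    loss = F.cross_entropy(logits, labels)
    # Single ascent step on the perturbation (normalized gradient direction).
    grad, = torch.autograd.grad(loss, delta)
    delta = eps * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    # Adversarial loss used for the model update (typically added to the clean loss).
    adv_logits = classifier(encoder(txt_emb + delta.detach(), img_emb))
    return F.cross_entropy(adv_logits, labels)
```

In practice the same perturbation trick can be applied to the visual embeddings as well, and the adversarial loss is combined with the standard clean-data objective.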
About the speaker: Dr. Jingjing Liu is Senior Principal Research Manager at Microsoft, leading a research group in Multimodal AI (http://aka.ms/mmai). Her current research interests center on Vision+Language (V+L) Multimodal Intelligence, the intersection of Natural Language Processing (NLP) and Computer Vision, including Visual Question Answering, Text-to-Image Synthesis, Image/Video Captioning, Self-supervised Learning, and Adversarial Training. Before joining Microsoft, Dr. Liu was a Research Scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, working on NLP and Spoken Dialogue Systems. She received her PhD in Computer Science from the MIT Department of EECS, and holds an MBA from the Judge Business School (JBS) at the University of Cambridge, with a focus on Entrepreneurship and Investment. Dr. Liu has been interviewed by The New York Times and Forbes, and her team has achieved No. 1 on many public AI benchmarks, such as GLUE, XTREME, XGLUE, ARC, and VCR. Watch Online
Friday, October 16, 2020, 11:00AM
Learning on Pointclouds for 3D Scene Understanding
Or Litany [homepage]
In this talk, I'll cover several works on 3D deep learning on pointclouds for scene understanding tasks.
First, I'll describe VoteNet (ICCV 2019), a method for object detection from 3D pointcloud input, inspired by the classical generalized Hough voting technique. I'll then explain how we integrated image information into the voting scheme to further boost 3D detection (ImVoteNet, CVPR 2020). In the last part of my talk, I'll describe a recent study on transfer learning for 3D pointclouds, which led to the development of the PointContrast framework (ECCV 2020). Our findings are extremely encouraging: using a unified triplet of architecture, source dataset, and contrastive loss for pre-training, we achieve improvements over recent best results in segmentation and detection across 6 different benchmarks covering indoor and outdoor, real and synthetic datasets -- demonstrating that the learned representation can generalize across domains.
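As a rough illustration of the contrastive pre-training idea, the sketch below implements a point-level InfoNCE-style loss over matched points from two views of a scene. The tensor shapes, the temperature, and the assumption that row i of both feature matrices corresponds to the same physical point are illustrative simplifications, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def point_infonce(feats_view1, feats_view2, temperature=0.07):
    """Point-level InfoNCE sketch: row i of each tensor is the feature of the
    same physical point observed in two different views; matched points are
    positives, all other points in the other view serve as negatives."""
    f1 = F.normalize(feats_view1, dim=-1)          # (N, D)
    f2 = F.normalize(feats_view2, dim=-1)          # (N, D)
    logits = f1 @ f2.t() / temperature             # (N, N) similarity matrix
    targets = torch.arange(f1.size(0), device=f1.device)
    return F.cross_entropy(logits, targets)

# Example with random features standing in for the output of a point backbone.
loss = point_infonce(torch.randn(1024, 32), torch.randn(1024, 32))
```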
About the speaker: Or Litany (PhD 2018, Tel-Aviv University) is a Research Scientist at Nvidia. Before that, he was a postdoctoral fellow at Stanford University, working under Prof. Leonidas Guibas, and a postdoc at Facebook AI Research. Or's main interests include 3D deep learning, computational shape analysis, and representation learning. Watch Online
Friday, October 23, 2020, 11:00AM
Factored Value Functions for Cooperative Multi-Agent Reinforcement Learning
Shimon Whiteson [homepage]
Cooperative multi-agent reinforcement learning (MARL) considers how teams of agents can coordinate their behaviour to efficiently achieve common goals. A key challenge therein is how to learn cooperative policies in a centralised fashion that nonetheless can be executed in a decentralised fashion. In this talk, I will discuss QMIX, a simple but powerful cooperative MARL algorithm that relies on factored value functions both to make learning efficient and to ensure decentralisability. Extensive results on the StarCraft Multi-Agent Challenge (SMAC), a benchmark we have developed, confirm that QMIX outperforms alternative approaches, though further analysis shows that this is not always for the reasons we expected.
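A minimal sketch of the monotonic value factorization at the heart of QMIX: per-agent utilities are mixed into a joint value by a state-conditioned mixing network whose weights are produced by hypernetworks and forced non-negative, so dQ_tot/dQ_i >= 0 and the decentralized per-agent argmax stays consistent with the joint argmax. The layer sizes and the single hidden layer are assumptions for illustration, not the exact published architecture.

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """QMIX-style monotonic mixing sketch: the weights applied to the per-agent
    Q-values come from state-conditioned hypernetworks and are made non-negative
    with abs(), which enforces monotonicity of Q_tot in each agent's Q-value."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(agent_qs.unsqueeze(1) @ w1 + b1)   # (batch, 1, embed)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (hidden @ w2 + b2).squeeze(-1).squeeze(-1)      # (batch,) joint Q

q_tot = QMixer(n_agents=3, state_dim=16)(torch.randn(8, 3), torch.randn(8, 16))
```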
About the speaker: Shimon Whiteson is a Professor of Computer Science at the University of Oxford and the Head of Research at Waymo UK. His research focuses on deep reinforcement learning and learning from demonstration, with applications in robotics and video games. He completed his doctorate at the University of Texas at Austin in 2007. He spent eight years as an Assistant and then an Associate Professor at the University of Amsterdam before joining Oxford as an Associate Professor in 2015. He was awarded a Starting Grant from the European Research Council in 2014, a Google Faculty Research Award in 2017, and a JPMorgan Faculty Award in 2019. Watch Online
Wednesday, October 28, 2020, 11:00AM
Improving Compositional Generalization with Latent Tree Structures
Jonathan Berant [homepage]
A recent focus in machine learning and natural language processing is on models that generalize beyond their training distribution. One natural form of such generalization, which humans excel in, is compositional generalization: the ability to generalize at test time to new, unobserved compositions of atomic components that were observed at training time. Recent work has shown that current models struggle to generalize in such scenarios. In this talk, I will present recent work demonstrating how an inductive bias towards tree structures substantially improves compositional generalization in two question answering setups. First, we present a model that, given a compositional question and an image, constructs a tree over the input question and answers the question from the root representation. Trees are not given at training time and are fully induced from the answer supervision only. We show that our approach improves compositional generalization on the CLOSURE dataset from 72.2 to 96.1 accuracy, while obtaining comparable performance to models such as FiLM and MAC on human-authored questions. Second, we present a span-based semantic parser, which induces a tree over the input to compute an output logical form, handling a certain sub-class of non-projective trees. We evaluate this on several compositional splits of existing datasets, improving performance on Geo880, for example, from 54.0 to 82.2. Overall, we view these results as strong evidence that an inductive bias towards tree structures dramatically improves compositional generalization compared to existing approaches.
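To illustrate the tree-structured inductive bias, here is a toy sketch that composes span representations bottom-up over a given binary tree, with the root vector standing in for the representation that would feed an answer classifier. In the actual work the tree is latent and induced from answer supervision alone; the MLP composition function and all names below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TreeComposer(nn.Module):
    """Toy bottom-up composition: each internal node's representation is an
    MLP applied to the concatenation of its children's representations."""
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, leaves, tree):
        # leaves: (n_words, dim); tree: nested tuples of leaf indices, e.g. ((0, 1), 2)
        def rep(node):
            if isinstance(node, int):
                return leaves[node]
            left, right = node
            return self.compose(torch.cat([rep(left), rep(right)], dim=-1))
        return rep(tree)

# Root representation of a 3-word "question" under the tree ((0, 1), 2).
root = TreeComposer(dim=8)(torch.randn(3, 8), ((0, 1), 2))
```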
About the speaker: My field of research is Natural Language Processing. I work on Natural Language Understanding problems such as Semantic Parsing, Question Answering, Paraphrasing, Reading Comprehension, and Textual Entailment. I am mostly excited about learning from weak supervision that is easy to obtain and grounded in the world, and about tasks that require multi-step inference or handling of language compositionality. I am an associate professor at the Blavatnik School of Computer Science and a Research Scientist at The Allen Institute for Artificial Intelligence. Watch Online
Friday, November 6, 2020, 11:00AM
Embeddings of spoken words across tasks and languages
Karen Livescu [homepage]
Word embeddings have become a ubiquitous tool in natural language processing. These embeddings represent the meanings of written words. On the other hand, for spoken language it may be more important to represent how a written word *sounds* rather than (or in addition to) what it means. For some applications it can also be helpful to represent variable-length acoustic signals corresponding to words, or other linguistic units, as fixed-dimensional vectors, or acoustic word embeddings. Closely related are acoustically grounded embeddings of written words, that is, embeddings that represent the way a written word sounds by training on paired acoustic and textual data. Such embeddings can be useful for speeding up or improving performance on a number of speech tasks. This talk will present work on both acoustic word embeddings and "acoustically grounded" written word embeddings, including their applications for improved speech recognition and search in English and across languages.
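A minimal sketch of one common way to train such embeddings, assuming a multi-view triplet objective over paired acoustic and written examples: the acoustic embedding of a spoken word should be closer to the embedding of its own written form than to other words' written forms. The encoders that would produce the two embedding matrices (an acoustic encoder over a variable-length segment and a character-sequence encoder of the written word), the cosine similarity, and the margin are illustrative assumptions, not the specific models from the talk.

```python
import torch
import torch.nn.functional as F

def multiview_triplet_loss(acoustic_emb, written_emb, margin=0.4):
    """Multi-view hinge loss sketch for acoustically grounded word embeddings.
    Row i of the two tensors corresponds to the same word type."""
    a = F.normalize(acoustic_emb, dim=-1)
    w = F.normalize(written_emb, dim=-1)
    sim = a @ w.t()                                   # (N, N) cosine similarities
    pos = sim.diag().unsqueeze(1)                     # matched acoustic/written pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(mask, float('-inf')).max(dim=1, keepdim=True).values
    return F.relu(margin - pos + neg).mean()          # hardest-negative hinge

# Random embeddings stand in for the outputs of the two encoders.
loss = multiview_triplet_loss(torch.randn(16, 64), torch.randn(16, 64))
```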
About the speaker: Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD in electrical engineering and computer science at MIT. Her main research interests are in speech and language processing, as well as related problems in machine learning. Her recent work includes unsupervised and multi-view representation learning, acoustic word embeddings, visually grounded speech modeling, and automatic sign language recognition. Her recent professional activities include serving as a program chair of ICLR 2019 and a technical chair of ASRU 2015/2017/2019. Watch Online
Friday, November 13, 2020, 11:00AM
Knowledge-Rich Neural Text Comprehension and Reasoning
Hanna Hajishirzi [homepage]
Enormous amounts of ever-changing knowledge are available online in diverse textual styles (e.g., news vs. science text) and diverse formats (knowledge bases vs. web pages vs. textual documents). This talk addresses the question of textual comprehension and reasoning given this diversity: how can AI help applications comprehend and combine evidence from variable, evolving sources of textual knowledge to make complex inferences and draw logical conclusions? I present question answering and fact checking algorithms that offer rich natural language comprehension using multi-hop and interpretable reasoning. Recent advances in deep learning algorithms, large-scale datasets, and industry-scale computational resources are spurring progress in many Natural Language Processing (NLP) tasks, including question answering. Nevertheless, current models lack the ability to answer complex questions that require them to reason intelligently across diverse sources and explain their decisions. Further, these models cannot scale up when task-annotated training data are scarce and computational resources are limited. With a focus on textual comprehension and reasoning, this talk will present some of the most recent efforts in my lab to integrate capabilities of symbolic AI approaches into current deep learning algorithms. I will present interpretable algorithms that understand and reason about textual knowledge across varied formats and styles, generalize to emerging domains with scarce training data (are robust), and operate efficiently under resource limitations (are scalable).
About the speaker: Hanna Hajishirzi is an Assistant Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a Research Fellow at the Allen Institute for AI. Her research spans different areas in NLP and AI, focusing on developing machine learning algorithms that represent, comprehend, and reason about diverse forms of data at large scale. Applications for these algorithms include question answering, reading comprehension, representation learning, knowledge extraction, and conversational dialogue. Her honors include the Sloan Fellowship, the Allen Distinguished Investigator Award, the Intel Rising Star Award, multiple best paper and honorable mention awards, and several industry research faculty awards. Hanna received her PhD from the University of Illinois and spent a year as a postdoc at Disney Research and CMU.
Friday, December 4, 2020, 11:00AM
Leveraging Language in Learning Robot Manipulation Skills
Jeannette Bohg [homepage]
Humans have gradually developed language, mastered complex motor skills, and created and utilized sophisticated tools. The act of conceptualization is fundamental to these abilities because it allows humans to mentally represent, summarize, and abstract diverse knowledge and skills. By means of abstraction, concepts that we learn from a limited number of examples can be extended to a potentially infinite set of new and unanticipated situations. My long-term goal is to endow robots with this generalization ability. In this talk, I will present work that gives robots the ability to acquire a variety of manipulation concepts that act as mental representations of verbs in a natural language instruction. We propose to use learning from human demonstrations of manipulation actions as recorded in large-scale video datasets that are annotated with natural language instructions. Specifically, we propose to use a video classifier that scores how well the robot imitates the human actions. This approach alleviates the need for hand-designing rewards and for time-consuming processes such as teleoperation or kinesthetic teaching. In extensive simulation experiments, we show that the policy learned in the proposed way can perform a large percentage of the 78 different manipulation tasks on which it was trained. The tasks are of greater variety and complexity than previously considered collections of robotic manipulation tasks. We show that the policy generalizes over variations of the environment. We also show examples of successful generalization over novel but similar instructions.
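A rough sketch of the general recipe of using a learned video classifier's score as the reward signal in place of a hand-designed reward. The gym-style environment, the policy interface (returning a torch distribution), the classifier, and the REINFORCE-style surrogate loss are all placeholders and assumptions, not the authors' actual training setup.

```python
import torch

def episode_loss(env, policy, video_classifier, instruction, horizon=100):
    """Roll out the policy, render the frames, then use a learned video
    classifier's score of "does this rollout look like a human performing
    `instruction`?" as a single episode-level reward."""
    frames, log_probs = [], []
    obs = env.reset()
    for _ in range(horizon):
        dist = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, _, done, _ = env.step(action.numpy())
        frames.append(env.render(mode='rgb_array'))
        if done:
            break
    # Classifier score treated as a fixed scalar reward for the whole episode.
    reward = float(video_classifier(frames, instruction))
    # REINFORCE-style surrogate loss: every step shares the episode score.
    return -reward * torch.stack(log_probs).sum()
```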
About the speaker: Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, Jeannette Bohg was a PhD student at the Division of Robotics, Perception and Learning (RPL) at KTH in Stockholm. In her thesis, she proposed novel methods towards multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University in Dresden, where she received her Master in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time, and multi-modal such that they can provide meaningful feedback for execution and learning. Jeannette Bohg has received several awards, most notably the 2019 IEEE International Conference on Robotics and Automation (ICRA) Best Paper Award, the 2019 IEEE Robotics and Automation Society Early Career Award, and the 2017 IEEE Robotics and Automation Letters (RA-L) Best Paper Award. Watch Online
Friday, December 11, 2020, 11:00AM
Bringing Visual Memories to Life
Jia-Bin Huang [homepage]
Photography allows us to capture and share memorable moments of our lives. However, 2D images appear flat due to the lack of depth perception and may suffer from poor imaging conditions such as taking photos through reflecting or occluding elements. In this talk, I will present our recent efforts to overcome these limitations. Specifically, I will cover our recent work for creating compelling 3D photography, removing unwanted obstructions seamlessly from images or videos, and estimating consistent video depth for advanced video-based visual effects. I will conclude the talk with some ongoing research and research challenges ahead.
About the speaker: Jia-Bin Huang is an Assistant Professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech. He received his Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Illinois, Urbana-Champaign. His research interests include computer vision, computer graphics, and machine learning, with a focus on visual analysis and synthesis with physically grounded constraints. His research received the best student paper award at the IAPR International Conference on Pattern Recognition (ICPR) for work on computational modeling of visual saliency, and the best paper award at the ACM Symposium on Eye Tracking Research & Applications (ETRA) for work on learning-based eye gaze tracking. Huang is the recipient of the NSF CRII Award, a Samsung Global Outreach Award, a 3M Non-Tenured Faculty Award, and a Google Faculty Research Award.
Friday, February 12, 2021, 11:00AM
Sharp Minimax Rates for Imitation Learning
Jiantao Jiao [homepage]
We establish sharp minimax bounds on Imitation Learning (IL) in episodic Markov Decision Processes (MDPs), where the learner is provided a dataset of demonstrations from an expert. It is known that Behavior Cloning (BC) achieves suboptimality growing quadratically in the horizon, which is termed error compounding in the literature. We show that when the MDP transition function is unknown, all algorithms have to suffer a suboptimality that grows quadratically with the horizon, even if the algorithm can interactively query the expert, as in the setting of DAgger. We then consider the setting of known transitions and show that one can provably break the quadratic dependence and improve the exponent to 3/2, which is shown to be tight. Our upper bound is established using a computationally efficient algorithm, which we call Mimic-MD, and the lower bound is established by proving a two-way reduction between IL and the problem of estimating the value of the unknown expert policy under any given reward function, as well as linear functional estimation with subsampled observations. We further show that under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we call Mimic-Mixture, that provably achieves suboptimality independent of the horizon for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality growing slower than the square root of the horizon with high probability if the expert is not constrained to be optimal. We thus formally establish the benefit of the expert optimality assumption in the known-transition setting and show that this additional assumption does not help when the transition functions are unknown.
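For reference, the horizon dependence of the suboptimality gap stated in the abstract can be summarized as follows. This is a paraphrase of the abstract, not the paper's exact theorem statements: constants and the dependence on the number of states and expert trajectories are suppressed.

```latex
% Horizon dependence of the suboptimality gap, as stated in the abstract
% (factors depending on the number of states and expert trajectories omitted):
\begin{align*}
  \text{unknown transitions (any algorithm, even with expert queries):}\quad
    & J(\pi^\ast) - \mathbb{E}\!\left[J(\hat{\pi})\right] \;\asymp\; H^{2},\\
  \text{known transitions (Mimic-MD):}\quad
    & J(\pi^\ast) - \mathbb{E}\!\left[J(\hat{\pi})\right] \;\asymp\; H^{3/2},\\
  \text{known transitions, optimal expert (Mimic-Mixture, 3-state MDPs):}\quad
    & J(\pi^\ast) - \mathbb{E}\!\left[J(\hat{\pi})\right] \;=\; O(1)\ \text{(no dependence on } H\text{)}.
\end{align*}
```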
About the speaker: Jiantao Jiao is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. He received his B.Eng. degree in Electronic Engineering from Tsinghua University, Beijing, China in 2012, and his M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University in 2014 and 2018, respectively. He is a recipient of the Presidential Award of Tsinghua University and the Stanford Graduate Fellowship. He was a semi-plenary speaker at ISIT 2015 and a co-recipient of the ISITA 2016 Student Paper Award and the MobiHoc 2019 Best Paper Award. His research interests are in statistical machine learning, high-dimensional and nonparametric statistics, mathematical programming, applied probability, information theory, and their applications. Watch Online
Friday, February 19, 2021, 11:00AM
Building Reproducible, Reusable, and Robust Deep Reinforcement Learning Systems
Joelle Pineau [homepage]
We have seen amazing achievements with machine learning in recent years. Yet reproducing results for state-of-the-art deep learning methods is seldom straightforward. Results can vary significantly given minor perturbations in the task specification, data or experimental procedure. This is of major concern for anyone interested in using machine learning in real-world applications. In this talk, I will review challenges that arise in experimental techniques and reporting procedures in deep learning, with a particular focus on reinforcement learning and applications to healthcare. I will also describe several recent results and guidelines designed to make future results more reproducible, reusable and robust.
About the speaker: Joelle Pineau is the Managing Director of Facebook AI Research, where she oversees the Montreal, Seattle, Pittsburgh, and Menlo Park labs. She is also a faculty member at Mila and an Associate Professor and William Dawson Scholar at the School of Computer Science at McGill University, where she co-directs the Reasoning and Learning Lab. She holds a BASc in Engineering from the University of Waterloo, and an MSc and PhD in Robotics from Carnegie Mellon University. Dr. Pineau's research focuses on developing new models and algorithms for planning and learning in complex partially-observable domains. Watch Online
Friday, April 9, 2021, 11:00AM
Compositional Generalizability in Geometry and Policy Learning
Hao Su [homepage]
It is well known that deep neural networks are universal function approximators and generalize well when the training and test datasets are sampled from the same distribution. Most deep learning-based applications and theories in the past decade are based upon this setup. While the view of learning function approximators has been rewarding to the community, we are seeing more and more of its limitations when dealing with real-world problem spaces that are combinatorially large. In this talk, I will discuss a possible shift of view, from learning function approximators to learning algorithm approximators, illustrated by some preliminary work in my lab. Our ultimate goal is to achieve generalizability when learning in a problem space of combinatorial complexity. We refer to this desired generalizability as compositional generalizability. To this end, we take important problems in geometry, physics, and policy learning as testbeds. In particular, I will introduce how we build algorithms with state-of-the-art compositional generalizability on these testbeds, following a bottom-up principle and a modularized principle.
About the speaker: Hao Su is an Assistant Professor of Computer Science and Engineering at UC San Diego. He is interested in fundamental problems in broad disciplines related to artificial intelligence, including machine learning, computer vision, computer graphics, and robotics. His most recent work focuses on integrating these disciplines for building and training embodied AI that can interact with the physical world. In the past, his work on ShapeNet, the PointNet series, and graph neural networks significantly impacted the emergence and growth of a new field, 3D deep learning. He also participated in the development of ImageNet, a large-scale 2D image database. He has served as Area Chair, Associate Editor, and in other comparable positions on the program committees of CVPR, ICCV, ECCV, ICRA, Transactions on Graphics (TOG), and AAAI. Watch Online