CS 395T:
Grounded Natural Language Processing

How to read research articles (background papers recommended by Prof. Matt Lease)

  1. S. Keshav. How to Read a Paper. U. Waterloo, February 17, 2016.
  2. Alan Smith. The Task of the Referee. IEEE Computer, 1990.

Research Papers

Papers to be read and presented by students. The presentation date is given in brackets at the beginning of each entry.
  1. [1/29] Stevan Harnad, The Symbol Grounding Problem, Physica D 42: 335-346, 1990.
  2. [1/29] Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, and Joseph Turian. Experience Grounds Language. EMNLP 2020.
  3. [1/31] Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3674-3683.
  4. [1/31] Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox, ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, CVPR 2020.
  5. [2/5] Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga, A Comprehensive Survey of Deep Learning for Image Captioning, ACM Computing Surveys (October 2018).
  6. [2/5] Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, and William Yang Wang, VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research, Proceedings of the 17th IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Korea.
  7. [2/7] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, International Conference on Computer Vision (ICCV), 2015.
  8. [2/7] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein, Learning to Compose Neural Networks for Question Answering, NAACL 2016.
  9. [2/12] Jialin Wu, Raymond Mooney, Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering, EMNLP 2022.
  10. [2/12] Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal, TVQA+: Spatio-Temporal Grounding for Video Question Answering, ACL 2020.
  11. [2/14] Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, NeurIPS 2019.
  12. [2/14] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, Learning Transferable Visual Models From Natural Language Supervision, ICML 2021.
  13. [2/19] Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi. MERLOT: Multimodal Neural Script Knowledge Models, NeurIPS 2021.
  14. [2/19] Xi Chen et al., PaLI: A Jointly-Scaled Multilingual Language-Image Model, ICLR 2023.
  15. [2/21] Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross, Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality, CVPR 2022.
  16. [2/21] Sivan Doveh, Assaf Arbelle, Sivan Harary, Roei Herzig, Donghyun Kim, Paola Cascante-bonilla, Amit Alfassy, Rameswar Panda, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky, Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models, NeurIPS 2023 Spotlight.
  17. [2/26] Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert Hawkins, Yoav Artzi, Abstract Visual Reasoning with Tangram Shapes, EMNLP 2022 (Best Paper).
  18. [2/26] Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, Chelsea Finn, BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning, CoRL 2021.
  19. [2/28] Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, Raymond J. Mooney, Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy", In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), pp. 3477-3483, New York City, 2016.
  20. [2/28] Mohit Shridhar, Lucas Manuelli, Dieter Fox, CLIPort: What and Where Pathways for Robotic Manipulation, CoRL 2021.
  21. [3/4] Michael Ahn et al., Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 2022.
  22. [3/4] Danny Driess et al., PaLM-E: An Embodied Multimodal Language Model, 2023.
  23. [3/6] Anthony Brohan et al., RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, 2023.
  24. [3/6] William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Kaelbling, Phillip Isola, Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, CoRL 2023.
  25. [3/18] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, and James Glass, Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input, Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  26. [3/18] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi, Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, arXiv, May 2022.
  27. [3/20] Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David Fleet, Tim Salimans, Imagen Video: High Definition Video Generation with Diffusion Models, 2022.
  28. [3/20] Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, Christopher D. Manning, Text to 3D Scene Generation with Rich Lexical Grounding, ACL 2015.
  29. [3/25] Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen, MotionGPT: Human Motion as a Foreign Language, 2023.
  30. [3/25] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin, Magic3D: High-Resolution Text-to-3D Content Creation, CVPR 2023.

Guest Lectures

Class Project Presentations

  1. TBD