Learning Multi-Modal Grounded Linguistic Semantics by Playing I Spy.
Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond Mooney.
In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016.
Demo Video: https://youtu.be/jLHzRXPCi_w
Grounded language learning bridges words like ‘red’ and ‘square’ with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive human-robot “I Spy” game. In this game, the human and robot take turns describing one object among several, then trying to guess which object the other has described. All supervision labels were gathered from human participants physically present to play this game with a robot. We demonstrate that our multi-modal system for grounding natural language outperforms a traditional, vision-only grounding framework by comparing the two on the “I Spy” task. We also provide a qualitative analysis of the groundings learned in the game, visualizing what words are understood better with multi-modal sensory information as well as identifying learned word meanings that correlate with physical object properties (e.g., ‘small’ negatively correlates with object weight).
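As a rough illustration of the grounding setup described in the abstract (not the paper's actual implementation), one can picture a binary classifier per word trained on concatenated multi-modal feature vectors, with supervision labels drawn from "I Spy" rounds. The sketch below uses scikit-learn and randomly generated features as stand-ins for real vision, haptic, audio, and proprioceptive data; all names and dimensions are hypothetical.

# Illustrative sketch only: one binary grounding classifier per word over
# concatenated multi-modal features. Random vectors stand in for real
# robot sensor data gathered through exploratory behaviors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-object features from each sensory modality.
N_OBJECTS = 32
MODALITY_DIMS = {"vision": 64, "haptic": 16, "audio": 32, "proprioception": 8}
object_features = {
    obj: np.concatenate([rng.normal(size=d) for d in MODALITY_DIMS.values()])
    for obj in range(N_OBJECTS)
}

# Supervision from "I Spy" rounds: (word, object, positive/negative).
# A word used to describe the chosen object yields a positive label.
game_labels = [("red", obj, obj % 2 == 0) for obj in range(N_OBJECTS)]

def train_word_classifier(word, labels, features):
    """Fit a binary grounding classifier for a single word."""
    X = np.stack([features[obj] for w, obj, _ in labels if w == word])
    y = np.array([pos for w, _, pos in labels if w == word])
    clf = SVC(kernel="linear", probability=True)
    clf.fit(X, y)
    return clf

red_clf = train_word_classifier("red", game_labels, object_features)

# At guessing time, score every object on the table against each word in the
# human's description and guess the highest-scoring object.
scores = {obj: red_clf.predict_proba(feats[None, :])[0, 1]
          for obj, feats in object_features.items()}
best_guess = max(scores, key=scores.get)
print("Robot guesses object", best_guess)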
@InProceedings{IJCAI16-thomason,
  title     = {Learning Multi-Modal Grounded Linguistic Semantics by Playing {I Spy}},
  author    = {Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond Mooney},
  booktitle = {Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI)},
  location  = {New York City, USA},
  month     = {July},
  year      = {2016},
  abstract  = {Grounded language learning bridges words like `red' and `square' with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive human-robot ``I Spy'' game. In this game, the human and robot take turns describing one object among several, then trying to guess which object the other has described. All supervision labels were gathered from human participants physically present to play this game with a robot. We demonstrate that our multi-modal system for grounding natural language outperforms a traditional, vision-only grounding framework by comparing the two on the ``I Spy'' task. We also provide a qualitative analysis of the groundings learned in the game, visualizing what words are understood better with multi-modal sensory information as well as identifying learned word meanings that correlate with physical object properties (e.g. `small' negatively correlates with object weight)},
  wwwnote   = {<a href="https://youtu.be/jLHzRXPCi_w">Demo Video</a>},
}