I. Categorizing and matching objects
II. Surrounding cues
a. Inferring 3D cues from a single image
c. Context
III. Data-driven visual learning
b. Text, language, and imagery
c. Unsupervised learning and discovery
IV. Searching and browsing visual content
b. Browsing: query refinement and summarization
c. Social networks and image tagging
Rapid Object Detection Using a Boosted Cascade of Simple Features, by P. Viola and M. Jones. CVPR 2001.
[pdf] [Face detection in OpenCV]
Histograms of Oriented Gradients for Human Detection, by N. Dalal and B. Triggs. CVPR 2005.
[pdf] [demo video] [software] [PASCAL datasets]
Additional code / software:
Pyramid Histogram of Oriented Gradients (PHOG) code from Anna Bosch
Object Recognition from Local Scale-Invariant Features, by D. Lowe. ICCV 1999.
Local Invariant Feature Detectors: A Survey, by T. Tuytelaars and K. Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, 2008.
Sampling Strategies for Bag-of-Features Image Classification, by E. Nowak, F. Jurie, and B. Triggs. ECCV 2006.
[pdf]
Groups of Adjacent Contour Segments for Object Detection, by V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid. PAMI 2007.
Normalized Cuts and Image Segmentation, by J. Shi and J. Malik. CVPR 1997.
[pdf]
Shape Matching and Object Recognition Using Shape Contexts, by S. Belongie, J. Malik, and J. Puzicha. PAMI April 2002.
Oxford Interest Point Software Webpage
John Lee’s libpmk feature extraction code
Pyramid Histogram of Oriented Gradients (PHOG) code from Anna Bosch
Software from LEAR team at INRIA, including interest point detectors, shape features
Ivan Laptev’s software for space-time interest points, histograms of oriented gradients (HOG), and histograms of optical flow (HOF)
Berkeley Group boundary detection code from David Martin
Graph-based segmentation code from Pedro Felzenszwalb
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, by K. Grauman and T. Darrell. ICCV 2005.
[pdf] [web] [code] [Caltech101]
Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification, by A. Frome, Y. Singer, F. Sha, and J. Malik. ICCV 2007.
[pdf]
Video Google: A Text Retrieval Approach to Object Matching in Videos, by J. Sivic and A. Zisserman, ICCV 2003.
Proximity Distribution Kernels for Geometric Context in Category Recognition, by H. Ling and S. Soatto. ICCV 2007.
[pdf] [PASCAL datasets] [Graz dataset]
LIBSVM: library for Support Vector Machines and tool for precomputed kernel matrices
Caltech-101 kernels and results from Anna Bosch and Andrew Zisserman
Object Class Recognition by Unsupervised Scale Invariant Learning, by R. Fergus, P. Perona, and A. Zisserman. CVPR 2003.
Combined Object Categorization and Segmentation with an Implicit Shape Model, by B. Leibe, A. Leonardis, and B. Schiele. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[pdf] [code] [video1] [video2]
A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan. CVPR 2008.
Simple parts-and-structure detector (by Fergus/FeiFei/Torralba) http://people.csail.mit.edu/fergus/iccv2005/partsstructure.html
LabelMe: a Database and Web-based Tool for Image Annotation, by B. Russell, A. Torralba, K. Murphy, and W. Freeman. IJCV 2008.
Peekaboom: A Game for Locating Objects in Images, by L. von Ahn, R. Liu and M. Blum, CHI 2006.
[pdf]
GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts, by C. Rother, V. Kolmogorov, and A. Blake. SIGGRAPH 2004.
[pdf] [demo video]
Multi-Level Active Prediction of Useful Image Annotations for Recognition, by S. Vijayanarasimhan and K. Grauman, NIPS 2008.
[pdf]
Additional code / software / demos:
ESP Game and other games, Luis von Ahn et al.
CAPTCHA: Telling Humans and Computers Apart Automatically
Geometric Context from a Single Image, by D. Hoiem, A. Efros, and M. Hebert, ICCV 2005.
Depth Estimation using Monocular and Stereo Cues, by A. Saxena, J. Schulte, and A. Ng. IJCAI 2007.
Inferring Spatial Layout from A Single Image via Depth-Ordered Grouping, by S. Yu, H. Zhang, and J. Malik. Workshop on Perceptual Organization in Computer Vision, 2008. [pdf] [slides] [data]
Try LabelMe’s 3D pop-up feature
Pedro Felzenszwalb’s segmentation code
Hoiem et al. Automatic photo pop-up
Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope, by A. Oliva and A. Torralba, IJCV 2001.
A Bayesian Hierarchical Model for Learning Natural Scene Categories, by L. Fei-Fei and P. Perona. CVPR 2005.
[pdf]
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, by S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006.
[pdf] [slides] [dataset] [libpmk_spatial] [Matlab code]
Additional code / software / data:
100 natural scenes from Fei-Fei et al.
13 natural scene categories dataset
David Blei’s topic modeling code
Learning Spatial Context: Using Stuff to Find Things, by G. Heitz and D. Koller, ECCV 2008.
[pdf]
Contextual Priming for Object Detection, by A. Torralba. IJCV, 2003.
Object Categorization using Co-Occurrence, Location and Appearance, by C. Galleguillos, A. Rabinovich, and S. Belongie. CVPR 2008.
[pdf]
Putting Objects in Perspective, by D. Hoiem, A. Efros, and M. Hebert, CVPR 2006.
Additional code / software / data:
Survey on context in recognition by Galleguillos et al.
LabelMe dataset
IM2GPS: Estimating Geographic Information from a Single Image, by J. Hays and A. Efros. CVPR 2008.
80 Million Tiny Images: a Large Dataset for Non-Parametric Object and Scene Recognition, by A. Torralba, R. Fergus, and W. Freeman. PAMI 2008.
Scene Segmentation Using the Wisdom of Crowds, by I. Simon and S. Seitz. ECCV 2008.
[pdf]
Harvesting Image Databases from the Web, by F. Schroff, A. Criminisi, and A. Zisserman, ICCV 2007.
[pdf]
World-scale Mining of Objects and Events from Community Photo Collections, by T. Quack, B. Leibe, and L. Van Gool, CIVR 2008.
[pdf]
Additional code / data / papers / demos:
Tamara Berg’s Animals on the Web data
Florian Schroff’s page on Harvesting Image Databases from the web
Rob Fergus’s dataset for Learning Object Categories from Google’s Image Search
Code for finding and downloading images on Flickr, by James Hays
Creating and Exploring a Large Photorealistic Virtual Space, Sivic et al.
Semantic Robot Vision Challenge
“Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video, by M. Everingham, J. Sivic and A. Zisserman, BMVC 2006.
Names and Faces in the News, by T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth, CVPR 2004.
Movie/Script: Alignment and Parsing of Video and Text Transcription, by T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar, ECCV 2008.
Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, by A. Gupta and L. Davis. ECCV 2008.
[pdf]
Subrip for subtitle extraction
Sonal Gupta’s dataset of sports videos with commentary
Face data from Buffy episode, from Oxford Visual Geometry Group
Discovering Objects and Their Location in Images, by J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, ICCV 2005.
Unsupervised Discovery of Action Classes, by Y. Wang, H. Jiang, M. Drew, Z-N. Li and G. Mori, CVPR 2006.
Detecting Irregularities in Images and in Video, by O. Boiman and M. Irani. ICCV 2005.
Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, CVPR 2006.
[pdf]
Fast Image Search for Learned Metrics, by P. Jain, B. Kulis, and K. Grauman. CVPR 2008.
Efficient Near-Duplicate Detection and Sub-Image Retrieval, by Y. Ke, R. Sukthankar, and L. Huston. ACM Multimedia 2004. [pdf]
Additional code / data / references:
Oxford project on object retrieval with vocabulary trees
LSH Matlab code by Greg Shakhnarovich
Nearest neighbor datasets from Vassilis Athitsos
Electronic copy of the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice (UT EID required)
Small Codes and Large Image Databases for Recognition, by A. Torralba, R. Fergus, and Y. Weiss. CVPR 2008.
Object Retrieval with Large Vocabularies and Fast Spatial Matching, by J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. CVPR 2007. [pdf]
Nonchronological Video Synopsis and Indexing, by Y. Pritch, A. Rav-Acha, and S. Peleg, TPAMI 2008.
CuZero: Embracing the Frontier of Interactive Visual Search for Informed Users, by E. Zavesky and S-F. Chang, MIR 2008.
[pdf]
Photo Tourism: Exploring Photo Collections in 3D, by N. Snavely, S. Seitz, and R. Szeliski, SIGGRAPH 2006.
Graph-Cut Transducers for Relevance Feedback in Content Based Image Retrieval, by H. Sahbi, J-Y. Audibert, and R. Keriven. ICCV 2007. [pdf]
Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs, by X. Li, C. Wu, C. Zach, S. Lazebnik, and J. Frahm, ECCV 2008. [pdf] [web]
Additional references / demos / data:
UW Community Photo Collections Webpage
Survey by Xiang Zhou and Thomas Huang on relevance feedback for CBIR, 2001
Baeza-Yates & Ribeiro-Neto Chapter 5 on query operations
Autotagging Facebook: Social Network Context Improves Photo Annotation, by Z. Stone, T. Zickler, and T. Darrell. Internet Vision Workshop 2008.
[pdf]
Learning Tag Relevance by Neighbor Voting for Social Image Retrieval, by X. Li, C. Snoek, and M. Worring. MIR 2008.
[pdf]
Why We Tag: Motivations for Annotation in Mobile and Online Media, by M. Ames and M. Naaman, CHI 2007.
[pdf]