CS395T: Visual Recognition and Search
Spring 2008
 
 
Topics

Visual vocabularies

Mining image collections

Fast indexing methods

Faces

Datasets and dataset creation

Near-duplicate detection

Learning distance functions

Place recognition and kidnapped robots

Text/speech and images/video

Context and background knowledge in recognition

Learning about images from keyword-based Web search

Video summarization

Image and video retargeting

Exploring images in 3D

Canonical views and visualization

Shape matching

Detecting abnormal events
 
 
 
Visual vocabularies
 
Words are the basic tokens of a text document: they allow us to index documents for keyword search, or to discover topics based on common distributions of words.  What is the analogy for an image?  Visual words are prototypical local features that form a “vocabulary” from which images are composed.  As with documents, they can be a useful representation.  Various recognition approaches exploit a bag-of-visual-words feature space, identifying the vocabulary words by quantizing a sample of local descriptors.  These papers address questions surrounding vocabulary formation, including interest point selection, quantization strategies, and efficient codebook maintenance.
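To make the quantization step concrete, here is a minimal sketch of building a visual vocabulary with plain k-means and describing an image as a bag-of-words histogram.  The toy 2-D “descriptors,” the cluster count, and all function names are illustrative choices, not any particular paper’s method:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: quantize descriptors into k 'visual words' (centroids)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute each center as its cluster's mean
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centers

def bow_histogram(descriptors, centers):
    """Map each local descriptor to its nearest visual word and count occurrences."""
    hist = [0] * len(centers)
    for d in descriptors:
        j = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(d, centers[c])))
        hist[j] += 1
    return hist

# toy 2-D "descriptors" sampled from two clumps
rng = random.Random(1)
sample = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)] + \
         [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
vocab = kmeans(sample, k=2)
image_desc = [(0.05, 0.02), (4.9, 5.1), (5.0, 4.8)]
print(bow_histogram(image_desc, vocab))  # one count per visual word
```

Real systems cluster thousands of 128-D SIFT-style descriptors into much larger vocabularies; hierarchical schemes such as the vocabulary tree exist precisely because flat nearest-centroid assignment becomes the bottleneck at that scale.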
 
 
 - *Sampling Strategies for Bag-of-Features Image
     Classification.  E. Nowak, F. Jurie,
     and B. Triggs.  In Proceedings of
     the European Conference on Computer Vision (ECCV), 2006.  [pdf]
 
 
 - Visual Categorization with Bags of Keypoints, by
     G. Csurka, C. Bray, C. Dance, and L. Fan. 
     In Workshop on Statistical Learning in Computer Vision, ECCV,
     2004.  [pdf]
 
 
 - Adapted Vocabularies for Generic Visual Categorization,
     by F. Perronnin, C. Dance, G. Csurka, M. Bressan, in Proceedings of the
     European Conference on Computer Vision (ECCV), 2006.  [pdf]
 
 
 - *Fast Discriminative Visual Codebooks using
     Randomized Clustering Forests, by A. Moosmann, B. Triggs and F.
     Jurie.  Neural Information
     Processing Systems (NIPS), 2006.  [pdf]
 
 
 - Object Categorization by Learned Universal
     Visual Dictionary.  J. Winn, A.
     Criminisi and T. Minka.   In
     Proceedings of the IEEE International Conference on Computer Vision
     (ICCV), 2005.   [pdf]
 
 
 - Vector Quantizing Feature Space with a Regular Lattice, by T.
     Tuytelaars and C. Schmid, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - *Scalable Recognition
     with a Vocabulary Tree, by D.
     Nister and H. Stewenius, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 - Adaptive Vocabulary Forests for Dynamic Indexing
     and Category Learning, by T. Yeh, J. Lee, and T. Darrell.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]  [web]
 
 
 
Related links

Executables for interest operators and descriptors, from Oxford VGG

Benchmark database from University of Kentucky, used in the vocabulary tree paper, plus the semi-processed data

Libpmk, library from John Lee, which includes hierarchical clustering / vocabulary building

Software from the LEAR team at INRIA, including interest point detectors, shape features, and a randomized-forest image classifier
 
 
 
Mining image collections
 
Mining large unstructured collections of images can surface common visual patterns and allow the discovery of topics or even categories.  These papers include methods for clustering according to latent topics and repeated configurations of features, mining for association rules, and exploring large image collections.
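Since the Agrawal et al. paper anchors the association-rule side of this topic, a small sketch of the Apriori idea may help: frequent itemsets are grown level by level, and a candidate survives only if all of its subsets were themselves frequent.  The transactions and support threshold below are made-up toy data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: find all itemsets occurring in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # frequent 1-itemsets seed the search
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    all_frequent = {}
    k = 1
    while current:
        for s in current:
            all_frequent[s] = support(s)
        # candidate (k+1)-itemsets from unions of frequent k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return all_frequent

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(txns, min_support=3)
print(sorted((sorted(s), n) for s, n in freq.items()))
```

In the vision setting the “items” become quantized visual features co-occurring in an image or spatial neighborhood, which is essentially how the frequent-configuration mining papers above apply the idea.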
 
 - Video Data
     Mining Using Configurations of Viewpoint Invariant Regions, by Sivic, J.
     and Zisserman, A. in Proceedings
     of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
     2004.  [pdf]
 
 
 - Efficient Mining of Frequent and Distinctive
     Feature Configurations, by T.
     Quack, V. Ferrari, B. Leibe, and L. Van Gool, In Proceedings of the IEEE
     International Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Mining Association Rules Between Sets of Items in Large Databases,
     by R. Agrawal, T. Imielinski, and A. N. Swami.  In Special Interest Group on Management
     of Data (SIGMOD), 1993.   [pdf] 
 
 
 - Discovering Objects and Their Location in
     Images, by J. Sivic, B. Russell, A.
     Efros, A. Zisserman, and W. Freeman, In Proceedings of the IEEE
     International Conference on Computer Vision (ICCV), 2005.  [pdf]
     [web]
 
 
 
 - Mining Image Datasets using Perceptual Association Rules, by J.
     Tesic, S. Newsam, and B. S. Manjunath. 
     In SIAM’03 Workshop on Mining Scientific and
     Engineering Datasets, 2003.  [pdf] 
 
 
 
Related links

pLSA implementations

Matlab code and data for affinity propagation, from Dueck & Frey

Weka: Java data mining software, includes an implementation of the Apriori algorithm
 
 
 
Fast indexing methods
 
Content-based image and video retrieval, as well as example-based
recognition systems, require the ability to rapidly search very large image
collections.  This area deals with
algorithms for fast search, specifically in the context of indexing images or
image features.
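As a flavor of how such sub-linear search schemes work, here is a sketch of locality-sensitive hashing with random hyperplanes (the scheme analyzed in Charikar’s rounding-algorithms paper, listed under near-duplicate detection): nearby vectors receive bit signatures that agree on most bits, so candidate neighbors can be found by comparing short codes instead of full descriptors.  The dimensions and vectors are toy values:

```python
import random

def make_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH: each signature bit records the sign of a dot
    product with a random direction, so the probability two vectors agree
    on a bit depends only on the angle between them."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                     for p in planes)
    return h

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

h = make_hash(dim=3, n_bits=32)
q = (1.0, 0.2, 0.1)
near = (0.9, 0.25, 0.05)   # almost the same direction as q
far = (-1.0, 0.1, 0.9)     # points roughly the opposite way
print(hamming(h(q), h(near)), hamming(h(q), h(far)))
```

In practice the short codes are used to bucket the database, so a query only compares exhaustively against items that collide with it in one or more hash tables.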
 
 
 - Scalable Recognition
     with a Vocabulary Tree, by D.
     Nister and H. Stewenius, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 - *A Binning Scheme for
     Fast Hard Drive Based Image Search, F.
     Fraundorfer, H. 
     Stewenius, and D. Nister, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2007. 
     [pdf]
 
 
 - *Fast Pose Estimation with Parameter Sensitive
     Hashing, by G. Shakhnarovich, P. Viola, T. Darrell, In Proceedings of the
     IEEE International Conference on Computer Vision (ICCV), 2003.  [pdf]
 
 
 - Video Google: A Text Retrieval Approach
     to Object Matching in Videos, by J.
     Sivic and A. Zisserman, In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2003.  [pdf]  [web]
 
 
 - Fast Similarity Search for Learned Metrics.  P. Jain, B. Kulis, and K. Grauman.  UTCS Technical Report #TR-07-48,
     September 2007.
 
 
 - *Learning Embeddings
     for Fast Approximate Nearest Neighbor Retrieval.   V. Athitsos, J. Alon, S. Sclaroff,
     and G. Kollios, Nearest-Neighbor
     Methods in Learning and Vision: Theory and Practice, G. Shakhnarovich, T. Darrell and P. Indyk,
     Editors.  MIT Press, March
     2006.  [ps]
 
 
 
Related links

LSH homepage; email the authors for the code package

LSH Matlab code by Greg Shakhnarovich

Nearest neighbor datasets from Vassilis Athitsos

Electronic copy of the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice (UT EID required)
 
 
 
Faces
 
These papers consider the problems of detecting
faces, recognizing familiar faces, and looking for repeated faces in
videos.  A variety of techniques are
represented below.
 
 - Face Recognition: A Literature Survey, by W. Zhao, R. Chellappa, A.
     Rosenfeld, and P. Phillips.  In ACM
     Computing Surveys, 2003. [pdf]
 
 
 - *Rapid Object Detection Using a Boosted Cascade
     of Simple Features, by P. Viola and M. Jones, In Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2001.  [pdf]
 
 
 
 
 - Active Appearance Models, by T.F. Cootes, G.J. Edwards,
     and C.J. Taylor.  IEEE Transactions on Pattern Analysis and Machine
     Intelligence (PAMI), Vol. 23, No. 6, pp. 681-685, 2001.
 
 - *Automatic Cast Listing in Feature-Length
     Films with Anisotropic Manifold Space, by O. Arandjelovic and R. Cipolla, In Proceedings of the IEEE Conference
     on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 - Person Spotting: Video Shot Retrieval for Face Sets, J. Sivic, M.
     Everingham, and A. Zisserman. In International Conference on Image and
     Video Retrieval (CIVR), 2005.  [pdf]
     
 
 
 - Leveraging Archival Video for Building
     Face Datasets, D. Ramanan, S.
     Baker, S. Kakade.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Face Recognition by Humans: 19 Results All Computer Vision
     Researchers Should Know About, by P. Sinha, B. Balas, Y. Ostrovsky, and R.
     Russell,  Proceedings of the IEEE,
     Vol. 94, No. 11, November 2006, pp. 1948-1962. [pdf]
 
 
 
Related links

Intel’s OpenCV library, includes the Viola & Jones face detector

Active Appearance Models code from Tim Cootes

Data collections of detected faces, from Oxford VGG

Face data from a Buffy episode, from Oxford VGG

University of Cambridge face data from films [go to Data link]

PolarRose.com

Pittsburgh Pattern Recognition face detector demo
 
 
 
Datasets and dataset creation
 
These papers discuss issues in generating image datasets for recognition research.  Benchmark image datasets allow direct comparisons between recognition algorithms, and having accessible, prepared datasets can be critical for the research itself.  The process of designing an image collection is also important, since its degree of variability can influence the assumptions made by new methods, or may fail to show off their strengths.  Meanwhile, collecting labeled data is expensive and can be tedious.  These papers include novel ways to gather image collections with less pain, and highlight some of the considerations to be made in database design.  *Coverage of this area should include highlights on recent commonly used datasets.*
 
 - Dataset Issues in Object Recognition. by J. Ponce, T.L. Berg, M.
     Everingham, D.A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid,
     B.C. Russell, A. Torralba, C.K.I. Williams, J. Zhang, and A.
     Zisserman.  In J. Ponce et al. (Eds.):
     Toward Category-Level Object Recognition, LNCS 4170, pp. 29–48, 2006.  [pdf]
 
 
 
 
 - Soylent Grid: it’s Made of People! by S.
     Steinbach, V. Rabaud and S. Belongie,
     ICCV workshop on Interactive Computer Vision, 2007.  [pdf] 
 
 
 - Harvesting Image Databases from the Web, by F. Schroff, A.
     Criminisi, and A. Zisserman, Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
     
 
 
[No demo on this topic.]
 
Related links

Dataset list with links
 
 
Near-duplicate detection
 
This problem
involves detecting cases where multiple images (or videos) are the same except
for some slight alterations. 
Near-duplicate detection can be useful for detecting copyright
violations or forged images.  These
papers include several vision approaches, as well as some papers on the core algorithms
often used.
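Broder’s resemblance measure, which underlies several of the methods above, can be sketched in a few lines: documents (or images, via their local features) are reduced to shingle sets, and the fraction of agreeing min-hash signature entries estimates their Jaccard overlap.  The shingle size, signature length, and example strings below are arbitrary:

```python
import random

def shingle_set(text, k=4):
    """All length-k substrings of the input; the set representation to compare."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingles, n_hashes=64, seed=0):
    """Min-hash (Broder): per random hash function, keep the minimum hash over
    the set.  The probability two signature slots agree equals the Jaccard
    resemblance |A ∩ B| / |A ∪ B|."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n_hashes)]
    return [min(hash((salt, s)) for s in shingles) for salt in salts]

def estimate_resemblance(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc1 = "near duplicate detection for images and video"
doc2 = "near duplicate detection for image and videos"   # slight alteration
doc3 = "an entirely different sentence about robots"
s1, s2, s3 = (minhash_signature(shingle_set(d)) for d in (doc1, doc2, doc3))
print(estimate_resemblance(s1, s2), estimate_resemblance(s1, s3))
```

The near-duplicate image papers above apply the same idea with visual words or sketches in place of text shingles, so that slightly altered copies still share most of their signature.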
 
 
 - Efficient Near-Duplicate Detection and
     Subimage Retrieval, by Yan Ke,
     Rahul Sukthankar, and Larry Huston, ACM Multimedia 2004.  [pdf]
 
 
 - Enhancing DPF for Near-replica Image
     Recognition, by Y. Meng, E. Chang, and B.
     Li, Proceedings of the Conference on Computer Vision and Pattern
     Recognition (CVPR), 2003. [pdf]
 
 
 - Content-based Copy Detection using
     Distortion-Based Probabilistic Similarity Search, by A. Joly, O. Buisson,
     and C. Frélicot.  In IEEE
     Transactions on Multimedia, 2007.  [pdf]
 
 
 - Filtering Image Spam with Near-Duplicate
     Detection, by Zhe Wang, W. Josephson, Q. Lv, M. Charikar, and K. Li.  Proceedings
     of the 4th Conference on Email and Anti-Spam (CEAS), 2007. [pdf]
 
 
 - M. Henzinger. Finding Near-Duplicate Web Pages: a Large-Scale
     Evaluation of Algorithms. In ACM Special Interest Group on Information
     Retrieval (SIGIR), 2006.  (text
     application) [pdf]
 
 
 - On the Resemblance and Containment of Documents, Andrei Z. Broder,
     1997. [pdf]
 
 
 - Similarity Estimation Techniques from Rounding Algorithms, M. S.
     Charikar.  In 34th Annual
     ACM Symposium on Theory of Computing (May 2002).  [ps]
 
 
 - Scalable Near Identical Image and Shot
     Detection, by O. Chum, J. Philbin,
     M. Isard, and A. Zisserman, ACM
     International Conference on Image and Video Retrieval, 2007. [pdf]
 
 
 
Related links:

Data from the Ke et al. paper

LSH homepage; email the authors for the code package

LSH Matlab code by Greg Shakhnarovich

TRECVID data
 
 
 
Learning distance functions
 
The success
of any distance-based indexing, clustering, or classification scheme depends
critically on the quality of the chosen distance metric, and the extent to
which it accurately reflects the true underlying relationships between the
examples in a particular data domain. An optimal distance metric should report
small distances for examples that are similar in the parameter space of
interest (or that share a class label), and large distances for examples that
are unrelated.  These papers consider
distance learning specifically for image retrieval tasks.
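Many of these methods learn a Mahalanobis metric d(x, y)² = (x − y)ᵀM(x − y).  The sketch below, loosely in the spirit of learning from equivalence constraints (Bar-Hillel et al.), sets M to the inverse covariance of difference vectors between pairs labeled as similar, so directions that vary within a class are downweighted.  The 2-D toy data and helper names are made up:

```python
def mahalanobis(x, y, M):
    """d(x, y)^2 = (x - y)^T M (x - y) for a positive-definite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))

def learn_metric(similar_pairs, eps=1e-6):
    """Toy metric learning: M = inverse covariance of the difference vectors
    of pairs known to be equivalent (2-D case, closed-form inverse)."""
    diffs = [[a - b for a, b in zip(x, y)] for x, y in similar_pairs]
    n = len(diffs)
    c = [[sum(d[i] * d[j] for d in diffs) / n + (eps if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[ c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det,  c[0][0] / det]]

# similar pairs vary a lot along x, very little along y
pairs = [((0, 0), (2, 0.1)), ((1, 0.5), (-1, 0.4)), ((3, 1), (0.5, 1.05))]
M = learn_metric(pairs)
# under the learned metric, a step along high-variance x is "cheap"
# while the same step along low-variance y is "expensive"
print(mahalanobis((0, 0), (1, 0), M), mahalanobis((0, 0), (0, 1), M))
```

The published methods differ mainly in how M is estimated (equivalence constraints, margins between classes, per-exemplar local functions), but they share this quadratic form.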
 
 - Learning Distance
     Functions for Image Retrieval, by T.
     Hertz, A. Bar-Hillel and D. Weinshall, in Proceedings of the IEEE Conference on Computer Vision and
     Pattern Recognition (CVPR) 2004. 
     [pdf]
 
 
 - Learning a Mahalanobis Metric from Equivalence
     Constraints, by A.
     Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, in Journal of Machine
     Learning Research (JMLR), 2005.  [pdf]
 
 
 - *Learning Globally-Consistent Local Distance
     Functions for Shape-Based Image Retrieval and Classification, by A. Frome,
     Y. Singer, F. Sha, J. Malik, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]  [web]
 
 
 - *Invariant Large
     Margin Nearest Neighbor Classifier,
     by P. Mudigonda, P. Torr, and A. Zisserman, in Proceedings of the IEEE International Conference on
     Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Fast Pose Estimation with Parameter Sensitive Hashing,
     by G. Shakhnarovich, P. Viola, and T. Darrell, in Proceedings of the IEEE
     International Conference on Computer Vision (ICCV), 2003.  [pdf] 
 
 
 
 
Related links:

DistBoost code, Hertz et al.

Relevant Components Analysis code, Hertz et al.

DistLearn toolkit

Large Margin Nearest Neighbors code by Weinberger et al.

Nearest neighbor datasets from Vassilis Athitsos
 
 
Place recognition and kidnapped robots
 
How can an image of the current scene allow
localization or place recognition?  Or,
put more dramatically, how can a kidnapped robot that is carried off to an
arbitrary location figure out where it is with no prior knowledge of its
position?  These papers address this
problem, some specifically with a robotics slant, and some in terms of the image-based
scene matching problem.
 
 - *Vision-Based Global Localization and Mapping for Mobile
     Robots, Se, S., Lowe, D., & Little, J. 
     IEEE Transactions on Robotics, 2005.  [pdf]
 
 
 - Image-Based Localisation, by R. Cipolla, D. Robertson and B. Tordoff.  Proceedings
     of the 10th International Conference on Virtual Systems and Multimedia,
     2004.  [pdf]
 
 
 - *Qualitative Image Based Localization in
     Indoors Environments, by J.
     Kosecka, L. Zhou, P. Barber, and Z. Duric, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2003. [pdf]
 
 
 - Location Recognition and Global Localization Based on
     Scale-Invariant Keypoints, by J. Kosecka and X. Yang,  CVPR workshop 2004.  [pdf]
 
 
 - Searching the Web with
     Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in
     Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR), 2004.  [pdf]
 
 
 - Total Recall:
     Automatic Query Expansion with a Generative Feature Model for Object
     Retrieval, by O. Chum, J. Philbin,
     J. Sivic, M. Isard, A. Zisserman, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 
Related links:

Oxford buildings dataset
 
 
Text and speech + images and video
 
Often images or videos are accompanied by text or
speech, which may provide complementary cues when we are trying to index,
cluster, or recognize objects.  These
papers seek to leverage this cue in a number of different ways.
 
 - *“Hello! My name is... Buffy” – Automatic Naming of Characters in
     TV Video, by M. Everingham, J. Sivic and A. Zisserman, British Machine
     Vision Conference (BMVC), 2006.  [pdf]
 
 
 - *Object Recognition as Machine Translation: Learning a Lexicon for
     a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, and D.
     Forsyth, in Proceedings of the European Conference on Computer Vision
     (ECCV), 2002.  [pdf]  [web]
 
 
 - Names and Faces in the News, by T. Berg, A.
     Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth,
     In Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR), 2004.  [pdf]  [web]
 
 
 - Learning Structured Appearance Models
     from Captioned Images of Cluttered Scenes, by M. Jamieson, A. Fazly, S. Dickinson, S. Stevenson, and S.
     Wachsmuth.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007. [pdf]
 
 
 - Clustering Web Images with Multi-modal
     Features, by M. Rege, M. Dong, and
     J. Hua, ACM Multimedia 2007.  [pdf]
     
 
 
 
 
Related links:

Face data from a Buffy episode, from Oxford Visual Geometry Group

Data from the Duygulu et al. paper

Subrip for subtitle extraction
 
Context and background knowledge in recognition
 
Many recognition systems consider snapshots of
objects in isolation, both when training and testing.  But both our intuition and cognitive studies
indicate that the object’s greater context can also be crucial to the
recognition process.  These papers
consider how prior external knowledge can aid in recognizing objects or
categories.  The context cues may come
from reasoning explicitly about the 3D environment, knowing something about the
patterns of a user, learning about the typical patterns of occurrence, or
gleaning knowledge from an organized ontology. 
 
 - *Putting Objects in Perspective, by D. Hoiem,
     A.A. Efros, and M. Hebert, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
     2006.  [pdf] [web]
 
 
 - Objects in Context, by A. Rabinovich, A. Vedaldi, C. Galleguillos, E.
     Wiewiora, S. Belongie, in Proceedings of
     the IEEE International Conference on Computer Vision (ICCV), 2007. 
     [pdf]
 
 
 - Visual Contextual Awareness in Wearable Computing, by T. Starner,
     B. Schiele, and A. Pentland.  In Proceedings of Visual Contextual
     Awareness in Wearable Computing, 1998.  [pdf]  [web]
 
 
 - *Contextual Priming for Object Detection, by A. Torralba.  International
     Journal of Computer Vision, 2003. 
     [pdf]  [web]
 
 
 - The Role of Context in Object Recognition, by A. Oliva and A.
     Torralba. TRENDS in Cognitive Sciences, Vol 11 No 12, 2007.  [pdf]  
 
 
 
 
 - Unsupervised Learning of Hierarchical Semantics
     of Objects, by D. Parikh and T. Chen, in Proceedings of the International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
     [web]
 
 
 
Related links:

WordNet

Scene global feature code from Antonio Torralba

MIT CSAIL database of objects and scenes
 
Learning about images from keyword-based Web search
 
Keyword-based search on the Web can be used to
retrieve images (or videos) that appear near the query word, are named with the
word, or are explicitly tagged with it. 
Of course, this is not a completely reliable way to find images of a
given object or scene, and typically an image contains much more information
than can be conveyed in a few words anyhow. 
Yet search engines’ rapid access to large amounts of image/video content
makes them an interesting resource for vision research.  These papers all consider ways to learn from
the images that come back from a keyword-based search, taking into account the
large amount of noise in the returns.
 
 - *Learning Color Names from Real-World Images, by J. van de Weijer,
     C. Schmid, J. Verbeek, in Proceedings of the IEEE International Conference
     on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Searching the Web with
     Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in
     Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
     (CVPR), 2004.  [pdf]
 
 
 - *Learning Object Categories from Google’s Image Search, by R.
     Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2005.  [pdf]  [web]
 
 
 - Animals on the Web, by T. Berg and D. Forsyth,
     in Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR),
     2006.  [pdf]
 
 
 - Keywords to Visual Categories: Multiple-Instance
     Learning for Weakly Supervised Object Categorization, by S. Vijayanarasimhan and K. Grauman, UTCS Technical Report, 2007.  [pdf]
 
 
 - Harvesting Image Databases from the Web, by F. Schroff, A.
     Criminisi, and A. Zisserman, in Proceedings
     of the IEEE International Conference on Computer Vision (ICCV),
     2007.  [pdf]
 
 
 - Probabilistic Web Image Gathering, by K. Yanai and K. Barnard, in ACM Multimedia
     2005.  [pdf]
 
 
 
Related links:

Animals on the Web data from Berg et al.

Annotated Google image data from the Schroff et al. paper

Color name datasets and feature code from van de Weijer et al.

Google image data from Fergus et al.

Flickr Commons, Library of Congress pilot project

Semantic robot vision challenge and example data
 
 
Video summarization
 
How can a video be compactly presented in a visual
way?  Video summarization methods attempt
to abstract the main occurrences, scenes, or objects in a clip in order to
provide an easily interpreted synopsis.
 
 
 
 - Video Abstraction, by J. Oh, Q. Wen, J. Lee, and S. Hwang.  In S. Deb, editor, Video Data Management and Information
     Retrieval, Idea Group Inc. and IRM Press, 2004.  [pdf]
     
 
 
 - Shapetime
     Photography, by W. T. Freeman
     and H. Zhang, in Proceedings IEEE Computer Vision and Pattern Recognition
     (CVPR), 2003.  [pdf]
 
 
 - Video Summaries through Mosaic-Based Shot and
     Scene Clustering, A. Aner and J. Kender, in Proceedings of the European
     Conference on Computer Vision (ECCV), 2002.  [pdf]
 
 
 - Dynamic Stills and Clip Trailers, by Y. Caspi,
     A. Axelrod, Y. Matsushita, A. Gamliel. 
     [pdf]
     [web]
 
 
 - Reliable Transition Detection in Videos:
     A Survey and Practitioner’s Guide, by R. Lienhart, International Journal of Image and Graphics,
     2001.  [pdf]
 
 
 - Recent Advances in Content-based Video
     Analysis.  C Ngo, H. Zhang, and T.
     Pong.  International Journal of
     Image and Graphics, 2001. [pdf]
 
 
 
Image and video retargeting
 
These papers cover both content-aware resizing as
well as texture synthesis.  The general
idea is to automate (or semi-automate) the process of adapting image or video
inputs to a desired format, whether that’s so it can be viewed well on a
different display size, or so it can be viewed continuously as if the regular
spatial or temporal pattern persists beyond where it ends in the raw
input.  The challenges include adapting
the input in such a way that the most interesting parts are preserved or
well-represented, and in the case of textures, generating processes that look
realistically stochastic and “natural”.
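As one concrete instance of content-aware resizing, the sketch below implements the dynamic program at the heart of seam carving (Avidan and Shamir’s method, which is not among the papers listed here): it finds the connected vertical path of least total energy, which can then be removed to shrink the width while sparing high-energy content.  The toy energy map stands in for a real gradient-magnitude image:

```python
def min_vertical_seam(energy):
    """Dynamic programming for content-aware resizing: find the connected
    vertical path of lowest total energy; seam[y] is the column to delete
    in row y, shrinking the image by one column."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # cost[y][x] = cheapest seam reaching (y, x) from the top row
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w - 1, x + 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # backtrack from the cheapest bottom-row cell
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(0, x - 1), min(w - 1, x + 1)
        x = min(range(lo, hi + 1), key=lambda c: cost[y - 1][c])
        seam.append(x)
    return list(reversed(seam))

# toy energy map: a high-energy "object" occupies the right columns
energy = [[1, 1, 9, 9],
          [1, 1, 9, 9],
          [1, 1, 9, 9]]
print(min_vertical_seam(energy))  # → [0, 0, 0]
```

Repeating this removal (or its horizontal analogue) retargets an image to a new aspect ratio; the video-retargeting papers above solve the harder temporal version of the same preserve-the-interesting-parts objective.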
 
 - Image Quilting for Texture Synthesis and Transfer, by A. Efros and W. Freeman, in ACM
     Transactions on Graphics (SIGGRAPH), 2001.  [pdf]
     [web]
 
 - Fast Texture Synthesis using
     Tree-structured Vector Quantization, by L. Wei and M. Levoy, in ACM Transactions
     on Graphics (SIGGRAPH), 2000.  [pdf]
     [web]
 
 
 
 
 - Automatic Thumbnail Cropping and its
     Effectiveness, by B. Suh, H. Ling,
     B. Bederson, and D. Jacobs.  In Proceedings
     of the Symposium On User interface Software and Technology, 2003.  [pdf]
 
 
 - *Non-homogeneous Content-driven
     Video-retargeting, by L. Wolf, M.
     Guttmann, and D. Cohen-Or, in Proceedings of the IEEE International Conference
     on Computer Vision (ICCV), 2007.  [pdf] [web]
 
 
 - Video Retargeting: Automating Pan and
     Scan, by F. Liu and M. Gleicher, in
     ACM Multimedia, 2006.  [pdf]
 
 
 
 
Exploring images in 3D
 
From multiple views of a scene we can create 3D
representations or new renderings.  These
papers propose ways to explore image content in 3D, with an emphasis on
applications of doing so, such as perusing popular tourist sites from multiple
users’ photos, analyzing the geometry of paintings, or editing photos based on
their layers.  Some methods included here
are semi-automatic.
 
 
 
 - Tour into the Picture: Using a Spidery Mesh Interface to Make
     Animation from a Single Image, by Y. Horry, K. Anjyo, and K. Arai. ACM
     Transactions on Graphics (SIGGRAPH), 1997. 
     [pdf]
     
 
 
 - Automatic Photo-Popup, by D. Hoiem, A. Efros,
     and M. Hebert, ACM Transactions on Graphics (SIGGRAPH), 2005.  [pdf] [web]
 
 
 - *Single-View Metrology: Algorithms and
     Applications, by A.
     Criminisi, DAGM, 2002.  [pdf] [web]
 
 
 - Single
     View Metrology, A.
     Criminisi, I. Reid, A. Zisserman,
     International Journal of Computer Vision, 1999.  [pdf]
     
 
 
 - Image-Based Modeling and Photo Editing, by
     B. Oh, M. Chen, J. Dorsey, and F.
     Durand, ACM Transactions on Graphics (SIGGRAPH), 2001. [pdf]
 
 
 
Related links:

Some PhotoTourism patch data from Microsoft Research
 
 
Canonical views and visualization
 
Given an object or scene, what sparse set of
viewpoints best summarize it?  This
problem is in some ways related to the video summarization topic (see above),
but here with an emphasis on the visualization of photo collections, and with
some consideration of optimizing for human perception.
 
 - Scene Summarization for Online Image Collections, by I. Simon, N. Snavely, and S. Seitz.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
     [web]
 
 
 - Generating Summaries for Large Collections of
     Geo-referenced Photographs, by A. Jaffe, M. Naaman, T. Tassa, and M.
     Davis. International Conference on World Wide Web, 2006. [pdf]
 
 
 - Approximation of Canonical
     Sets and Their Applications to 2D View Simplification, by T. Denton, J.
     Abrahamson, A. Shokoufandeh, in
     Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]
 
 
 - What Object Attributes Determine Canonical
     Views?  V. Blanz, M. Tarr, H.
     Bulthoff.  Perception, 28(5):575-600, 1999.  [pdf]
 
 
 - Digital Tapestry, by C. Rother, S. Kumar, V. Kolmogorov, and A.
     Blake, in Proceedings IEEE Computer Vision and
     Pattern Recognition (CVPR), 2005.  [pdf] [web]
 
 
 - Picture Collage, by J. Wang, J. Sun, L. Quan, X. Tang,  and H. Shum, in Proceedings IEEE
     Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 
 
Shape matching
 
The shape matching problem considers how to compare
shapes, often as defined in terms of their contours, silhouettes, or sampled
edge points.  These papers provide
different matching metrics and demonstrate the use of shape for applications
like object recognition, reading warped text, detecting pedestrians, and
categorization.
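For instance, the Hausdorff distance used by Huttenlocher et al. compares two point sets directly, with no correspondence search: it is small only when every point of each set lies near some point of the other.  A minimal version over 2-D edge points (toy coordinates below) looks like:

```python
def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of the distance from a to its nearest b in B."""
    return max(min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                   for bx, by in B)
               for ax, ay in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets, e.g. sampled
    edge points of two shapes."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(0.1, 0), (1.1, 0), (1.1, 1), (0.1, 1)]  # same square moved by 0.1
print(hausdorff(square, shifted))  # → 0.1
```

Because a single outlier point can dominate the max, the matching paper above actually ranks by a partial (k-th ranked) variant; richer descriptors such as shape contexts instead attach a local histogram to each point before matching.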
 
 - *Shape Matching and Object Recognition Using
     Shape Contexts, by S. Belongie, J. Malik, and J. Puzicha. Transactions on
     Pattern Analysis and Machine Intelligence (PAMI), 2002.  [pdf]
     [web]
 
 
 - Recognizing Objects in Adversarial Clutter: Breaking a Visual
     CAPTCHA, by G. Mori and J. Malik, in Proceedings IEEE Computer Vision and
     Pattern Recognition (CVPR), 2003.  [pdf] [web]
 
 
 - Using the
     Inner-Distance for Classification of Articulated Shapes, by H. Ling and D. Jacobs, Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2005.  [pdf]
 
 
 - Comparing Images Using the Hausdorff Distance,
     by D. Huttenlocher, G. Klanderman, and W. Rucklidge, Transactions on
     Pattern Analysis and Machine Intelligence (PAMI), 1993.  [pdf]
 
 
 - Pedestrian Detection from a Moving Vehicle, by
     D. Gavrila, Proceedings of the European Conference on Computer Vision
     (ECCV), 2000.  [pdf]
 
 
 - *A Boundary-Fragment-Model for Object
     Detection, by A. Opelt, A. Pinz,
     and A. Zisserman, Proceedings of the European Conference on Computer
     Vision (ECCV), 2006.  [pdf]
 
 
 - Hierarchical Matching
     of Deformable Shapes, by P. Felzenszwalb
     and J. Schwartz, in Proceedings of the IEEE Conference on Computer Vision
     and Pattern Recognition, 2007.  [pdf]
 
 
 
Related links:

Matlab code for shape context features and matching

MNIST handwritten digits database
 
 
Detecting abnormal events
 
It would be useful if a vision system that monitors
video could automatically determine when something “unusual” is happening.  But how can a system be trained to recognize
something it has never (or rarely) seen before? 
These techniques address the problem of detecting visual anomalies.
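A common baseline behind many of these techniques is distance-based novelty scoring: model “normal” activity by a set of observed feature vectors, and flag anything far from all of them.  The sketch below scores queries by their distance to the k-th nearest training example; the feature vectors and parameter choices are illustrative only:

```python
def knn_novelty_scores(train, queries, k=3):
    """Score each new observation by its distance to its k-th nearest 'normal'
    training example; large scores flag events unlike anything seen before."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scores = []
    for q in queries:
        ds = sorted(dist(q, t) for t in train)
        scores.append(ds[k - 1])  # distance to the k-th nearest neighbor
    return scores

# "normal" feature vectors cluster near the origin
normal = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1), (0.05, -0.05), (0.0, 0.0)]
usual, unusual = (0.05, 0.05), (3.0, 3.0)
s_usual, s_unusual = knn_novelty_scores(normal, [usual, unusual])
print(s_usual, s_unusual)  # the far-away event gets a much larger score
```

Thresholding such a score turns it into a detector without ever training on abnormal examples, which is exactly the asymmetry these papers grapple with.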