CS395T: Visual Recognition and Search
Spring 2008
 
 
Topics

Visual vocabularies

Mining image collections

Fast indexing methods

Faces

Datasets and dataset creation

Near-duplicate detection

Learning distance functions

Place recognition and kidnapped robots

Text/speech and images/video

Context and background knowledge in recognition

Learning about images from keyword-based Web search

Video summarization

Image and video retargeting

Exploring images in 3D

Canonical views and visualization

Shape matching

Detecting abnormal events
 
 
 
Visual vocabularies
 
Words are the basic tokens of a text document: they allow us to index documents for keyword search, or to discover topics based on common distributions of words.  What is the analogy for an image?  Visual words are prototypical local features that form a “vocabulary” from which images are composed.  As with documents, they can be a useful representation.  Various recognition approaches exploit a bag-of-visual-words feature space, identifying the vocabulary words by quantizing a sample of local descriptors.  These papers address questions surrounding vocabulary formation, including interest point selection, quantization strategies, and efficient codebook maintenance.
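To make the quantization step concrete, here is a minimal sketch of building a visual vocabulary with plain k-means and describing an image as a bag-of-words histogram.  The toy 2-D “descriptors,” the cluster count, and all function names are illustrative choices, not any particular paper’s method:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: quantize descriptors into k 'visual words' (centroids)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute each center as its cluster's mean
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centers

def bow_histogram(descriptors, centers):
    """Map each local descriptor to its nearest visual word and count occurrences."""
    hist = [0] * len(centers)
    for d in descriptors:
        j = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(d, centers[c])))
        hist[j] += 1
    return hist

# toy 2-D "descriptors" sampled from two clumps
rng = random.Random(1)
sample = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)] + \
         [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
vocab = kmeans(sample, k=2)
image_desc = [(0.05, 0.02), (4.9, 5.1), (5.0, 4.8)]
print(bow_histogram(image_desc, vocab))  # one count per visual word
```

Real systems cluster thousands of 128-D SIFT-style descriptors into much larger vocabularies; hierarchical schemes such as the vocabulary tree exist precisely because flat nearest-centroid assignment becomes the bottleneck at that scale.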
 
 
 - *Sampling Strategies for Bag-of-Features Image
     Classification.  E. Nowak, F. Jurie,
     and B. Triggs.  In Proceedings of
     the European Conference on Computer Vision (ECCV), 2006.  [pdf]
 
 
 - Visual Categorization with Bags of Keypoints, by
     G. Csurka, C. Bray, C. Dance, and L. Fan. 
     In Workshop on Statistical Learning in Computer Vision, ECCV,
     2004.  [pdf]
 
 
 - Adapted Vocabularies for Generic Visual Categorization,
     by F. Perronnin, C. Dance, G. Csurka, M. Bressan, in Proceedings of the
     European Conference on Computer Vision (ECCV), 2006.  [pdf]
 
 
 - *Fast Discriminative Visual Codebooks using
     Randomized Clustering Forests, by A. Moosmann, B. Triggs and F.
     Jurie.  Neural Information
     Processing Systems (NIPS), 2006.  [pdf]
 
 
 - Object Categorization by Learned Universal
     Visual Dictionary.  J. Winn, A.
     Criminisi and T. Minka.   In
     Proceedings of the IEEE International Conference on Computer Vision
     (ICCV), 2005.   [pdf]
 
 
 - Vector Quantizing Feature Space with a Regular Lattice, by T.
     Tuytelaars and C. Schmid, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - *Scalable Recognition
     with a Vocabulary Tree, by D.
     Nister and H. Stewenius, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 - Adaptive Vocabulary Forests for Dynamic Indexing
     and Category Learning, by T. Yeh, J. Lee, and T. Darrell.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]  [web]
 
 
 
Related links

Executables for interest operators and descriptors, from Oxford VGG

Benchmark database from University of Kentucky, used in the vocabulary tree paper, plus the semi-processed data

Libpmk, library from John Lee, which includes hierarchical clustering / vocabulary building

Software from the LEAR team at INRIA, including interest point detectors, shape features, and a randomized-forest image classifier
 
 
 
Mining image collections
 
Mining large unstructured collections of images can surface common visual patterns and allow the discovery of topics or even categories.  These papers include methods for clustering according to latent topics and repeated configurations of features, mining for association rules, and exploring large image collections.
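Since the Agrawal et al. paper anchors the association-rule side of this topic, a small sketch of the Apriori idea may help: frequent itemsets are grown level by level, and a candidate survives only if all of its subsets were themselves frequent.  The transactions and support threshold below are made-up toy data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: find all itemsets occurring in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # frequent 1-itemsets seed the search
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    all_frequent = {}
    k = 1
    while current:
        for s in current:
            all_frequent[s] = support(s)
        # candidate (k+1)-itemsets from unions of frequent k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return all_frequent

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(txns, min_support=3)
print(sorted((sorted(s), n) for s, n in freq.items()))
```

In the vision setting the “items” become quantized visual features co-occurring in an image or spatial neighborhood, which is essentially how the frequent-configuration mining papers above apply the idea.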
 
 - Video Data
     Mining Using Configurations of Viewpoint Invariant Regions, by Sivic, J.
     and Zisserman, A. in Proceedings
     of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
     2004.  [pdf]
 
 
 - Efficient Mining of Frequent and Distinctive
     Feature Configurations, by T.
     Quack, V. Ferrari, B. Leibe, and L. Van Gool, In Proceedings of the IEEE
     International Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Mining Association Rules Between Sets of Items in Large Databases,
     by R. Agrawal, T. Imielinski, and A. N. Swami.  In Special Interest Group on Management
     of Data (SIGMOD), 1993.   [pdf] 
 
 
 - Discovering Objects and Their Location in
     Images, by J. Sivic, B. Russell, A.
     Efros, A. Zisserman, and W. Freeman, In Proceedings of the IEEE
     International Conference on Computer Vision (ICCV), 2005.  [pdf]
     [web]
 
 
 
 - Mining Image Datasets using Perceptual Association Rules, by J.
     Tesic, S. Newsam, and B. S. Manjunath. 
     In SIAM’03 Workshop on Mining Scientific and
     Engineering Datasets, 2003.  [pdf] 
 
 
 
Related links

pLSA implementations

Matlab code and data for affinity propagation, from Dueck & Frey

Weka: Java data mining software, includes an implementation of the Apriori algorithm
 
 
 
Fast indexing methods
 
Content-based image and video retrieval, as well as example-based
recognition systems, require the ability to rapidly search very large image
collections.  This area deals with
algorithms for fast search, specifically in the context of indexing images or
image features.
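As a flavor of how such sub-linear search schemes work, here is a sketch of locality-sensitive hashing with random hyperplanes (the scheme analyzed in Charikar’s rounding-algorithms paper, listed under near-duplicate detection): nearby vectors receive bit signatures that agree on most bits, so candidate neighbors can be found by comparing short codes instead of full descriptors.  The dimensions and vectors are toy values:

```python
import random

def make_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH: each signature bit records the sign of a dot
    product with a random direction, so the probability two vectors agree
    on a bit depends only on the angle between them."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                     for p in planes)
    return h

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

h = make_hash(dim=3, n_bits=32)
q = (1.0, 0.2, 0.1)
near = (0.9, 0.25, 0.05)   # almost the same direction as q
far = (-1.0, 0.1, 0.9)     # points roughly the opposite way
print(hamming(h(q), h(near)), hamming(h(q), h(far)))
```

In practice the short codes are used to bucket the database, so a query only compares exhaustively against items that collide with it in one or more hash tables.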
 
 
 - Scalable Recognition
     with a Vocabulary Tree, by D.
     Nister and H. Stewenius, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 - *A Binning Scheme for
     Fast Hard Drive Based Image Search, F.
     Fraundorfer, H. 
     Stewenius, and D. Nister, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2007. 
     [pdf]
 
 
 - *Fast Pose Estimation with Parameter Sensitive
     Hashing, by G. Shakhnarovich, P. Viola, T. Darrell, In Proceedings of the
     IEEE International Conference on Computer Vision (ICCV), 2003.  [pdf]
 
 
 - Video Google: A Text Retrieval Approach
     to Object Matching in Videos, by J.
     Sivic and A. Zisserman, In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2003.  [pdf]  [web]
 
 
 - Fast Similarity Search for Learned Metrics.  P. Jain, B. Kulis, and K. Grauman.  UTCS Technical Report #TR-07-48,
     September 2007.
 
 
 - *Learning Embeddings
     for Fast Approximate Nearest Neighbor Retrieval.   V. Athitsos, J. Alon, S. Sclaroff,
     and G. Kollios, Nearest-Neighbor
     Methods in Learning and Vision: Theory and Practice, G. Shakhnarovich, T. Darrell and P. Indyk,
     Editors.  MIT Press, March
     2006.  [ps]
 
 
 
Related links

LSH homepage; email the authors for the code package

LSH Matlab code by Greg Shakhnarovich

Nearest neighbor datasets from Vassilis Athitsos

Electronic copy of the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice (UT EID required)
 
 
 
Faces
 
These papers consider the problems of detecting
faces, recognizing familiar faces, and looking for repeated faces in
videos.  A variety of techniques are
represented below.
 
 - Face Recognition: A Literature Survey, by W. Zhao, R. Chellappa, A.
     Rosenfeld, and P. Phillips.  In ACM
     Computing Surveys, 2003. [pdf]
 
 
 - *Rapid Object Detection Using a Boosted Cascade
     of Simple Features, by P. Viola and M. Jones, In Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2001.  [pdf]
 
 
 
 
 - Active Appearance Models, by T.F. Cootes, G.J. Edwards,
     and C.J. Taylor.  IEEE Transactions on Pattern Analysis and Machine
     Intelligence (PAMI), Vol. 23, No. 6, pp. 681-685, 2001.
 
 - *Automatic Cast Listing in Feature-Length
     Films with Anisotropic Manifold Space, by O. Arandjelovic and R. Cipolla, In Proceedings of the IEEE Conference
     on Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 - Person Spotting: Video Shot Retrieval for Face Sets, J. Sivic, M.
     Everingham, and A. Zisserman. In International Conference on Image and
     Video Retrieval (CIVR), 2005.  [pdf]
     
 
 
 - Leveraging Archival Video for Building
     Face Datasets, D. Ramanan, S.
     Baker, S. Kakade.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Face Recognition by Humans: 19 Results All Computer Vision
     Researchers Should Know About, by P. Sinha, B. Balas, Y. Ostrovsky, and R.
     Russell,  Proceedings of the IEEE,
     Vol. 94, No. 11, November 2006, pp. 1948-1962. [pdf]
 
 
 
Related links

Intel’s OpenCV library, includes the Viola & Jones face detector

Active Appearance Models code from Tim Cootes

Data collections of detected faces, from Oxford VGG

Face data from a Buffy episode, from Oxford VGG

University of Cambridge face data from films [go to Data link]

PolarRose.com

Pittsburgh Pattern Recognition face detector demo
 
 
 
Datasets and dataset creation
 
These papers discuss issues in generating image datasets for recognition research.  Benchmark image datasets allow direct comparisons between recognition algorithms, and having accessible, prepared datasets can be critical for the research itself.  The process of designing an image collection is also important, since its degree of variability can influence the assumptions made by new methods, or may fail to show off their strengths.  Meanwhile, collecting labeled data is expensive and can be tedious.  These papers include novel ways to gather image collections with less pain, and highlight some of the considerations to be made in database design.  *Coverage of this area should include highlights on recent commonly used datasets.*
 
 - Dataset Issues in Object Recognition. by J. Ponce, T.L. Berg, M.
     Everingham, D.A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid,
     B.C. Russell, A. Torralba, C.K.I. Williams, J. Zhang, and A.
     Zisserman.  In J. Ponce et al. (Eds.):
     Toward Category-Level Object Recognition, LNCS 4170, pp. 29–48, 2006.  [pdf]
 
 
 
 
 - Soylent Grid: it’s Made of People! by S.
     Steinbach, V. Rabaud and S. Belongie,
     ICCV workshop on Interactive Computer Vision, 2007.  [pdf] 
 
 
 - Harvesting Image Databases from the Web, by F. Schroff, A.
     Criminisi, and A. Zisserman, Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
     
 
 
[No demo on this topic.]
 
Related links

Dataset list with links
 
 
Near-duplicate detection
 
This problem
involves detecting cases where multiple images (or videos) are the same except
for some slight alterations. 
Near-duplicate detection can be useful for detecting copyright
violations or forged images.  These
papers include several vision approaches, as well as some papers on the core algorithms
often used.
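Broder’s resemblance measure, which underlies several of the methods above, can be sketched in a few lines: documents (or images, via their local features) are reduced to shingle sets, and the fraction of agreeing min-hash signature entries estimates their Jaccard overlap.  The shingle size, signature length, and example strings below are arbitrary:

```python
import random

def shingle_set(text, k=4):
    """All length-k substrings of the input; the set representation to compare."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingles, n_hashes=64, seed=0):
    """Min-hash (Broder): per random hash function, keep the minimum hash over
    the set.  The probability two signature slots agree equals the Jaccard
    resemblance |A ∩ B| / |A ∪ B|."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n_hashes)]
    return [min(hash((salt, s)) for s in shingles) for salt in salts]

def estimate_resemblance(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc1 = "near duplicate detection for images and video"
doc2 = "near duplicate detection for image and videos"   # slight alteration
doc3 = "an entirely different sentence about robots"
s1, s2, s3 = (minhash_signature(shingle_set(d)) for d in (doc1, doc2, doc3))
print(estimate_resemblance(s1, s2), estimate_resemblance(s1, s3))
```

The near-duplicate image papers above apply the same idea with visual words or sketches in place of text shingles, so that slightly altered copies still share most of their signature.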
 
 
 - Efficient Near-Duplicate Detection and
     Subimage Retrieval, by Yan Ke,
     Rahul Sukthankar, and Larry Huston, ACM Multimedia 2004.  [pdf]
 
 
 - Enhancing DPF for Near-replica Image
     Recognition, by Y. Meng, E. Chang, and B.
     Li, Proceedings of the Conference on Computer Vision and Pattern
     Recognition (CVPR), 2003. [pdf]
 
 
 - Content-based Copy Detection using
     Distortion-Based Probabilistic Similarity Search, by A. Joly, O. Buisson,
     and C. Frélicot.  In IEEE
     Transactions on Multimedia, 2007.  [pdf]
 
 
 - Filtering Image Spam with Near-Duplicate
     Detection, by Zhe Wang, W. Josephson, Q. Lv, M. Charikar, and K. Li.  Proceedings
     of the 4th Conference on Email and Anti-Spam (CEAS), 2007. [pdf]
 
 
 - M. Henzinger. Finding Near-Duplicate Web Pages: a Large-Scale
     Evaluation of Algorithms. In ACM Special Interest Group on Information
     Retrieval (SIGIR), 2006.  (text
     application) [pdf]
 
 
 - On the Resemblance and Containment of Documents, Andrei Z. Broder,
     1997. [pdf]
 
 
 - Similarity Estimation Techniques from Rounding Algorithms, M. S.
     Charikar.  In 34th Annual
     ACM Symposium on Theory of Computing (May 2002).  [ps]
 
 
 - Scalable Near Identical Image and Shot
     Detection, by O. Chum, J. Philbin,
     M. Isard, and A. Zisserman, ACM
     International Conference on Image and Video Retrieval, 2007. [pdf]
 
 
 
Related links:

Data from the Ke et al. paper

LSH homepage; email the authors for the code package

LSH Matlab code by Greg Shakhnarovich

TRECVID data
 
 
 
Learning distance functions
 
The success
of any distance-based indexing, clustering, or classification scheme depends
critically on the quality of the chosen distance metric, and the extent to
which it accurately reflects the true underlying relationships between the
examples in a particular data domain. An optimal distance metric should report
small distances for examples that are similar in the parameter space of
interest (or that share a class label), and large distances for examples that
are unrelated.  These papers consider
distance learning specifically for image retrieval tasks.
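Many of these methods learn a Mahalanobis metric d(x, y)² = (x − y)ᵀM(x − y).  The sketch below, loosely in the spirit of learning from equivalence constraints (Bar-Hillel et al.), sets M to the inverse covariance of difference vectors between pairs labeled as similar, so directions that vary within a class are downweighted.  The 2-D toy data and helper names are made up:

```python
def mahalanobis(x, y, M):
    """d(x, y)^2 = (x - y)^T M (x - y) for a positive-definite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))

def learn_metric(similar_pairs, eps=1e-6):
    """Toy metric learning: M = inverse covariance of the difference vectors
    of pairs known to be equivalent (2-D case, closed-form inverse)."""
    diffs = [[a - b for a, b in zip(x, y)] for x, y in similar_pairs]
    n = len(diffs)
    c = [[sum(d[i] * d[j] for d in diffs) / n + (eps if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[ c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det,  c[0][0] / det]]

# similar pairs vary a lot along x, very little along y
pairs = [((0, 0), (2, 0.1)), ((1, 0.5), (-1, 0.4)), ((3, 1), (0.5, 1.05))]
M = learn_metric(pairs)
# under the learned metric, a step along high-variance x is "cheap"
# while the same step along low-variance y is "expensive"
print(mahalanobis((0, 0), (1, 0), M), mahalanobis((0, 0), (0, 1), M))
```

The published methods differ mainly in how M is estimated (equivalence constraints, margins between classes, per-exemplar local functions), but they share this quadratic form.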
 
 - Learning Distance
     Functions for Image Retrieval, by T.
     Hertz, A. Bar-Hillel and D. Weinshall, in Proceedings of the IEEE Conference on Computer Vision and
     Pattern Recognition (CVPR) 2004. 
     [pdf]
 
 
 - Learning a Mahalanobis Metric from Equivalence
     Constraints, by A.
     Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, in Journal of Machine
     Learning Research (JMLR), 2005.  [pdf]
 
 
 - *Learning Globally-Consistent Local Distance
     Functions for Shape-Based Image Retrieval and Classification, by A. Frome,
     Y. Singer, F. Sha, J. Malik, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]  [web]
 
 
 - *Invariant Large
     Margin Nearest Neighbor Classifier,
     by P. Mudigonda, P. Torr, and A. Zisserman, in Proceedings of the IEEE International Conference on
     Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Fast Pose Estimation with Parameter Sensitive Hashing,
     by G. Shakhnarovich, P. Viola, and T. Darrell, in Proceedings of the IEEE
     International Conference on Computer Vision (ICCV), 2003.  [pdf] 
 
 
 
 
Related links:

DistBoost code, Hertz et al.

Relevant Components Analysis code, Hertz et al.

DistLearn toolkit

Large Margin Nearest Neighbors code by Weinberger et al.

Nearest neighbor datasets from Vassilis Athitsos
 
 
Place recognition and kidnapped robots
 
How can an image of the current scene allow
localization or place recognition?  Or,
put more dramatically, how can a kidnapped robot that is carried off to an
arbitrary location figure out where it is with no prior knowledge of its
position?  These papers address this
problem, some specifically with a robotics slant, and some in terms of the image-based
scene matching problem.
 
 - *Vision-Based Global Localization and Mapping for Mobile
     Robots, Se, S., Lowe, D., & Little, J. 
     IEEE Transactions on Robotics, 2005.  [pdf]
 
 
 - Image-Based Localisation, by R. Cipolla, D. Robertson and B. Tordoff.  Proceedings
     of the 10th International Conference on Virtual Systems and Multimedia,
     2004.  [pdf]
 
 
 - *Qualitative Image Based Localization in
     Indoors Environments, by J.
     Kosecka, L. Zhou, P. Barber, and Z. Duric, in Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2003. [pdf]
 
 
 - Location Recognition and Global Localization Based on
     Scale-Invariant Keypoints, by J. Kosecka and X. Yang,  CVPR workshop 2004.  [pdf]
 
 
 - Searching the Web with
     Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in
     Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR), 2004.  [pdf]
 
 
 - Total Recall:
     Automatic Query Expansion with a Generative Feature Model for Object
     Retrieval, by O. Chum, J. Philbin,
     J. Sivic, M. Isard, A. Zisserman, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
 
 
 
Related links:

Oxford buildings dataset
 
 
Text and speech + images and video
 
Often images or videos are accompanied by text or
speech, which may provide complementary cues when we are trying to index,
cluster, or recognize objects.  These
papers seek to leverage this cue in a number of different ways.
 
 - *“Hello! My name is... Buffy” – Automatic Naming of Characters in
     TV Video, by M. Everingham, J. Sivic and A. Zisserman, British Machine
     Vision Conference (BMVC), 2006.  [pdf]
 
 
 - *Object Recognition as Machine Translation: Learning a Lexicon for
     a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, and D.
     Forsyth, in Proceedings of the European Conference on Computer Vision
     (ECCV), 2002.  [pdf]  [web]
 
 
 - Names and Faces in the News, by T. Berg, A.
     Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth,
     In Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR), 2004.  [pdf]  [web]
 
 
 - Learning Structured Appearance Models
     from Captioned Images of Cluttered Scenes, by M. Jamieson, A. Fazly, S. Dickinson, S. Stevenson, and S.
     Wachsmuth.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007. [pdf]
 
 
 - Clustering Web Images with Multi-modal
     Features, by M. Rege, M. Dong, and
     J. Hua, ACM Multimedia 2007.  [pdf]
     
 
 
 
 
Related links:

Face data from a Buffy episode, from Oxford Visual Geometry Group

Data from the Duygulu et al. paper

Subrip for subtitle extraction
 
Context and background knowledge in recognition
 
Many recognition systems consider snapshots of
objects in isolation, both when training and testing.  But both our intuition and cognitive studies
indicate that the object’s greater context can also be crucial to the
recognition process.  These papers
consider how prior external knowledge can aid in recognizing objects or
categories.  The context cues may come
from reasoning explicitly about the 3D environment, knowing something about the
patterns of a user, learning about the typical patterns of occurrence, or
gleaning knowledge from an organized ontology. 
 
 - *Putting Objects in Perspective, by D. Hoiem,
     A.A. Efros, and M. Hebert, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
     2006.  [pdf] [web]
 
 
 - Objects in Context, by A. Rabinovich, A. Vedaldi, C. Galleguillos, E.
     Wiewiora, S. Belongie, in Proceedings of
     the IEEE International Conference on Computer Vision (ICCV), 2007. 
     [pdf]
 
 
 - Visual Contextual Awareness in Wearable Computing, by T. Starner,
     B. Schiele, and A. Pentland.  In Proceedings of Visual Contextual
     Awareness in Wearable Computing, 1998.  [pdf]  [web]
 
 
 - *Contextual Priming for Object Detection, by A. Torralba.  International
     Journal of Computer Vision, 2003. 
     [pdf]  [web]
 
 
 - The Role of Context in Object Recognition, by A. Oliva and A.
     Torralba. TRENDS in Cognitive Sciences, Vol 11 No 12, 2007.  [pdf]  
 
 
 
 
 - Unsupervised Learning of Hierarchical Semantics
     of Objects, by D. Parikh and T. Chen, in Proceedings of the International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
     [web]
 
 
 
Related links:

WordNet

Scene global feature code from Antonio Torralba

MIT CSAIL database of objects and scenes
 
Learning about images from keyword-based Web search
 
Keyword-based search on the Web can be used to
retrieve images (or videos) that appear near the query word, are named with the
word, or are explicitly tagged with it. 
Of course, this is not a completely reliable way to find images of a
given object or scene, and typically an image contains much more information
than can be conveyed in a few words anyhow. 
Yet search engines’ rapid access to large amounts of image/video content
makes them an interesting resource for vision research.  These papers all consider ways to learn from
the images that come back from a keyword-based search, taking into account the
large amount of noise in the returns.
 
 - *Learning Color Names from Real-World Images, by J. van de Weijer,
     C. Schmid, J. Verbeek, in Proceedings of the IEEE International Conference
     on Computer Vision (ICCV), 2007.  [pdf]
 
 
 - Searching the Web with
     Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in
     Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
     (CVPR), 2004.  [pdf]
 
 
 - *Learning Object Categories from Google’s Image Search, by R.
     Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, in Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2005.  [pdf]  [web]
 
 
 - Animals on the Web, by T. Berg and D. Forsyth,
     in Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR),
     2006.  [pdf]
 
 
 - Keywords to Visual Categories: Multiple-Instance
     Learning for Weakly Supervised Object Categorization, by S. Vijayanarasimhan and K. Grauman, UTCS Technical Report, 2007.  [pdf]
 
 
 - Harvesting Image Databases from the Web, by F. Schroff, A.
     Criminisi, and A. Zisserman, in Proceedings
     of the IEEE International Conference on Computer Vision (ICCV),
     2007.  [pdf]
 
 
 - Probabilistic Web Image Gathering, by K. Yanai and K. Barnard, in ACM Multimedia
     2005.  [pdf]
 
 
 
Related links:

Animals on the Web data from Berg et al.

Annotated Google image data from the Schroff et al. paper

Color name datasets and feature code from van de Weijer et al.

Google image data from Fergus et al.

Flickr Commons, Library of Congress pilot project

Semantic robot vision challenge and example data
 
 
Video summarization
 
How can a video be compactly presented in a visual
way?  Video summarization methods attempt
to abstract the main occurrences, scenes, or objects in a clip in order to
provide an easily interpreted synopsis.
 
 
 
 - Video Abstraction, by J. Oh, Q. Wen, J. Lee, and S. Hwang.  In S. Deb, editor, Video Data Management and Information
     Retrieval, Idea Group Inc. and IRM Press, 2004.  [pdf]
     
 
 
 - Shapetime
     Photography, by W. T. Freeman
     and H. Zhang, in Proceedings IEEE Computer Vision and Pattern Recognition
     (CVPR), 2003.  [pdf]
 
 
 - Video Summaries through Mosaic-Based Shot and
     Scene Clustering, A. Aner and J. Kender, in Proceedings of the European
     Conference on Computer Vision (ECCV), 2002.  [pdf]
 
 
 - Dynamic Stills and Clip Trailers, by Y. Caspi,
     A. Axelrod, Y. Matsushita, A. Gamliel. 
     [pdf]
     [web]
 
 
 - Reliable Transition Detection in Videos:
     A Survey and Practitioner’s Guide, by R. Lienhart, International Journal of Image and Graphics,
     2001.  [pdf]
 
 
 - Recent Advances in Content-based Video
     Analysis.  C Ngo, H. Zhang, and T.
     Pong.  International Journal of
     Image and Graphics, 2001. [pdf]
 
 
 
Image and video retargeting
 
These papers cover both content-aware resizing as
well as texture synthesis.  The general
idea is to automate (or semi-automate) the process of adapting image or video
inputs to a desired format, whether that’s so it can be viewed well on a
different display size, or so it can be viewed continuously as if the regular
spatial or temporal pattern persists beyond where it ends in the raw
input.  The challenges include adapting
the input in such a way that the most interesting parts are preserved or
well-represented, and in the case of textures, generating processes that look
realistically stochastic and “natural”.
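As one concrete instance of content-aware resizing, the sketch below implements the dynamic program at the heart of seam carving (Avidan and Shamir’s method, which is not among the papers listed here): it finds the connected vertical path of least total energy, which can then be removed to shrink the width while sparing high-energy content.  The toy energy map stands in for a real gradient-magnitude image:

```python
def min_vertical_seam(energy):
    """Dynamic programming for content-aware resizing: find the connected
    vertical path of lowest total energy; seam[y] is the column to delete
    in row y, shrinking the image by one column."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # cost[y][x] = cheapest seam reaching (y, x) from the top row
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w - 1, x + 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # backtrack from the cheapest bottom-row cell
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(0, x - 1), min(w - 1, x + 1)
        x = min(range(lo, hi + 1), key=lambda c: cost[y - 1][c])
        seam.append(x)
    return list(reversed(seam))

# toy energy map: a high-energy "object" occupies the right columns
energy = [[1, 1, 9, 9],
          [1, 1, 9, 9],
          [1, 1, 9, 9]]
print(min_vertical_seam(energy))  # → [0, 0, 0]
```

Repeating this removal (or its horizontal analogue) retargets an image to a new aspect ratio; the video-retargeting papers above solve the harder temporal version of the same preserve-the-interesting-parts objective.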
 
 - Image Quilting for Texture Synthesis and Transfer, by A. Efros and W. Freeman, in ACM
     Transactions on Graphics (SIGGRAPH), 2001.  [pdf]
     [web]
 
 - Fast Texture Synthesis using
     Tree-structured Vector Quantization, by L. Wei and M. Levoy, in ACM Transactions
     on Graphics (SIGGRAPH), 2000.  [pdf]
     [web]
 
 
 
 
 - Automatic Thumbnail Cropping and its
     Effectiveness, by B. Suh, H. Ling,
     B. Bederson, and D. Jacobs.  In Proceedings
     of the Symposium On User interface Software and Technology, 2003.  [pdf]
 
 
 - *Non-homogeneous Content-driven
     Video-retargeting, by L. Wolf, M.
     Guttmann, and D. Cohen-Or, in Proceedings of the IEEE International Conference
     on Computer Vision (ICCV), 2007.  [pdf] [web]
 
 
 - Video Retargeting: Automating Pan and
     Scan, by F. Liu and M. Gleicher, in
     ACM Multimedia, 2006.  [pdf]
 
 
 
 
Exploring images in 3D
 
From multiple views of a scene we can create 3D
representations or new renderings.  These
papers propose ways to explore image content in 3D, with an emphasis on
applications of doing so, such as perusing popular tourist sites from multiple
users’ photos, analyzing the geometry of paintings, or editing photos based on
their layers.  Some methods included here
are semi-automatic.
 
 
 
 - Tour into the Picture: Using a Spidery Mesh Interface to Make
     Animation from a Single Image, by Y. Horry, K. Anjyo, and K. Arai. ACM
     Transactions on Graphics (SIGGRAPH), 1997. 
     [pdf]
     
 
 
 - Automatic Photo-Popup, by D. Hoiem, A. Efros,
     and M. Hebert, ACM Transactions on Graphics (SIGGRAPH), 2005.  [pdf] [web]
 
 
 - *Single-View Metrology: Algorithms and
     Applications, by A.
     Criminisi, DAGM, 2002.  [pdf] [web]
 
 
 - Single
     View Metrology, A.
     Criminisi, I. Reid, A. Zisserman,
     International Journal of Computer Vision, 1999.  [pdf]
     
 
 
 - Image-Based Modeling and Photo Editing, by
     B. Oh, M. Chen, J. Dorsey, and F.
     Durand, ACM Transactions on Graphics (SIGGRAPH), 2001. [pdf]
 
 
 
Related links:

Some PhotoTourism patch data from Microsoft Research
 
 
Canonical views and visualization
 
Given an object or scene, what sparse set of
viewpoints best summarize it?  This
problem is in some ways related to the video summarization topic (see above),
but here with an emphasis on the visualization of photo collections, and with
some consideration of optimizing for human perception.
 
 - Scene Summarization for Online Image Collections, by I. Simon, N. Snavely, and S. Seitz.  In Proceedings of the IEEE International
     Conference on Computer Vision (ICCV), 2007.  [pdf]
     [web]
 
 
 - Generating Summaries for Large Collections of
     Geo-referenced Photographs, by A. Jaffe, M. Naaman, T. Tassa, and M.
     Davis. International Conference on World Wide Web, 2006. [pdf]
 
 
 - Approximation of Canonical
     Sets and Their Applications to 2D View Simplification, by T. Denton, J.
     Abrahamson, A. Shokoufandeh, in
     Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2004.  [pdf]
 
 
 - What Object Attributes Determine Canonical
     Views?  V. Blanz, M. Tarr, H.
     Bulthoff.  Perception, 28(5):575-600, 1999.  [pdf]
 
 
 - Digital Tapestry, by C. Rother, S. Kumar, V. Kolmogorov, and A.
     Blake, in Proceedings IEEE Computer Vision and
     Pattern Recognition (CVPR), 2005.  [pdf] [web]
 
 
 - Picture Collage, by J. Wang, J. Sun, L. Quan, X. Tang,  and H. Shum, in Proceedings IEEE
     Computer Vision and Pattern Recognition (CVPR), 2006.  [pdf]
 
 
 
 
Shape matching
 
The shape matching problem considers how to compare
shapes, often as defined in terms of their contours, silhouettes, or sampled
edge points.  These papers provide
different matching metrics and demonstrate the use of shape for applications
like object recognition, reading warped text, detecting pedestrians, and
categorization.
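For instance, the Hausdorff distance used by Huttenlocher et al. compares two point sets directly, with no correspondence search: it is small only when every point of each set lies near some point of the other.  A minimal version over 2-D edge points (toy coordinates below) looks like:

```python
def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of the distance from a to its nearest b in B."""
    return max(min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                   for bx, by in B)
               for ax, ay in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets, e.g. sampled
    edge points of two shapes."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(0.1, 0), (1.1, 0), (1.1, 1), (0.1, 1)]  # same square moved by 0.1
print(hausdorff(square, shifted))  # → 0.1
```

Because a single outlier point can dominate the max, the matching paper above actually ranks by a partial (k-th ranked) variant; richer descriptors such as shape contexts instead attach a local histogram to each point before matching.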
 
 - *Shape Matching and Object Recognition Using
     Shape Contexts, by S. Belongie, J. Malik, and J. Puzicha. Transactions on
     Pattern Analysis and Machine Intelligence (PAMI), 2002.  [pdf]
     [web]
 
 
 - Recognizing Objects in Adversarial Clutter: Breaking a Visual
     CAPTCHA, by G. Mori and J. Malik, in Proceedings IEEE Computer Vision and
     Pattern Recognition (CVPR), 2003.  [pdf] [web]
 
 
 - Using the
     Inner-Distance for Classification of Articulated Shapes, by H. Ling and D. Jacobs, Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR), 2005.  [pdf]
 
 
 - Comparing Images Using the Hausdorff Distance,
     by D. Huttenlocher, G. Klanderman, and W. Rucklidge, Transactions on
     Pattern Analysis and Machine Intelligence (PAMI), 1993.  [pdf]
 
 
 - Pedestrian Detection from a Moving Vehicle, by
     D. Gavrila, Proceedings of the European Conference on Computer Vision
     (ECCV), 2000.  [pdf]
 
 
 - *A Boundary-Fragment-Model for Object
     Detection, by A. Opelt, A. Pinz,
     and A. Zisserman, Proceedings of the European Conference on Computer
     Vision (ECCV), 2006.  [pdf]
 
 
 - Hierarchical Matching
     of Deformable Shapes, by P. Felzenszwalb
     and J. Schwartz, in Proceedings of the IEEE Conference on Computer Vision
     and Pattern Recognition, 2007.  [pdf]
 
 
 
Related links:

Matlab code for shape context features and matching

MNIST handwritten digits database
 
 
Detecting abnormal events
 
It would be useful if a vision system that monitors
video could automatically determine when something “unusual” is happening.  But how can a system be trained to recognize
something it has never (or rarely) seen before? 
These techniques address the problem of detecting visual anomalies.
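A common baseline behind many of these techniques is distance-based novelty scoring: model “normal” activity by a set of observed feature vectors, and flag anything far from all of them.  The sketch below scores queries by their distance to the k-th nearest training example; the feature vectors and parameter choices are illustrative only:

```python
def knn_novelty_scores(train, queries, k=3):
    """Score each new observation by its distance to its k-th nearest 'normal'
    training example; large scores flag events unlike anything seen before."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scores = []
    for q in queries:
        ds = sorted(dist(q, t) for t in train)
        scores.append(ds[k - 1])  # distance to the k-th nearest neighbor
    return scores

# "normal" feature vectors cluster near the origin
normal = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1), (0.05, -0.05), (0.0, 0.0)]
usual, unusual = (0.05, 0.05), (3.0, 3.0)
s_usual, s_unusual = knn_novelty_scores(normal, [usual, unusual])
print(s_usual, s_unusual)  # the far-away event gets a much larger score
```

Thresholding such a score turns it into a detector without ever training on abnormal examples, which is exactly the asymmetry these papers grapple with.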