CS395T: Visual Recognition and Search
Spring 2008
Topics
- Visual vocabularies
- Mining image collections
- Fast indexing methods
- Faces
- Datasets and dataset creation
- Near-duplicate detection
- Learning distance functions
- Place recognition and kidnapped robots
- Text/speech and images/video
- Context and background knowledge in recognition
- Learning about images from keyword-based Web search
- Video summarization
- Image and video retargeting
- Exploring images in 3D
- Canonical views and visualization
- Shape matching
- Detecting abnormal events
Visual vocabularies
Words are the basic tokens in a text document: they allow us to index documents with a keyword search, or to discover topics based on common distributions of words. What is the analogy for an image? Visual words are prototypical local features that form a “vocabulary” with which images can be described. As with documents, they can be a useful representation. Various recognition approaches exploit a bag-of-visual-words feature space, identifying the vocabulary words by quantizing a sample of local descriptors. These papers address questions surrounding vocabulary formation, including interest point selection, quantization strategies, and maintaining efficient codebooks.
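To make the quantization step concrete, here is a minimal sketch of building a codebook with k-means and describing an image as a visual-word histogram. This is only an illustration: the synthetic 16-D vectors stand in for real local descriptors (e.g. SIFT), and the choice of k is arbitrary, not taken from any paper below.

```python
import numpy as np

def build_codebook(descriptors, k=8, iters=20, seed=0):
    """Toy k-means quantizer: the k cluster centers become the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest center, then re-estimate
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bag_of_words(image_descriptors, centers):
    """Normalized histogram of visual-word occurrences for one image."""
    dists = np.linalg.norm(image_descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# synthetic 16-D vectors standing in for local descriptors (e.g. SIFT)
rng = np.random.default_rng(1)
descs = rng.normal(size=(200, 16))
codebook = build_codebook(descs, k=8)
h = bag_of_words(descs[:50], codebook)
print(h.shape)  # (8,)
```

The papers below largely vary this recipe: which points to sample, how to cluster, and how to keep the codebook compact and discriminative.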
- *Sampling Strategies for Bag-of-Features Image
Classification. E. Nowak, F. Jurie,
and B. Triggs. In Proceedings of
the European Conference on Computer Vision (ECCV), 2006. [pdf]
- Visual Categorization with Bags of Keypoints, by
G. Csurka, C. Bray, C. Dance, and L. Fan.
In Workshop on Statistical Learning in Computer Vision, ECCV,
2004. [pdf]
- Adapted Vocabularies for Generic Visual Categorization,
by F. Perronnin, C. Dance, G. Csurka, M. Bressan, in Proceedings of the
European Conference on Computer Vision (ECCV), 2006. [pdf]
- *Fast Discriminative Visual Codebooks using
Randomized Clustering Forests, by A. Moosmann, B. Triggs and F.
Jurie. Neural Information
Processing Systems (NIPS), 2006. [pdf]
- Object Categorization by Learned Universal
Visual Dictionary. J. Winn, A.
Criminisi and T. Minka. In
Proceedings of the IEEE International Conference on Computer Vision
(ICCV), 2005. [pdf]
- Vector Quantizing Feature Space with a Regular Lattice, by T.
Tuytelaars and C. Schmid, in Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf]
- *Scalable Recognition
with a Vocabulary Tree, by D.
Nister and H. Stewenius, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2006. [pdf]
- Adaptive Vocabulary Forests for Dynamic Indexing
and Category Learning, by T. Yeh, J. Lee, and T. Darrell. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf] [web]
Related links
Executables
for interest operators and descriptors, from Oxford VGG
Benchmark database from University of Kentucky, used in the vocabulary tree paper, plus the semi-processed data.
Libpmk, library from John Lee
that includes hierarchical clustering / vocab
Software from
LEAR team at INRIA, including interest point detectors, shape features,
randomized forest image classifier
Mining image collections
Mining large unstructured collections of images can
identify common visual patterns and allow the discovery of topics or even
categories. These papers include methods
for clustering according to latent topics and repeated configurations of
features, mining for association rules, and playing with large image
collections.
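The association-rule mining of Agrawal et al. (listed below) rests on finding frequent itemsets. A tiny Apriori-style sketch follows; treating each image as a “transaction” of visual-word ids is an illustrative assumption on my part, with made-up data.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Tiny Apriori: return every itemset contained in >= min_support transactions."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    freq, size = {}, 1
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        freq.update(level)
        size += 1
        # next candidates: unions of frequent sets that are exactly one item larger
        candidates = list({a | b for a, b in combinations(level, 2) if len(a | b) == size})
    return freq

# each "transaction" is the (hypothetical) set of visual-word ids seen in one image
images = [frozenset(t) for t in [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3, 4}]]
fs = frequent_itemsets(images, min_support=3)
print(sorted(tuple(sorted(s)) for s in fs))  # [(1,), (1, 2), (2,), (2, 3), (3,)]
```

The key pruning idea is that a set can only be frequent if all of its subsets are, so each level is built only from the survivors of the previous one.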
- Video Data
Mining Using Configurations of Viewpoint Invariant Regions, by Sivic, J.
and Zisserman, A. in Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2004. [pdf]
- Efficient Mining of Frequent and Distinctive
Feature Configurations, by T.
Quack, V. Ferrari, B. Leibe, and L. Van Gool, In Proceedings of the IEEE
International Conference on Computer Vision (ICCV), 2007. [pdf]
- Mining Association Rules Between Sets of Items in Large Databases,
by R. Agrawal, T. Imielinski, and A. N. Swami. In Special Interest Group on Management
of Data (SIGMOD), 1993. [pdf]
- Discovering Objects and Their Location in
Images, by J. Sivic, B. Russell, A.
Efros, A. Zisserman, and W. Freeman, In Proceedings of the IEEE
International Conference on Computer Vision (ICCV), 2005. [pdf]
[web]
- Mining Image Datasets using Perceptual Association Rules, by J.
Tesic, S. Newsam, and B. S. Manjunath.
In SIAM’03 Workshop on Mining Scientific and
Engineering Datasets, 2003. [pdf]
Related links
pLSA
implementations
Matlab
code and data for affinity propagation, from Dueck & Frey
Weka: Java data mining software, includes an implementation of the Apriori algorithm
Fast indexing methods
Content-based image and video retrieval, as well as example-based
recognition systems, require the ability to rapidly search very large image
collections. This area deals with
algorithms for fast search, specifically in the context of indexing images or
image features.
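To give a flavor of the hashing-based methods below, here is a minimal random-hyperplane LSH sketch. It uses a single hash table and synthetic data; real systems use several tables and re-rank colliding candidates with exact distances, so treat the parameters here as illustrative.

```python
import numpy as np

def lsh_keys(X, n_bits=8, seed=0):
    """Random-hyperplane LSH: sign pattern of n_bits projections, packed to an int."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = (X @ planes) > 0
    return (bits * (1 << np.arange(n_bits))).sum(axis=1)

def build_table(X, **kw):
    """Bucket the database vectors by hash key."""
    table = {}
    for i, key in enumerate(lsh_keys(X, **kw)):
        table.setdefault(int(key), []).append(i)
    return table

def query(table, x, **kw):
    """Candidates sharing x's bucket; exact search then runs only on these."""
    key = int(lsh_keys(x[None, :], **kw)[0])
    return table.get(key, [])

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 32))          # toy database of 100 feature vectors
table = build_table(X, n_bits=8, seed=3)
cands = query(table, X[0], n_bits=8, seed=3)
print(0 in cands)  # True: a point always collides with itself
```

Nearby vectors fall on the same side of most hyperplanes, so they collide with high probability, which is what makes sub-linear search possible.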
- Scalable Recognition
with a Vocabulary Tree, by D.
Nister and H. Stewenius, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2006. [pdf]
- *A Binning Scheme for
Fast Hard Drive Based Image Search, F.
Fraundorfer, H.
Stewenius, and D. Nister, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[pdf]
- *Fast Pose Estimation with Parameter Sensitive
Hashing, by G. Shakhnarovich, P. Viola, T. Darrell, In Proceedings of the
IEEE International Conference on Computer Vision (ICCV), 2003. [pdf]
- Video Google: A Text Retrieval Approach
to Object Matching in Videos, by J.
Sivic and A. Zisserman, In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2003. [pdf] [web]
- Fast Similarity Search for Learned Metrics. P. Jain, B. Kulis, and K. Grauman. UTCS Technical Report #TR-07-48, September 2007.
- *Learning Embeddings
for Fast Approximate Nearest Neighbor Retrieval. V. Athitsos, J. Alon, S. Sclaroff,
and G. Kollios, Nearest-Neighbor
Methods in Learning and Vision: Theory and Practice, G. Shakhnarovich, T. Darrell and P. Indyk,
Editors. MIT Press, March
2006. [ps]
Related links
LSH homepage, email authors for code
package
LSH Matlab code by
Greg Shakhnarovich
Nearest
neighbor datasets from Vassilis Athitsos
Electronic
copy of the book Nearest Neighbor Methods
in Learning and Vision: Theory and Practice (UT EID required)
Faces
These papers consider the problems of detecting
faces, recognizing familiar faces, and looking for repeated faces in
videos. A variety of techniques are
represented below.
- Face Recognition: A Literature Survey, by W. Zhao, R. Chellappa, A.
Rosenfeld, and P. Phillips. In ACM
Computing Surveys, 2003. [pdf]
- *Rapid Object Detection Using a Boosted Cascade
of Simple Features, by P. Viola and M. Jones, In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2001. [pdf]
- Active Appearance Models, by T. F. Cootes, G. J. Edwards, and C. J. Taylor. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 23, No. 6, pp. 681-685, 2001.
- *Automatic Cast Listing in Feature-Length
Films with Anisotropic Manifold Space, by Arandjelovic and R. Cipolla, In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2006. [pdf]
- Person Spotting: Video Shot Retrieval for Face Sets, J. Sivic, M.
Everingham, and A. Zisserman. In International Conference on Image and
Video Retrieval (CIVR), 2005. [pdf]
- Leveraging Archival Video for Building
Face Datasets, D. Ramanan, S.
Baker, S. Kakade. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf]
- Face Recognition by Humans: 19 Results All Computer Vision
Researchers Should Know About, by P. Sinha, B. Balas, Y. Ostrovsky, and R.
Russell, Proceedings of the IEEE,
Vol. 94, No. 11, November 2006, pp. 1948-1962. [pdf]
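The boosted-cascade detector of Viola and Jones (above) owes much of its speed to the integral image, which makes the rectangle sums behind Haar-like features constant-time. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Zero-padded cumulative sums, so any rectangle sum costs four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) via the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # 30.0  (= 5 + 6 + 9 + 10)
```

A Haar-like feature is then just the difference of two or three such rectangle sums, which is why huge numbers of features can be evaluated per window.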
Related links
Intel’s OpenCV
library, includes Viola & Jones face detector
Active
Appearance Models code from Tim Cootes
Data collections of
detected faces, from Oxford VGG
Face data from Buffy
episode, from Oxford VGG
University of Cambridge face data
from films [go to Data link]
PolarRose.com
Pittsburgh Pattern Recognition face detector
demo
Datasets and dataset creation
These papers discuss issues in generating image
datasets for recognition research.
Benchmark image datasets allow direct comparisons between various
recognition algorithms, and having accessible prepared datasets can be critical
for the research itself. The design of an image collection is also important, since its degree of variability can influence the assumptions made by new methods, or may fail to adequately show off their strengths.
Meanwhile, the process of collecting labeled data is expensive and can
be tedious. These papers include novel
ways to gather image collections with less pain, and highlight some of the
considerations to be made in database design.
*Coverage of this area should include highlights on recent commonly used
datasets.*
- Dataset Issues in Object Recognition. by J. Ponce, T.L. Berg, M.
Everingham, D.A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid,
B.C. Russell, A. Torralba, C.K.I. Williams, J. Zhang, and A.
Zisserman. In J. Ponce et al. (Eds.):
Toward Category-Level Object Recognition, LNCS 4170, pp. 29–48, 2006. [pdf]
- Soylent Grid: it’s Made of People! by S.
Steinbach, V. Rabaud and S. Belongie,
ICCV workshop on Interactive Computer Vision, 2007. [pdf]
- Harvesting Image Databases from the Web, by F. Schroff, A.
Criminisi, and A. Zisserman, Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf]
[No demo on this topic.]
Related links
Dataset
list with links
Near-duplicate detection
This problem
involves detecting cases where multiple images (or videos) are the same except
for some slight alterations.
Near-duplicate detection can be useful for detecting copyright
violations or forged images. These papers include several vision approaches, as well as papers on the core algorithms they often build on.
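Broder’s notion of document resemblance (listed below) is typically estimated with MinHash signatures, which is also the core of several of the image approaches here. A small sketch, with sets of integer ids standing in for image fingerprints (the data is synthetic):

```python
import random

def minhash_signature(elements, n_hashes=64, seed=0):
    """One minimum per salted hash function; agreement rate estimates Jaccard overlap."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n_hashes)]
    return [min(hash((salt, e)) for e in elements) for salt in salts]

def estimated_resemblance(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# toy "fingerprints": integer feature ids from two near-duplicate images
a = set(range(0, 100))
b = set(range(0, 95)) | set(range(100, 105))   # true Jaccard = 95/105, about 0.90
sa, sb = minhash_signature(a), minhash_signature(b)
print(estimated_resemblance(sa, sb))  # close to 0.90
```

Because near-duplicates share most of their features, their signatures agree on most hash functions, and short signatures can replace full pairwise set comparison.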
- Efficient Near-Duplicate Detection and
Subimage Retrieval, by Yan Ke,
Rahul Sukthankar, and Larry Huston, ACM Multimedia 2004. [pdf]
- Enhancing DPF for Near-replica Image
Recognition, by Y. Meng, E. Chang, and B.
Li, Proceedings of the Conference on Computer Vision and Pattern
Recognition (CVPR), 2003. [pdf]
- Content-based Copy Detection using
Distortion-Based Probabilistic Similarity Search, by A. Joly, O. Buisson,
and C. Frélicot. In IEEE
Transactions on Multimedia, 2007. [pdf]
- Filtering Image Spam with Near-Duplicate
Detection, by Zhe Wang, W. Josephson, Q. Lv, M. Charikar, and K. Li. Proceedings
of the 4th Conference on Email and Anti-Spam (CEAS), 2007. [pdf]
- M. Henzinger. Finding Near-Duplicate Web Pages: a Large-Scale
Evaluation of Algorithms. In ACM Special Interest Group on Information
Retrieval (SIGIR), 2006. (text
application) [pdf]
- On the Resemblance and Containment of Documents, Andrei Z. Broder,
1997. [pdf]
- Similarity Estimation Techniques from Rounding Algorithms, M. S. Charikar. In 34th Annual ACM Symposium on Theory of Computing (May 2002). [ps]
- Scalable Near Identical Image and Shot
Detection, by O. Chum, J. Philbin,
M. Isard, and A. Zisserman, ACM
International Conference on Image and Video Retrieval, 2007. [pdf]
Related links:
Data
from Ke et al. paper
LSH
homepage, email authors for code package
LSH Matlab code by
Greg Shakhnarovich
TRECVID data
Learning distance functions
The success
of any distance-based indexing, clustering, or classification scheme depends
critically on the quality of the chosen distance metric, and the extent to
which it accurately reflects the true underlying relationships between the
examples in a particular data domain. An optimal distance metric should report
small distances for examples that are similar in the parameter space of
interest (or that share a class label), and large distances for examples that
are unrelated. These papers consider
distance learning specifically for image retrieval tasks.
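For intuition, the equivalence-constraint approach of Bar-Hillel et al. (below) amounts to inverting the average within-chunklet covariance, so that directions of irrelevant variability are down-weighted. A toy sketch on synthetic 2-D data; the data, chunklet split, and regularizer are all illustrative:

```python
import numpy as np

def rca_metric(X, chunklets):
    """Mahalanobis matrix = inverse of the average within-chunklet covariance.
    chunklets: index lists of examples known to be equivalent (same class)."""
    d = X.shape[1]
    C, n = np.zeros((d, d)), 0
    for idx in chunklets:
        Z = X[idx] - X[idx].mean(axis=0)
        C += Z.T @ Z
        n += len(idx)
    return np.linalg.inv(C / n + 1e-6 * np.eye(d))  # small ridge for stability

def mahalanobis(x, y, M):
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2)) * np.array([5.0, 0.5])  # axis 0 is noisy/irrelevant
chunklets = [list(range(0, 20)), list(range(20, 40))]
M = rca_metric(X, chunklets)
print(M[0, 0] < M[1, 1])  # True: the high-variance axis is down-weighted
```

The learned matrix M then replaces the identity in nearest-neighbor search, so retrieval distances reflect the constraint structure rather than raw feature scale.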
- Learning Distance
Functions for Image Retrieval, by T.
Hertz, A. Bar-Hillel and D. Weinshall, in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR) 2004.
[pdf]
- Learning a Mahalanobis Metric from Equivalence
Constraints, by A.
Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, in Journal of Machine
Learning Research (JMLR), 2005. [pdf]
- *Learning Globally-Consistent Local Distance
Functions for Shape-Based Image Retrieval and Classification, by A. Frome,
Y. Singer, F. Sha, J. Malik, in Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf] [web]
- *Invariant Large
Margin Nearest Neighbor Classifier,
by P. Mudigonda, P. Torr, and A. Zisserman, in Proceedings of the IEEE International Conference on
Computer Vision (ICCV), 2007. [pdf]
- Fast Pose Estimation with Parameter Sensitive Hashing,
by G. Shakhnarovich, P. Viola, and T. Darrell, in Proceedings of the IEEE
International Conference on Computer Vision (ICCV), 2003. [pdf]
Related links:
DistBoost code,
Hertz et al.
Relevant Components
Analysis code, Hertz et al.
DistLearn
toolkit
Large
Margin Nearest Neighbors code by Weinberger et al.
Nearest
neighbor datasets from Vassilis Athitsos
Place recognition and kidnapped robots
How can an image of the current scene allow
localization or place recognition? Or,
put more dramatically, how can a kidnapped robot that is carried off to an
arbitrary location figure out where it is with no prior knowledge of its
position? These papers address this
problem, some specifically with a robotics slant, and some in terms of the image-based
scene matching problem.
- *Vision-Based Global Localization and Mapping for Mobile
Robots, Se, S., Lowe, D., & Little, J.
IEEE Transactions on Robotics, 2005. [pdf]
- Image-Based Localisation, by R. Cipolla, D. Robertson, and B. Tordoff. Proceedings of the 10th International Conference on Virtual Systems and Multimedia, 2004. [pdf]
- *Qualitative Image Based Localization in
Indoors Environments, by J.
Kosecka, L. Zhou, P. Barber, and Z. Duric, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2003. [pdf]
- Location Recognition and Global Localization Based on
Scale-Invariant Keypoints, by J. Kosecka and X. Yang, CVPR workshop 2004. [pdf]
- Searching the Web with
Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2004. [pdf]
- Total Recall:
Automatic Query Expansion with a Generative Feature Model for Object
Retrieval, by O. Chum, J. Philbin,
J. Sivic, M. Isard, A. Zisserman, in Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf]
Related links:
Oxford
buildings dataset
Text and speech + images and video
Often images or videos are accompanied by text or
speech, which may provide complementary cues when we are trying to index,
cluster, or recognize objects. These
papers seek to leverage this cue in a number of different ways.
- *“Hello! My name is... Buffy” – Automatic Naming of Characters in
TV Video, by M. Everingham, J. Sivic and A. Zisserman, British Machine
Vision Conference (BMVC), 2006. [pdf]
- *Object Recognition as Machine Translation: Learning a Lexicon for
a Fixed Image Vocabulary, P. Duygulu, K. Barnard, N. de Freitas, and D.
Forsyth, in Proceedings of the European Conference on Computer Vision
(ECCV), 2002. [pdf] [web]
- Names and Faces in the News, by T. Berg, A.
Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth,
In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2004. [pdf] [web]
- Learning Structured Appearance Models
from Captioned Images of Cluttered Scenes, by M. Jamieson A. Fazly, S.
Dickinson, S. Stevenson, S.
Wachsmuth. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2007. [pdf]
- Clustering Web Images with Multi-modal
Features, by M. Rege, M. Dong, and
J. Hua, ACM Multimedia 2007. [pdf]
Related links:
Face data from Buffy
episode, from Oxford Visual Geometry Group
Data from Duygulu et
al. paper
Subrip for subtitle
extraction
Context and background knowledge in recognition
Many recognition systems consider snapshots of
objects in isolation, both when training and testing. But both our intuition and cognitive studies
indicate that the object’s greater context can also be crucial to the
recognition process. These papers
consider how prior external knowledge can aid in recognizing objects or
categories. The context cues may come
from reasoning explicitly about the 3D environment, knowing something about the
patterns of a user, learning about the typical patterns of occurrence, or
gleaning knowledge from an organized ontology.
- *Putting Objects in Perspective, by D. Hoiem,
A.A. Efros, and M. Hebert, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2006. [pdf] [web]
- Objects in Context, by A. Rabinovich, A. Vedaldi, C. Galleguillos, E.
Wiewiora, S. Belongie, in Proceedings of
the IEEE International Conference on Computer Vision (ICCV), 2007.
[pdf]
- Visual Contextual Awareness in Wearable Computing, by T. Starner,
B. Schiele, and A. Pentland. In Proceedings of Visual Contextual
Awareness in Wearable Computing, 1998. [pdf] [web]
- *Contextual Priming for Object Detection, by A. Torralba. International
Journal of Computer Vision, 2003.
[pdf] [web]
- The Role of Context in Object Recognition, by A. Oliva and A.
Torralba. TRENDS in Cognitive Sciences, Vol 11 No 12, 2007. [pdf]
- Unsupervised Learning of Hierarchical Semantics
of Objects, by D. Parikh and T. Chen, in Proceedings of the International
Conference on Computer Vision (ICCV), 2007. [pdf]
[web]
Related links:
WordNet
Scene global
feature code from Antonio Torralba
MIT
CSAIL database of objects and scenes
Learning about images from keyword-based Web search
Keyword-based search on the Web can be used to
retrieve images (or videos) that appear near the query word, are named with the
word, or are explicitly tagged with it.
Of course, this is not a completely reliable way to find images of a
given object or scene, and typically an image contains much more information
than can be conveyed in a few words anyhow.
Yet search engines’ rapid access to large amounts of image/video content
make them an interesting resource for vision research. These papers all consider ways to learn from
the images that come back from a keyword-based search, taking into account the
large amount of noise in the returns.
- *Learning Color Names from Real-World Images, by J. van de Weijer,
C. Schmid, J. Verbeek, in Proceedings of the IEEE International Conference
on Computer Vision (ICCV), 2007. [pdf]
- Searching the Web with
Mobile Images for Location Recognition, T. Yeh, K. Tollmar, and T. Darrell, in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2004. [pdf]
- *Learning Object Categories from Google’s Image Search, by R.
Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, in Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2005. [pdf] [web]
- Animals on the Web, by T. Berg and D. Forsyth,
in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR),
2006. [pdf]
- Keywords to Visual Categories: Multiple-Instance
Learning for Weakly Supervised Object Categorization, by S.
Vijayanarasimhan and K. Grauman, UTCS Tech report, 2007. [pdf]
- Harvesting Image Databases from the Web, by F. Schroff, A.
Criminisi, and A. Zisserman, in Proceedings
of the IEEE International Conference on Computer Vision (ICCV),
2007. [pdf]
- Probabilistic Web Image Gathering, by K. Yanai and K. Barnard, in ACM Multimedia
2005. [pdf]
Related links:
Animals on the
Web data from Berg et al.
Annotated Google
image data from Schroff et al. paper
Color name
datasets from van de Weijer et al. and
feature code
Google image data from Fergus et al.
Flickr Commons,
Library of Congress pilot project
Semantic robot vision challenge and example
data
Video summarization
How can a video be compactly presented in a visual
way? Video summarization methods attempt
to abstract the main occurrences, scenes, or objects in a clip in order to
provide an easily interpreted synopsis.
- Video Abstraction, by J. Oh, Q. Wen, J. Lee, and S. Hwang. In S. Deb, editor, Video Data Management and Information Retrieval, Idea Group Inc. and IRM Press, 2004. [pdf]
- Shapetime Photography, by W. T. Freeman and H. Zhang, in Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2003. [pdf]
- Video Summaries through Mosaic-Based Shot and
Scene Clustering, A. Aner and J. Kender, in Proceedings of the European
Conference on Computer Vision (ECCV), 2002. [pdf]
- Dynamic Stills and Clip Trailers, by Y. Caspi,
A. Axelrod, Y. Matsushita, A. Gamliel.
[pdf]
[web]
- Reliable Transition Detection in Videos:
A Survey and Practitioner’s Guide, by R. Lienhart, International Journal of Image and Graphics,
2001. [pdf]
- Recent Advances in Content-based Video
Analysis. C Ngo, H. Zhang, and T.
Pong. International Journal of
Image and Graphics, 2001. [pdf]
Image and video retargeting
These papers cover both content-aware resizing as
well as texture synthesis. The general
idea is to automate (or semi-automate) the process of adapting image or video
inputs to a desired format, whether that’s so it can be viewed well on a
different display size, or so it can be viewed continuously as if the regular
spatial or temporal pattern persists beyond where it ends in the raw
input. The challenges include adapting
the input in such a way that the most interesting parts are preserved or
well-represented, and in the case of textures, generating processes that look
realistically stochastic and “natural”.
- Image Quilting for Texture Synthesis and Transfer, by A. Efros and W. Freeman, in ACM Transactions on Graphics (SIGGRAPH), 2001. [pdf] [web]
- Fast Texture Synthesis using Tree-structured Vector Quantization, by L. Wei and M. Levoy, in ACM Transactions on Graphics (SIGGRAPH), 2000. [pdf] [web]
- Automatic Thumbnail Cropping and its
Effectiveness, by B. Suh, H. Ling,
B. Bederson, and D. Jacobs. In Proceedings
of the Symposium On User interface Software and Technology, 2003. [pdf]
- *Non-homogeneous Content-driven Video-retargeting, by L. Wolf, M. Guttmann, and D. Cohen-Or, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007. [pdf] [web]
- Video Retargeting: Automating Pan and
Scan, by F. Liu and M. Gleicher, in
ACM Multimedia, 2006. [pdf]
Exploring images in 3D
From multiple views of a scene we can create 3D
representations or new renderings. These
papers propose ways to explore image content in 3D, with an emphasis on
applications of doing so, such as perusing popular tourist sites from multiple
users’ photos, analyzing the geometry of paintings, or editing photos based on
their layers. Some methods included here
are semi-automatic.
- Tour into the Picture: Using a Spidery Mesh Interface to Make
Animation from a Single Image, by Y. Horry, K. Anjyo, and K. Arai. ACM
Transactions on Graphics (SIGGRAPH), 1997.
[pdf]
- Automatic Photo-Popup, by D. Hoiem, A. Efros,
and M. Hebert, ACM Transactions on Graphics (SIGGRAPH), 2005. [pdf] [web]
- *Single-View Metrology: Algorithms and
Applications, by A.
Criminisi, DAGM, 2002. [pdf] [web]
- Single
View Metrology, A.
Criminisi, I. Reid, A. Zisserman,
International Journal of Computer Vision, 1999. [pdf]
- Image-Based Modeling and Photo Editing, by
B. Oh, M. Chen, J. Dorsey, and F.
Durand, ACM Transactions on Graphics (SIGGRAPH), 2001. [pdf]
Related links:
Some PhotoTourism
patch data from Microsoft Research
Canonical views and visualization
Given an object or scene, what sparse set of
viewpoints best summarize it? This
problem is in some ways related to the video summarization topic (see above),
but here with an emphasis on the visualization of photo collections, and with
some consideration of optimizing for human perception.
- Scene Summarization for Online Image Collections, by I. Simon, N. Snavely, and S. Seitz. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007. [pdf]
[web]
- Generating Summaries for Large Collections of
Geo-referenced Photographs, by A. Jaffe, M. Naaman, T. Tassa, and M.
Davis. International Conference on World Wide Web, 2006. [pdf]
- Approximation of Canonical
Sets and Their Applications to 2D View Simplification, by T. Denton, J.
Abrahamson, A. Shokoufandeh, in
Proceedings IEEE Computer Vision and Pattern Recognition (CVPR), 2004. [pdf]
- What Object Attributes Determine Canonical
Views? V. Blanz, M. Tarr, H.
Bulthoff. Perception, 28(5):575-600, 1999. [pdf]
- Digital Tapestry, by C. Rother, S. Kumar, V. Kolmogorov, and A.
Blake, in Proceedings IEEE Computer Vision and
Pattern Recognition (CVPR), 2005. [pdf] [web]
- Picture Collage, by J. Wang, J. Sun, L. Quan, X. Tang, and H. Shum, in Proceedings IEEE
Computer Vision and Pattern Recognition (CVPR), 2006. [pdf]
Shape matching
The shape matching problem considers how to compare
shapes, often as defined in terms of their contours, silhouettes, or sampled
edge points. These papers provide
different matching metrics and demonstrate the use of shape for applications
like object recognition, reading warped text, detecting pedestrians, and
categorization.
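As a concrete baseline, the Hausdorff distance of Huttenlocher et al. (below) compares two point sets directly: each set must have every point near some point of the other. A minimal sketch on toy 2-D contours (the point sets are made up for illustration):

```python
import numpy as np

def directed_hausdorff(A, B):
    """Max over points of A of the distance to the nearest point of B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (e.g. edge maps)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
shifted = square + np.array([0.2, 0.0])
print(round(hausdorff(square, shifted), 2))  # 0.2
```

Because a single stray point can dominate the maximum, practical systems often use partial or rank-based variants, as the papers below discuss.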
- *Shape Matching and Object Recognition Using
Shape Contexts, by S. Belongie, J. Malik, and J. Puzicha. Transactions on
Pattern Analysis and Machine Intelligence (PAMI), 2002. [pdf]
[web]
- Recognizing Objects in Adversarial Clutter: Breaking a Visual
CAPTCHA, by G. Mori and J. Malik, in Proceedings IEEE Computer Vision and
Pattern Recognition (CVPR), 2003. [pdf] [web]
- Using the
Inner-Distance for Classification of Articulated Shapes, by H. Ling and D. Jacobs, Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2005. [pdf]
- Comparing Images Using the Hausdorff Distance,
by D. Huttenlocher, G. Klanderman, and W. Rucklidge, Transactions on
Pattern Analysis and Machine Intelligence (PAMI), 1993. [pdf]
- Pedestrian Detection from a Moving Vehicle, by
D. Gavrila, Proceedings of the European Conference on Computer Vision
(ECCV), 2000. [pdf]
- *A Boundary-Fragment-Model for Object
Detection, by A. Opelt, A. Pinz,
and A. Zisserman, Proceedings of the European Conference on Computer
Vision (ECCV), 2006. [pdf]
- Hierarchical Matching
of Deformable Shapes, by P. Felzenszwalb
and J. Schwartz, in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2007. [pdf]
Related links:
Matlab
code for shape context features and matching
MNIST
handwritten digits database
Detecting abnormal events
It would be useful if a vision system that monitors
video could automatically determine when something “unusual” is happening. But how can a system be trained to recognize
something it has never (or rarely) seen before?
These techniques address the problem of detecting visual anomalies.