Figure 1: Overview of locality-sensitive hashing (LSH) [Indyk & Motwani 1999, Charikar 2002]. A set of k hash functions is applied to map the N database examples into a hash table in which similar items are likely to share a bucket. After hashing a query Q, one need only evaluate the similarity between Q and the database examples with which it collides in order to obtain its approximate nearest neighbors.
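To make the collision-based search in Figure 1 concrete, here is a minimal sketch of random-hyperplane LSH, using the one-bit hash h_r(x) = sign(r·x) of Charikar 2002. The dimensions d, k, N and the random data are placeholders, and a practical system would use several hash tables rather than the single table shown here.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, k, N = 64, 16, 10000                # feature dim, hash bits, database size (placeholders)

R = rng.standard_normal((k, d))        # k random hyperplanes = k hash functions
X = rng.standard_normal((N, d))        # database examples (placeholder data)

def hash_key(x, R):
    """Concatenate k one-bit hashes h_r(x) = sign(r . x) into a bucket key."""
    return tuple((R @ x > 0).astype(int))

# Build the hash table: similar items are likely to share a bucket.
table = defaultdict(list)
for i, x in enumerate(X):
    table[hash_key(x, R)].append(i)

# Query: only compare Q against the examples it collides with.
Q = rng.standard_normal(d)
candidates = table[hash_key(Q, R)]
if candidates:
    sims = X[candidates] @ Q / (np.linalg.norm(X[candidates], axis=1) * np.linalg.norm(Q))
    best = candidates[int(np.argmax(sims))]   # approximate nearest neighbor
```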
Figure 2: (b) When learning a metric, paired constraints can be obtained for a portion of the image database, specifying examples that ought to be treated as similar (straight line) or dissimilar (crossed-out line). (c) Whereas existing randomized LSH functions hash together examples that are similar under the original distance, (d) our semi-supervised hash functions incorporate the learned constraints, so that examples constrained to be similar, or other pairs like them, will hash together with high probability. The circular red region in (c) denotes that existing LSH functions generate a hyperplane uniformly at random to separate images; in contrast, as indicated by the blue "hourglass" region in (d), our hash functions bias the selection of the random hyperplane to reflect the specified (dis)similarity constraints. In this example, even though the measured angle between x and y is wide, our semi-supervised hash functions are unlikely to split them into different buckets, since the constraints indicate they should be treated as similar.
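One way to realize the biased hyperplane selection sketched in (d), consistent with hashing a learned Mahalanobis metric A = G^T G, is the parameterized hash h(x) = sign(r^T G x) with r drawn from a standard Gaussian; this is equivalent to drawing the hyperplane normal from N(0, A) rather than uniformly. The sketch below is illustrative only, and G is a stand-in for whatever factor the metric learner produces.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
G = rng.standard_normal((d, d))        # placeholder for the learned factor of A = G^T G

def biased_hash_bit(x, G, rng):
    """One biased hash bit: sign(r^T G x), r ~ N(0, I).

    Equivalent to sign(s^T x) with s = G^T r ~ N(0, A), i.e. the hyperplane
    normal is drawn from N(0, A) instead of uniformly at random.
    """
    r = rng.standard_normal(G.shape[0])
    return int(r @ (G @ x) > 0)

# Example: a 16-bit key for one point. For a pair (x, y), each bit collides
# with probability 1 - theta_A(x, y) / pi, where theta_A is the angle between
# Gx and Gy, i.e. the angle measured under the learned metric.
x = rng.standard_normal(d)
key = tuple(biased_hash_bit(x, G, rng) for _ in range(16))
```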
Figure 3: Results on the PhotoTourism patch dataset. We generate similarity constraints based on true correspondences in the training data. Given a new image, we want to search for similar patches. The plot shows that the learned metric provides better retrieval of similar patches, and that our semi-supervised hash functions make it possible to search only a fraction of the entire pool to find the best matches.
Figure 4: Results on the Flickr Scenes dataset. The goal is to take a query image and retrieve images of the same tourist location. We generate similarity constraints from labeled examples: same-class pairs form similarity constraints, and different-class pairs form dissimilarity constraints. The learned proximity distribution kernel gives lower error (compare the pink curve to the dotted black curve), and our semi-supervised hash functions make it possible to search with that more accurate measure much more efficiently (blue curve). For epsilon = 1.2, we search only 3% of the database images and achieve an error rate very close to that of a linear scan.
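The 3% figure above comes from the experiments; as a rough, back-of-the-envelope illustration (not a reproduction of those numbers), the standard LSH analysis says a (1+epsilon)-approximate search examines on the order of N^(1/(1+epsilon)) items, ignoring constants and the number of tables, so the searched fraction shrinks quickly as epsilon grows. The database size below is hypothetical.

```python
# How the searched fraction scales with epsilon under the standard LSH bound.
N = 100_000  # hypothetical database size
for epsilon in (0.5, 1.0, 1.2, 2.0):
    searched = N ** (1.0 / (1.0 + epsilon))
    print(f"epsilon={epsilon}: ~{searched:.0f} items, {100 * searched / N:.2f}% of the database")
```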
Figure 5: If the hash table is constructed to preserve similarity according to a certain learned metric (A_t), then once new similarity constraints require updates to the metric (A_{t+1}), the original hash table becomes obsolete. For example, in this schematic, the points are originally arranged into the buckets on the left. Then a new constraint arrives saying that "blue" and "yellow" points should be treated as similar. Note that in the old hash table, blues and yellows did not share a bucket. Rather than simply rehashing the entire database, we show how to dynamically update the hash keys so that the revised metric is correctly preserved (with high probability).
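The update mechanism itself is left to the paper; the following is only a minimal sketch of why incremental key maintenance can be cheap, under the assumption that the metric factor changes by a rank-one term G_{t+1} = G_t + u v^T (a common form in online metric learning). With the random vectors fixed, each cached projection r_i^T G x_n receives an O(1) correction, and an item moves buckets only if one of its bit signs flips. All quantities below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, N = 32, 8, 1000
G = rng.standard_normal((d, d))         # current metric factor, A_t = G^T G (placeholder)
R = rng.standard_normal((k, d))         # fixed random vectors r_1..r_k
X = rng.standard_normal((N, d))         # database examples (placeholder data)

P = (R @ G) @ X.T                       # cached projections r_i^T G x_n, shape (k, N)
keys = P > 0                            # current hash bits

# A new constraint arrives -> assumed rank-one update G' = G + u v^T.
u = rng.standard_normal(d)
v = rng.standard_normal(d)

P += np.outer(R @ u, X @ v)             # r_i^T G' x_n = r_i^T G x_n + (r_i^T u)(v^T x_n)
new_keys = P > 0

changed = np.any(new_keys != keys, axis=0)   # only these items change buckets
print(f"{changed.sum()} of {N} items need to be moved in the hash table")
```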