Near Neighbor Searches

**Title:**"Nearest Neighbor Search in Vector Quantization"**Authors:**- Kevin Zatloukal (kevinz@cs.washington.edu)
- Mary Holland Johnson (mhjohns@ee.washington.edu)
- Richard Ladner (ladner@cs.washington.edu)
**Organization:**University of Washington**Proposal Summary:****Data Set:****Software Contributions:**

**Title:**"Analysis of Approximate Nearest Neighbor Searching with Clustered Point Sets"**Authors:**- Songrit Maneewongvatana, songrit@cs.umd.edu
- David Mount, mount@cs.umd.edu
**Organization:**University of Maryland, College Park**Proposal Summary:****Data Set:****Software Contributions:**

**Title:**"Analysis and Improvement of the SR-tree: an Index Structure for Nearest Neighbor Searches"**Authors:**- Norio Katayama (katayama@rd.nacsis.ac.jp)
- Shin'ichi Satoh (satoh@rd.nacsis.ac.jp)
**Organization:**National Center for Science Information Systems, Japan**Proposal Summary:**Experimental evaluation of the Sphere/Rectangle-Tree (SR-Tree) for high-dimensional nearest neighbor queries. This is a secondary-memory data structure, corresponding to the nested hierarchy of the space decomposition while permitting overlap. For more details, see article in Proc. 1997 ACM SIGMOD, pp 369-380, or see their website at www.rd.nacsis.ac.jp/~katayama/homepage/research/srtree/English.html , which includes an online demonstration.**Data Set:**40700 images were collected from the photo and the video archives of NASA. As for videos, cuts are detected based on the transition of the color histogram and then representative images are extracted. The similarity of images is measured in terms of the color histogram. Munsell color space (hue, saturation, intensity) is used. It is divided into nine subspaces: black, gray white and six colors. Histograms of four sub-regions, i.e., upper-left, upper-right, lower-left and lower-right, are calculated for an image in order to take account of the composition of the image. Four histograms are concatenated to compose a 36-dimensional feature vector. Then, feature vectors are reduced to 20-dimensional vectors with the principal component analysis. Similarity of images is measured by Euclidean distance among these 20-dimensional vectors. As an experiment, 37000 of these vectors are randomly selected to be in the original data set, with the remaining 3700 used as query points.**Software Contributions:**- SR-Tree algorithm
**Data Set Contributions:**- Image vectors (Euclidean)

**Title:**"Experimentation with an Approximate Nearest Neighbours Search Algorithm based on the Extended General Spacefilling Curves Heuristic"**Authors:**- Juan-Carlos Pérez (jcperez@disca.upv.es)
- Enrique Vidal (evidal@iti.upv.es)
**Organization:**Universidad Plitécnica de Valencia**Proposal Summary:****Data Set:****Software Contributions:**

**Title:**"Nearest Neighbor Searching in Metric Spaces: Some Experimental Results"**Authors:**- Kenneth L. Clarkson (clarkson@research.bell-labs.com)
**Organization:**Bell Laboratories, Lucent Technologies**Proposal Summary:****Data Set:****Software Contributions:**

**Title:**"Excluded Middle Vantage Point Forests for Nearest Neighbor Search"**Authors:**- Peter N. Yianilos (pny@research.nj.nec.com)
**Organization:**NEC Research Institute**Proposal Summary:****Data Set:****Software Contributions:**

**Title:**"Evaluating Some Fast Nearest Neighbor Search Algorithms in Metric Spaces"**Authors:**- Luisa Micó (mico@dlsi.ua.es)
- Jose Oncina (oncina@dlsi.ua.es)
**Organization:**Universidad de Alicante**Proposal Summary:**Evaluation of various AESA-based algorithms (in particular, LAESA and TLAESA). AESA-based algorithms find the exact nearest neighbor in arbitrary metric spaces. The general methodology invovles choosing a representative sample of "base prototypes" from the original set. Preprocessing of the data set is done based on this sample. At query time, branch and bound techniques are used for finding the nearest neighbor. Variants in the algorithms depend on the exact data structures used for representing the preprocessed information, and in the manner in which the set of base prototypes is chosen.**Data Set:****Software Contributions:**

Sixth DIMACS Implementation Challenge.

Previous DIMACS Implementation Challenges.

Maintained by Michael Goldwasser

challenge6@dimacs.rutgers.edu

Document last modified on November 6, 1998.