Refinement-based graph construction for search in low-memory systems
In modern information retrieval tasks such as recommender systems, retrieval-augmented generation, and question-answering, there is a need for fast and efficient retrieval over large collections of vectors. These high-dimensional vector embeddings, typically produced by deep neural networks, pose computational and memory efficiency challenges for existing indexing methods, necessitating the development of new approaches. This submission to the SISAP 2025 Indexing Challenge tackles both of its challenge tracks: (1) approximate kNN search on a resource-limited system and (2) kNN graph construction. This solution to the first task pairs a graph-based index with scalar quantization to achieve 4x data compression while retaining query neighbors with 99% accuracy. For the second task, this work introduces an iterative, refinement-based construction of the kNN graph without the upfront computational or memory cost of a separate index. This submission achieved second and third place in Tasks 1 and 2, respectively, demonstrating the efficacy of graph-based indexes in memory-constrained systems.
October 1, 2025
Top-Down Construction of Locally Monotonic Graphs for Similarity Search
Similarity search is a fundamental task in applications such as recommender systems, image retrieval, and text retrieval. Graph-based indexes for similarity search traverse a graph constructed on the dataset to retrieve the query's neighbors, using edges to navigate to and explore the query's local neighborhood. Edge selection techniques are crucial for the performance of graph-based indexes, enhancing accuracy and efficiency by preventing local minima, reducing graph diameter, and improving sparsity. The Half-Space Proximal (HSP) Graph is an edge-minimal monotonic graph defined by a geometric edge selection which ensures a diverse, yet sparse set of edges. Unfortunately, the quadratic construction complexity of the HSP Graph renders it impractical for large-scale search scenarios. This work investigates an approximation of the HSP Graph that aims to preserve the monotonic property locally. By leveraging a hierarchical partitioning of the dataset, this work proposes a top-down, distributed graph construction which uses a coarse-scale graph on pivots to facilitate the construction of the layer below. This paper investigates the effectiveness of this approach as a submission to the SISAP 2024 Indexing Challenge.
November 4, 2024
Finding HSP Neighbors via an Exact, Hierarchical Approach
The Half Space Proximal (HSP) graph is a low out-degree monotonic graph with wide applications in various domains, including combinatorial optimization in strings, enhancing kNN classification, simplifying chemical networks, estimating local intrinsic dimensionality, and generating uniform samples from skewed distributions, among others. However, the linear complexity of finding HSP neighbors of a query limit its scalability, except when sacrificing accuracy by restricting the test to a small local neighborhood estimated through approximate indexing. This compromise leads to the loss of crucial long-range connections, introducing false positives and excluding false negatives, compromising the essential properties of the HSP. To overcome these limitations, we propose a fast and exact HSP Test showing sublinear complexity in extensive experimentation. Our hierarchical approach leverages pivots and the triangle inequality to enable efficient HSP search in general metric spaces. A key component of our approach is the concept of the shifted generalized hyperplane between two points, which allows for the invalidation of entire point groups. Our approach ensures the desired properties of the HSP Test with exactness even for datasets containing hundreds of millions of points.
October 11, 2023
Generalized Relative Neighborhood Graph (GRNG) for Similarity Search
Similarity search for information retrieval on a variety of datasets relies on a notion of neighborhood, frequently using binary relationships such as the kNN approach. We suggest, however, that the notion of a neighbor must recognize higher-order relationship, to capture neighbors in all directions. Proximity graphs, such as the Relative Neighbor Graphs (RNG), use trinary relationships which capture the notion of direction and have been successfully used in a number of applications. However, the current algorithms for computing the RNG, despite widespread use, are approximate and not scalable. This paper proposes a hierarchical approach and novel type of graph, the Generalized Relative Neighborhood Graph (GRNG) for use in a pivot layer that then guides the efficient and exact construction of the RNG of a set of exemplars. It also shows how to extend this to a multi-layer hierarchy which significantly improves over the state-of-the-art methods which can only construct an approximate RNG.
October 7, 2022