Vector Database Index Types - HNSW vs IVF

Author: Venkata Sudhakar

Choosing the right index type is one of the most important performance decisions when building on a vector database. A Flat index (brute force) guarantees 100% recall but takes O(n) time per query, which is acceptable for thousands of vectors but too slow for interactive search at millions. Approximate Nearest Neighbour (ANN) indexes such as HNSW and IVF trade a small loss of accuracy for orders-of-magnitude faster queries, making sub-millisecond to low-millisecond search across millions of product embeddings practical.

HNSW (Hierarchical Navigable Small World) builds a multi-layer proximity graph that is searched greedily from the top layer down, giving roughly logarithmic search time. It typically delivers the best combination of query speed and recall, but it must keep the graph links in memory alongside the full vectors. IVF (Inverted File Index) partitions vectors into nlist clusters and scans only the nprobe nearest clusters at query time. It uses less memory than HNSW, and its speed and recall depend on nprobe: fewer probes are faster but miss more neighbours. IVF+PQ (Product Quantisation) additionally compresses each vector into a short code, which is what makes billion-scale indexes on a single machine feasible.
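
To make the IVF idea concrete, here is a minimal sketch of cluster-then-probe search written with the Python standard library only, so it runs without FAISS. It is a toy under stated assumptions, not how FAISS implements IVF: the helper names (`build_ivf`, `ivf_search`, `assign`), the tiny dataset sizes, and the few-iteration spherical k-means are all chosen for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm else list(v)

def top_k(query, vectors, ids, k):
    """Exact top-k ids by inner product, restricted to the candidate ids."""
    return sorted(ids, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

def assign(vectors, centroids):
    """Put each vector id into the inverted list of its closest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
        lists[best].append(i)
    return lists

def build_ivf(vectors, nlist, iters=5, seed=0):
    """Toy spherical k-means (illustrative only); returns (centroids, lists)."""
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), nlist)]
    dim = len(vectors[0])
    for _ in range(iters):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(vectors[i][j] for i in members) / len(members)
                        for j in range(dim)]
                centroids[c] = normalise(mean)
    return centroids, assign(vectors, centroids)

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids best match the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return top_k(query, vectors, candidates, k)

# Small synthetic dataset (sizes are arbitrary, chosen to run quickly)
rng = random.Random(42)
d, n, k, nlist = 8, 2000, 5, 20
vectors = [normalise([rng.gauss(0, 1) for _ in range(d)]) for _ in range(n)]
query = normalise([rng.gauss(0, 1) for _ in range(d)])

centroids, lists = build_ivf(vectors, nlist)
exact = top_k(query, vectors, range(n), k)  # brute-force baseline

recalls = {}
for nprobe in (1, 5, nlist):
    approx = ivf_search(query, vectors, centroids, lists, k, nprobe)
    recalls[nprobe] = len(set(exact) & set(approx)) / k
    print(f"nprobe={nprobe:2d}  recall@{k}={recalls[nprobe]:.1f}")
```

Probing all nlist lists scans every vector, so it reproduces the brute-force result. Smaller nprobe values scan fewer candidates and can miss neighbours that fall in unprobed clusters, which is the speed and recall trade-off described above.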

The example below benchmarks Flat, IVF, and HNSW indexes using FAISS on a simulated ShopMax India product catalogue of 100,000 embeddings to show the speed and recall trade-offs.

import faiss
import numpy as np
import time

np.random.seed(42)
d = 384        # embedding dimension
n = 100000     # 100k product embeddings
k = 5          # top-k results

# Generate synthetic product embeddings
data = np.random.randn(n, d).astype(np.float32)
faiss.normalize_L2(data)
query = np.random.randn(1, d).astype(np.float32)
faiss.normalize_L2(query)

# Flat index (exact, baseline)
flat = faiss.IndexFlatIP(d)
flat.add(data)
t0 = time.time()
D_flat, I_flat = flat.search(query, k)
flat_ms = round((time.time() - t0) * 1000, 2)

# IVF index (approximate, cluster-based)
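nlist = 100                           # number of clusters (inverted lists)
quantiser = faiss.IndexFlatIP(d)      # assigns each vector to its nearest centroid
ivf = faiss.IndexIVFFlat(quantiser, d, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(data)                       # k-means to learn the nlist centroids
ivf.add(data)
ivf.nprobe = 10                       # clusters scanned per query (10% of nlist)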
t0 = time.time()
D_ivf, I_ivf = ivf.search(query, k)
ivf_ms = round((time.time() - t0) * 1000, 2)

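# HNSW index (graph-based, fastest)
# IndexHNSWFlat uses L2 distance by default. On unit-normalised vectors the
# L2 ranking matches the inner-product ranking, so comparing against the
# flat inner-product results is still valid.
hnsw = faiss.IndexHNSWFlat(d, 32)     # M = 32 (graph connectivity)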
hnsw.add(data)
t0 = time.time()
D_hnsw, I_hnsw = hnsw.search(query, k)
hnsw_ms = round((time.time() - t0) * 1000, 2)

# Recall: how many of the flat results appear in each ANN result
def recall(true_ids, approx_ids):
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)

print(f"Index      | Latency (ms) | Recall@{k}")
print("-" * 40)
print(f"Flat       | {flat_ms:12} | 1.000 (exact)")
print(f"IVF        | {ivf_ms:12} | {recall(I_flat[0], I_ivf[0]):.3f}")
print(f"HNSW       | {hnsw_ms:12} | {recall(I_flat[0], I_hnsw[0]):.3f}")

It gives output like the following (timings vary with hardware and FAISS build):

Index      | Latency (ms) | Recall@5
----------------------------------------
Flat       |        38.21 | 1.000 (exact)
IVF        |         1.84 | 0.800
HNSW       |         0.43 | 1.000
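
The recall column above comes from a single query, so each value can only be a multiple of 0.2. A more stable estimate averages recall over a batch of queries. The sketch below shows one way to do that in plain Python. The function names and the hand-written id lists are hypothetical stand-ins for the 2D id arrays that `search` returns when given several queries. It also skips `-1` entries, which FAISS uses to pad results when fewer than k neighbours are found.

```python
def recall_at_k(true_ids, approx_ids):
    """Fraction of the exact top-k ids that the approximate result recovered."""
    true_set = set(true_ids)
    found = {i for i in approx_ids if i != -1}  # -1 marks a missing result
    return len(true_set & found) / len(true_set)

def mean_recall(true_batch, approx_batch):
    """Average recall@k over a batch of queries (one id list per query)."""
    scores = [recall_at_k(t, a) for t, a in zip(true_batch, approx_batch)]
    return sum(scores) / len(scores)

# Hypothetical results for two queries with k = 5
true_batch = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
approx_batch = [[1, 2, 3, 4, 99], [10, 9, 8, 7, 6]]  # first query misses id 5

print(f"mean recall@5 = {mean_recall(true_batch, approx_batch):.2f}")
```

With the benchmark above, this would correspond to generating several query vectors, searching each index once with the whole batch, and passing the resulting id arrays to `mean_recall`.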

For this single query, HNSW is about 89x faster than Flat with no loss of recall. IVF is about 21x faster but misses 1 of the 5 true neighbours with nprobe=10; increasing nprobe improves recall at the cost of speed. Because the figures come from one query on random data, treat them as indicative rather than as a rigorous benchmark. For ShopMax India, use HNSW for catalogues up to 10 million products where memory allows: the 384-dim float32 vectors alone take about 1.5GB per million, and the HNSW graph links add to that. Use IVF+PQ for catalogues above 10 million products, where memory is the binding constraint.
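
The memory guidance can be checked with back-of-envelope arithmetic. The sketch below is an estimate under labelled assumptions, not a measurement: it counts float32 vectors at 4 bytes per dimension, assumes HNSW stores roughly 2*M 4-byte links per vector at the bottom layer and ignores the upper layers, and assumes IVF+PQ stores one 8-bit code per sub-vector plus an 8-byte id while ignoring centroids and codebooks. The function names and the choice of 48 sub-vectors are hypothetical.

```python
def flat_bytes(n, d):
    """Raw float32 vectors only."""
    return n * d * 4

def hnsw_flat_bytes(n, d, m):
    """Raw vectors plus an assumed 2*m int32 links per vector (bottom layer only)."""
    return n * (d * 4 + 2 * m * 4)

def ivfpq_bytes(n, pq_m, nbits=8, id_bytes=8):
    """PQ codes plus ids; centroids and codebooks are ignored as small overhead."""
    return n * (pq_m * nbits // 8 + id_bytes)

n, d = 1_000_000, 384
estimates = {
    "Flat": flat_bytes(n, d),
    "HNSW (M=32)": hnsw_flat_bytes(n, d, 32),
    "IVF+PQ (48 x 8-bit)": ivfpq_bytes(n, 48),  # 48 sub-vectors of 8 dims each
}
for name, size in estimates.items():
    print(f"{name:20} {size / 1e9:6.3f} GB per million vectors")
```

Under these assumptions, one million 384-dim vectors come to about 1.54 GB for Flat, about 1.79 GB for HNSW with M=32, and about 0.06 GB for IVF+PQ, which is why compression becomes attractive once the catalogue reaches tens of millions of products.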
