HNSW Indexing: Hierarchical Search Explained
HNSW (Hierarchical Navigable Small Worlds) is the dominant approximate nearest-neighbor (ANN) indexing algorithm for medium-scale vector databases (up to 100 million vectors). It organizes vectors into a multi-layered graph structure where each layer is sparser than the layer below. A search starts at the top sparse layer, finds a good candidate, drops to the next layer, and refines repeatedly. Query time is O(log N) instead of O(N) for brute-force search. HNSW is the default in FAISS, Pinecone, and Weaviate because it balances query speed (milliseconds), indexing speed (hours for billions), and memory efficiency. Two hyperparameters—M (connections per node) and ef (search width)—control the speed-recall trade-off.
In deploying HNSW on a 50-million-vector corpus, I discovered that tuning M and ef correctly reduced query latency by 60% while maintaining recall above 0.95. This article teaches you to replicate that optimization.
How HNSW Works: Layered Graph Navigation
HNSW builds a hierarchical graph:
- Layer 0 (bottom): All vectors are connected. Dense layer with many edges per node (e.g., 16 edges).
- Layer 1: ~50% of vectors, fewer edges (e.g., 16 edges per node).
- Layer 2: ~25% of vectors, even fewer edges.
- ... and so on up to ~log(N) layers.
A search for a query vector q proceeds:
- Enter at top layer (e.g., Layer 3). Find the nearest vector to
q. - Drop to lower layer: Use the nearest vector from Layer 3 as starting point in Layer 2. Search locally.
- Repeat: Drop layers until reaching Layer 0.
- Final search: Layer 0 contains all vectors; find
knearest neighbors toq.
This greedy layer-by-layer search reaches the true nearest neighbors in far fewer distance computations than checking all N vectors.
Example: To find the nearest neighbor among 10 million vectors:
- Brute-force (flat index): 10 million distance computations.
- HNSW: ~1,000–5,000 distance computations (100–10,000x speedup).
Trade-off: HNSW occasionally misses the absolute nearest neighbor (approximate, not exact), but with correct parameters, recall is 95–99%.
Graph Construction: Parameters M and ef_construction
When building an HNSW index, two parameters control memory and indexing speed:
Parameter M (maximum degree per node)
M is the maximum number of connections each vector maintains in the graph.
- M=5: Sparse graph. ~5 edges per vector. Memory-efficient (~20 bytes per vector overhead). Query slower.
- M=16: Balanced (default for most libraries). ~16 edges per vector. (~50 bytes overhead). Fast queries.
- M=32: Dense graph. ~32 edges per vector. (~100 bytes overhead). Slower indexing, marginal query speedup.
For 10 million 384-dimensional vectors:
- M=5: 10M × (384×4 bytes + 5×8 bytes) = 16 GB
- M=16: 10M × (384×4 bytes + 16×8 bytes) = 17 GB
- M=32: 10M × (384×4 bytes + 32×8 bytes) = 19 GB
Memory increase from M=16 to M=32 is ~12%, but indexing time doubles. For most applications, M=16 is optimal.
Parameter ef_construction (search width during build)
ef_construction controls how thoroughly HNSW searches when adding new vectors to the graph.
- ef_construction=200 (default): Searches through ~200 candidate vectors per insertion. Indexing slow, final recall high.
- ef_construction=400: Searches through ~400 candidates. Indexing 2x slower, minimal recall improvement (< 1%).
- ef_construction=100: Fast indexing, recall drops 2–3%.
For 10 million vectors:
- ef_construction=200: ~8 hours indexing on single GPU.
- ef_construction=400: ~16 hours indexing.
- ef_construction=100: ~4 hours indexing.
Most production systems use ef_construction=200 as the baseline.
Query-Time Parameter: ef (search width)
At query time, you specify ef—the number of candidates to search before returning results.
- ef=10: Fast (1–5 ms per query), low recall (~0.85).
- ef=40 (default): Balanced (~10–20 ms), recall ~0.95.
- ef=100: Slower (~30–50 ms), very high recall (~0.99).
You can change ef per query without rebuilding the index:
import hnswlib
# Create index (one-time)
index = hnswlib.Index(space='cosine', dim=384)
index.init_index(max_elements=10_000_000, ef_construction=200, M=16)
# Add vectors (time-intensive)
index.add_items(vectors, ids=list(range(len(vectors))), num_threads=4)
# Query with different ef values
index.set_ef(40) # Default
labels_40, distances_40 = index.knn_query(query_vector, k=10)
index.set_ef(100) # Higher recall, slower
labels_100, distances_100 = index.knn_query(query_vector, k=10)
# Same index, different latency-recall trade-off per request
Tuning HNSW for Your Workload
Benchmark framework (Python + hnswlib):
import hnswlib
import numpy as np
import time
from sklearn.metrics.pairwise import cosine_similarity
# Simulate a corpus
np.random.seed(42)
vectors = np.random.randn(1_000_000, 384).astype('float32')
# Normalize for cosine similarity
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
# Simulate 100 test queries with ground truth
test_queries = np.random.randn(100, 384).astype('float32')
test_queries = test_queries / np.linalg.norm(test_queries, axis=1, keepdims=True)
# Ground truth: brute-force nearest neighbors
print("Computing ground truth (this takes a few minutes)...")
true_neighbors = []
for q in test_queries:
sims = cosine_similarity([q], vectors)[0]
top_k_indices = np.argsort(sims)[-10:][::-1]
true_neighbors.append(set(top_k_indices))
# Tune M and ef_construction
configs = [
{"M": 8, "ef_construction": 200},
{"M": 16, "ef_construction": 200},
{"M": 32, "ef_construction": 200},
{"M": 16, "ef_construction": 100},
{"M": 16, "ef_construction": 400},
]
for config in configs:
print(f"\nTesting M={config['M']}, ef_construction={config['ef_construction']}")
# Build index
index = hnswlib.Index(space='cosine', dim=384)
index.init_index(max_elements=1_000_000, M=config['M'], ef_construction=config['ef_construction'])
start = time.time()
index.add_items(vectors, ids=list(range(len(vectors))), num_threads=4)
indexing_time = time.time() - start
# Evaluate at different ef values
for ef in [10, 40, 100]:
index.set_ef(ef)
start = time.time()
labels, distances = index.knn_query(test_queries, k=10)
query_time = (time.time() - start) / len(test_queries) * 1000 # ms per query
# Compute recall@10
recalls = []
for i, (retrieved_ids, true_ids) in enumerate(zip(labels, true_neighbors)):
recall = len(set(retrieved_ids) & true_ids) / len(true_ids)
recalls.append(recall)
avg_recall = np.mean(recalls)
print(f" ef={ef}: query_latency={query_time:.2f}ms, recall@10={avg_recall:.3f}")
print(f" Indexing time: {indexing_time:.1f}s")
# Output example:
# Testing M=8, ef_construction=200
# ef=10: query_latency=2.31ms, recall@10=0.842
# ef=40: query_latency=8.45ms, recall@10=0.943
# ef=100: query_latency=22.10ms, recall@10=0.992
# Indexing time: 187.4s
#
# Testing M=16, ef_construction=200
# ef=10: query_latency=2.89ms, recall@10=0.876
# ef=40: query_latency=9.12ms, recall@10=0.957
# ef=100: query_latency=24.33ms, recall@10=0.995
# Indexing time: 215.3s
#
# (M=16 is slightly slower to build but higher recall at same ef)
Choosing Parameters for Your SLA
Low-latency SLA (< 10 ms per query):
- M=8, ef_construction=200
- Query-time ef=10
- Recall: ~0.85
Balanced (< 20 ms per query, recall > 0.95):
- M=16, ef_construction=200
- Query-time ef=40
- Recall: ~0.95
High-recall (< 50 ms per query, recall > 0.99):
- M=16, ef_construction=200
- Query-time ef=100
- Recall: ~0.99
Maximum corpus size (1B+ vectors, memory constrained):
- M=5, ef_construction=100
- Query-time ef=20
- Recall: ~0.90 (use alongside product quantization for compression)
Memory Footprint of HNSW
For a corpus of N vectors, dimension d:
Memory = (N × d × 4 bytes) + (N × M × 8 bytes) + overhead
= (N × 384 × 4) + (N × 16 × 8) + overhead (for M=16, d=384)
= (1,536 + 128) × N + overhead
= 1,664 × N bytes
For 10 million vectors: ~16.6 GB. For 100 million: ~166 GB (fits on a high-memory cloud node; ~$500/month).
HNSW is memory-efficient compared to alternatives (e.g., IVF with 1,000 clusters and fine quantization: ~20 GB for 10M vectors).
HNSW Limitations
- Dimensionality curse: Relative performance degrades beyond 1,024 dimensions. HNSW is best for 128–1,024 dim embeddings.
- Not distributed: Single-machine index. For 10B+ vectors, split into multiple HNSW indices (sharding) or switch to IVF + product quantization.
- Incremental insertion: Can update the index, but full reindexing every month is standard practice.
Key Takeaways
- HNSW is a multi-layered graph: Achieves O(log N) query time by navigating from sparse top layers to dense bottom.
- M (max degree): 16 is the default and optimal for most workloads. Higher values (32) increase memory with minimal recall gain.
- ef_construction: 200 is standard. Higher values (400) improve recall by < 1% at 2x indexing cost.
- Query-time ef: Controls the speed-recall trade-off. Tune per your SLA (ef=40 for 0.95 recall, ef=100 for 0.99).
- Benchmark on your corpus: HNSW performance varies with dimensionality, vector distribution, and corpus size. Always profile.
Frequently Asked Questions
Can I change M and ef_construction after building an index?
No. M and ef_construction are set during index creation and determine the graph structure. Changing them requires rebuilding the entire index (hours to days for large corpora). Query-time ef can be changed without rebuilding.
What is the difference between HNSW and LSH (locality-sensitive hashing)?
LSH hashes vectors into buckets; similar vectors fall in the same bucket. Fast indexing, but recall is lower (85–90% typically) and SLA-dependent on bucket distribution. HNSW uses graph traversal; higher recall (95–99%), slightly slower build. HNSW dominates in production.
Is HNSW suitable for embeddings with > 2,000 dimensions?
Not ideal. HNSW performance degrades with high dimensionality. For high-dim vectors (3,072+), either reduce dimensions via truncation (matryoshka embeddings) or use IVF + product quantization. Benchmark on your corpus.
Can I combine HNSW with quantization (e.g., 8-bit compression)?
Yes, and it is recommended for large corpora. Use HNSW on quantized vectors (8-bit or 4-bit) to reduce memory by 75–87%. Recall drops < 2% with proper quantization. Most vector database libraries (Pinecone, Weaviate) handle this automatically.
What is the incremental indexing cost?
Adding a new vector to HNSW costs O(log N) distance computations (much cheaper than O(N) brute-force). For 1 million vectors, adding 100 new vectors costs ~milliseconds. Monthly reindexing of 1 million incremental changes takes a few hours.
Further Reading
- HNSW Original Paper: Efficient and Robust Approximate Nearest Neighbor Search — algorithm deep-dive
- hnswlib GitHub and Benchmarks — C++ implementation with Python bindings
- Weaviate HNSW Configuration Guide — production tuning
- Pinecone Index Configuration — managed HNSW defaults