Project: HNSW Performance Benchmarking
Now that you’ve seen how HNSW parameters and payload indexes affect performance with the DBpedia dataset, it’s time to optimize for your own domain and use case.
Your Mission
Build on your Day 1 search engine by adding performance optimization. You’ll discover which HNSW settings work best for your specific data and queries, and measure the real impact of payload indexing.
Estimated Time: 90 minutes
What You’ll Build
A performance-optimized version of your Day 1 search engine that demonstrates:
- Fast bulk load: Load with m=0, then switch to HNSW
- HNSW parameter tuning: Try different m and ef_construct
- Payload indexing impact: Time filtering with and without indexes
- Domain findings: What works best for your content
Setup
Prerequisites
- Qdrant Cloud cluster (URL + API key)
- Python 3.9+ (or Google Colab)
- Packages: qdrant-client, sentence-transformers, numpy
Models
- Embeddings: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
Dataset
- Reuse your Day 1 domain data or prepare a dataset with 1,000+ items and a rich text field (e.g., description).
- Include a few numeric fields for filtering (e.g., length, word_count) so payload indexing impact can be measured.
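If you don’t have your Day 1 dataset at hand, a small synthetic stand-in is enough to run the benchmark end to end. This is a minimal, purely illustrative sketch; the generated descriptions carry no real meaning:
# Hypothetical fallback dataset: 1,000 items with descriptions of varying
# length, so the length/word_count filters in later steps have some spread.
your_dataset = [
    {"description": f"Sample item {i}: " + " ".join(f"word{j}" for j in range(10 + i % 90))}
    for i in range(1000)
]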
Build Steps
Step 1: Extend Your Day 1 Project
Start with your domain search engine from Day 1, or create a new one with 1000+ items:
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
import time
import numpy as np
import os
client = QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))
# For Colab:
# from google.colab import userdata
# client = QdrantClient(url=userdata.get("QDRANT_URL"), api_key=userdata.get("QDRANT_API_KEY"))
encoder = SentenceTransformer("all-MiniLM-L6-v2")
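Before creating any collections, a quick sanity check can confirm the client actually reaches your cluster (optional):
# Optional sanity check: listing collections fails fast if the URL or API key is wrong.
print(client.get_collections())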
Step 2: Create Multiple Test Collections
Test different HNSW configurations to find what works best:
# Test configurations
configs = [
{"name": "fast_initial_upload", "m": 0, "ef_construct": 100}, # m=0 = ingest-only
{"name": "memory_optimized", "m": 8, "ef_construct": 100}, # m=8 = lower RAM
{"name": "balanced", "m": 16, "ef_construct": 200}, # m=16 = balanced
{"name": "high_quality", "m": 32, "ef_construct": 400}, # m=32 = higher recall, slower build
]
for config in configs:
collection_name = f"my_domain_{config['name']}"
if client.collection_exists(collection_name=collection_name):
client.delete_collection(collection_name=collection_name)
client.create_collection(
collection_name=collection_name,
vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
hnsw_config=models.HnswConfigDiff(
m=config["m"],
ef_construct=config["ef_construct"],
full_scan_threshold=10, # force HNSW instead of full scan
),
optimizers_config=models.OptimizersConfigDiff(
indexing_threshold=10
), # Force indexing even on small sets for demo
)
print(f"Created collection: {collection_name}")
Step 3: Upload and Time
Measure upload performance for each configuration:
def upload_with_timing(collection_name, data, config_name):
embeddings = encoder.encode([d["description"] for d in data], show_progress_bar=True).tolist()
points = []
for i, item in enumerate(data):
embedding = embeddings[i]
points.append(
models.PointStruct(
id=i,
vector=embedding,
payload={
**item,
"length": len(item["description"]),
"word_count": len(item["description"].split()),
"has_keywords": any(
keyword in item["description"].lower() for keyword in ["important", "key", "main"]
),
},
)
)
    # Warmup request: opens the connection so the timed upload excludes handshake overhead
    client.query_points(collection_name=collection_name, query=points[0].vector, limit=1)
start_time = time.time()
client.upload_points(collection_name=collection_name, points=points)
upload_time = time.time() - start_time
print(f"{config_name}: Uploaded {len(points)} points in {upload_time:.2f}s")
return upload_time
# Load your dataset here. The larger the dataset, the more accurate the benchmark will be.
# your_dataset = [{"description": "This is a description of a product"}, ...]
# Upload to each collection
upload_times = {}
for config in configs:
collection_name = f"my_domain_{config['name']}"
upload_times[config["name"]] = upload_with_timing(collection_name, your_dataset, config["name"])
def wait_for_indexing(collection_name, timeout=60, poll_interval=1):
print(f"Waiting for collection '{collection_name}' to be indexed...")
start_time = time.time()
while time.time() - start_time < timeout:
info = client.get_collection(collection_name=collection_name)
if info.indexed_vectors_count > 0 and info.status == models.CollectionStatus.GREEN:
print(f"Success! Collection '{collection_name}' is indexed and ready.")
print(f" - Status: {info.status.value}")
print(f" - Indexed vectors: {info.indexed_vectors_count}")
return
print(f" - Status: {info.status.value}, Indexed vectors: {info.indexed_vectors_count}. Waiting...")
time.sleep(poll_interval)
info = client.get_collection(collection_name=collection_name)
raise Exception(
f"Timeout reached after {timeout} seconds. Collection '{collection_name}' is not ready. "
f"Final status: {info.status.value}, Indexed vectors: {info.indexed_vectors_count}"
)
for config in configs:
if config["m"] > 0: # m=0 has no HNSW to wait for
collection_name = f"my_domain_{config['name']}"
wait_for_indexing(collection_name)
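The fast_initial_upload collection was created with m=0, so it has no HNSW graph yet. The second half of the bulk-load pattern is to switch HNSW on after ingest; a minimal sketch of that step:
# Complete the m=0 bulk-load pattern: enable HNSW once the data is in.
# Qdrant rebuilds the graph in the background, so we poll until it is ready.
client.update_collection(
    collection_name="my_domain_fast_initial_upload",
    hnsw_config=models.HnswConfigDiff(m=16),
)
wait_for_indexing("my_domain_fast_initial_upload")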
Step 4: Benchmark Search Performance
Test search speed with different hnsw_ef values:
def benchmark_search(collection_name, query_embedding, ef_values=[64, 128, 256]):
# Warmup
client.query_points(collection_name=collection_name, query=query_embedding, limit=1)
# hnsw_ef: higher = better recall, but slower. Tune per your latency goal.
results = {}
for hnsw_ef in ef_values:
times = []
# Run multiple queries for more reliable timing
for _ in range(25):
start_time = time.time()
_ = client.query_points(
collection_name=collection_name,
query=query_embedding,
limit=10,
search_params=models.SearchParams(hnsw_ef=hnsw_ef),
with_payload=False,
)
times.append((time.time() - start_time) * 1000)
results[hnsw_ef] = {
"avg_time": np.mean(times),
"min_time": np.min(times),
"max_time": np.max(times),
}
return results
test_query = "your test query"
query_embedding = encoder.encode(test_query).tolist()
performance_results = {}
for config in configs:
if config["m"] > 0: # Skip m=0 collections for search
collection_name = f"my_domain_{config['name']}"
performance_results[config["name"]] = benchmark_search(
collection_name, query_embedding
)
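The summary in Step 6 reports only hnsw_ef=128. To inspect the full latency profile, you can print every measured value (optional):
# Optional: print avg/min/max latency for every hnsw_ef value measured above.
for name, res in performance_results.items():
    for ef, stats in res.items():
        print(
            f"{name} @ hnsw_ef={ef}: avg {stats['avg_time']:.2f}ms "
            f"(min {stats['min_time']:.2f}ms, max {stats['max_time']:.2f}ms)"
        )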
Step 5: Measure Payload Indexing Impact
Measure filtering performance with and without indexes:
def test_filtering_performance(collection_name):
query_embedding = encoder.encode("your filter test query").tolist()
# Test filter without index
filter_condition = models.Filter(
must=[models.FieldCondition(key="length", range=models.Range(gte=10, lte=200))]
)
    # Demo only: unindexed_filtering_retrieve=True permits filtering on a field that has
    # no payload index (Qdrant falls back to a full scan); turn it off right after measuring.
client.update_collection(
collection_name=collection_name,
strict_mode_config=models.StrictModeConfig(unindexed_filtering_retrieve=True),
)
# Warmup
client.query_points(collection_name=collection_name, query=query_embedding, limit=1)
# Timing without payload index
times = []
for _ in range(25):
start_time = time.time()
_ = client.query_points(
collection_name=collection_name,
query=query_embedding,
query_filter=filter_condition,
limit=10,
with_payload=False,
)
times.append((time.time() - start_time) * 1000)
time_without_index = np.mean(times)
# Create payload index
client.create_payload_index(
collection_name=collection_name,
field_name="length",
field_schema=models.PayloadSchemaType.INTEGER,
wait=True,
)
# HNSW was already built; adding the payload index doesn’t rebuild it.
# Bump ef_construct (+1) once to trigger a safe rebuild.
base_ef = client.get_collection(
collection_name=collection_name
).config.hnsw_config.ef_construct
new_ef_construct = base_ef + 1
client.update_collection(
collection_name=collection_name,
hnsw_config=models.HnswConfigDiff(ef_construct=new_ef_construct),
strict_mode_config=models.StrictModeConfig(
unindexed_filtering_retrieve=False
), # Turn off scanning and use payload index instead.
)
wait_for_indexing(collection_name)
# Warmup
client.query_points(collection_name=collection_name, query=query_embedding, limit=1)
# Timing with index
times = []
for _ in range(25):
start_time = time.time()
_ = client.query_points(
collection_name=collection_name,
query=query_embedding,
query_filter=filter_condition,
limit=10,
with_payload=False,
)
times.append((time.time() - start_time) * 1000)
time_with_index = np.mean(times)
return {
"without_index": time_without_index,
"with_index": time_with_index,
"speedup": time_without_index / time_with_index,
}
# Test on your best performing collection
best_collection = "my_domain_balanced" # Choose based on your results
filtering_results = test_filtering_performance(best_collection)
Step 6: Analyze Your Results
Create a summary of your findings:
print("=" * 60)
print("PERFORMANCE OPTIMIZATION RESULTS")
print("=" * 60)
print("\n1) Upload Performance:")
for config_name, time_taken in upload_times.items():
print(f" {config_name}: {time_taken:.2f}s")
print("\n2) Search Performance (hnsw_ef=128):")
for config_name, results in performance_results.items():
if 128 in results:
print(f" {config_name}: {results[128]['avg_time']:.2f}ms")
print("\n3) Filtering Impact:")
print(f" Without index: {filtering_results['without_index']:.2f}ms")
print(f" With index: {filtering_results['with_index']:.2f}ms")
print(f" Speedup: {filtering_results['speedup']:.1f}x")
Success Criteria
You’ll know you’ve succeeded when:
- You’ve tested multiple HNSW configurations with real timing data
- You can explain which settings work best for your domain and why
- You’ve measured the concrete impact of payload indexing
- You have clear recommendations for production deployment
Share Your Discovery
Step 1: Reflect on Your Findings
- Which HNSW configuration (m, ef_construct) worked best for your domain?
- How did the balance between upload time and search speed look?
- What was the impact of adding a payload index?
- How do your results compare to the DBpedia demo?
Step 2: Post Your Results
Post your results using this template:
**[Day 2] HNSW Performance Benchmarking**
**High-Level Summary**
- **Domain:** "[your domain]"
- **Key Result:** "m=[..], ef_construct=[..], hnsw_ef=[..] gave [X] ms search and [Y] s upload (best balance)."
**Reproducibility**
- **Collections:** ...
- **Model:** sentence-transformers/all-MiniLM-L6-v2 (384-dim)
- **Dataset:** [N items] (snapshot: YYYY-MM-DD)
**Configuration Results**
| m | ef_construct | Upload_s | Search_ms@ef=128 |
|----|--------------|----------|------------------|
| 0 | 100 | X.X | — |
| 8 | 100 | Y.Y | A.A |
| 16 | 200 | Z.Z | B.B |
| 32 | 400 | W.W | C.C |
**Filtering Impact**
- Payload index on `length`: **[speedup]×**
Without index: [T1] ms → With index: [T2] ms
**Recommendations**
- Best config for this domain: [m, ef_construct, hnsw_ef]
- When to pick another setting: [short guidance]
- Notes for production: [one line on indexing order / filters]
**Surprise**
- "[one unexpected finding]"
**Next Step**
- "[one concrete action you’ll try next]"
Optional: Go Further
- Test more granular parameters:
  - ef_construct impact on recall & build time
  - hnsw_ef per-query tuning by complexity
- Track memory usage differences (RAM/on-disk, payload indexes)
- Add accuracy metrics vs. a small labeled query set to see if higher m truly improves quality for your domain (see the sketch below)
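For the accuracy idea above, one workable approach is to compare HNSW results against exact (brute-force) search, which Qdrant exposes via SearchParams(exact=True). A minimal recall@10 sketch, assuming you supply a small list of representative query strings as query_texts:
# Recall@10 of HNSW search against exact search as ground truth.
# query_texts is a hypothetical list of query strings you define for your domain.
def recall_at_10(collection_name, query_texts):
    hits, total = 0, 0
    for text in query_texts:
        vec = encoder.encode(text).tolist()
        exact = client.query_points(
            collection_name=collection_name,
            query=vec,
            limit=10,
            search_params=models.SearchParams(exact=True),  # brute-force scan
            with_payload=False,
        )
        approx = client.query_points(
            collection_name=collection_name,
            query=vec,
            limit=10,
            with_payload=False,
        )
        exact_ids = {p.id for p in exact.points}
        hits += sum(1 for p in approx.points if p.id in exact_ids)
        total += len(exact_ids)
    return hits / total if total else 0.0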