Demo: Implementing a Hybrid Search System
Build a complete hybrid search system with hands-on examples.
What You’ll Learn
- Step-by-step hybrid search implementation
- RRF algorithm in practice
- Performance optimization techniques
- Testing and evaluation methods
What You’ll Discover
In the previous lesson, you learned the theory behind hybrid search and the Universal Query API. Today you’ll implement it hands-on with a real dataset, comparing dense and sparse vector search and combining them using fusion algorithms.
You’ll learn to:
- Create collections with both dense and sparse named vectors
- Compare dense vs. sparse search behavior on real queries
- Implement Reciprocal Rank Fusion (RRF) for hybrid search
- Explore Distribution-Based Score Fusion (DBSF) as an alternative
- Understand the limitations and strengths of each approach
The Hybrid Search Challenge
Working with both semantic (dense) and lexical (sparse) search presents interesting challenges:
- Different scoring systems: Dense search typically uses cosine similarity ([-1, 1]), while sparse uses BM25 scores (unbounded); a toy illustration follows this list
- Different result sets: The same query may return completely different documents
- Vocabulary sensitivity: Sparse search can return fewer results or none if keywords don’t match
- User diversity: Some users know exact terms, others use natural language
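To make the scoring mismatch concrete, here’s a toy illustration (the numbers are made up; they only show why averaging raw scores across the two systems is meaningless):

# Hypothetical scores for the same document from two different retrievers
dense_score = 0.82   # cosine similarity, bounded to [-1, 1]
sparse_score = 14.7  # BM25 score, unbounded and corpus-dependent

# A naive average is dominated by the unbounded BM25 value,
# which is why hybrid search fuses ranks (or normalized scores) instead
print((dense_score + sparse_score) / 2)  # 7.76 -- not a meaningful relevance signal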
Step 1: Environment Setup
Install Required Libraries
!pip install -q "qdrant-client[fastembed]"
Why the fastembed extra? This includes FastEmbed, which provides built-in models for generating both dense and sparse embeddings without additional dependencies. You won’t need separate libraries for OpenAI or other embedding providers.
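If you’re curious which models FastEmbed bundles, you can list them before picking one (a quick sketch; the exact output format varies between FastEmbed versions):

from fastembed import TextEmbedding, SparseTextEmbedding

# Dense text-embedding models bundled with FastEmbed
for model in TextEmbedding.list_supported_models():
    print(model)

# Sparse models (such as Qdrant/bm25) live in a separate registry
for model in SparseTextEmbedding.list_supported_models():
    print(model)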
Connect to Qdrant Cloud
Qdrant Cloud provides the persistence and performance needed for hybrid search experimentation:
from qdrant_client import QdrantClient
from google.colab import userdata
client = QdrantClient(
    location="https://your-cluster-url.cloud.qdrant.io:6333",
    api_key=userdata.get("api-key"),
)
Using Google Colab secrets: The userdata.get() function accesses secrets stored in your Colab environment, similar to environment variables. This keeps your API key secure and out of your code.
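Running outside Colab? A minimal alternative is to read the credentials from environment variables instead (QDRANT_URL and QDRANT_API_KEY are names you would set yourself; they are not predefined anywhere):

import os

from qdrant_client import QdrantClient

# Assumes QDRANT_URL and QDRANT_API_KEY were exported in your shell beforehand
client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
)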
Step 2: Create Collection with Named Vectors
For hybrid search, we need a collection that supports both sparse and dense vectors. Qdrant allows multiple named vectors per point:
from qdrant_client import models
# Define the collection name
collection_name = "hybrid_search_demo"
# Create our collection with both sparse (bm25) and dense vectors
client.create_collection(
    collection_name=collection_name,
    vectors_config={
        "dense": models.VectorParams(
            distance=models.Distance.COSINE,
            size=384,
        ),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(
            modifier=models.Modifier.IDF,
        ),
    },
)
Key configuration details:
- Named vectors: "dense" and "sparse" identify each vector type
- Dense configuration: 384 dimensions (matches sentence-transformers/all-MiniLM-L6-v2)
- Cosine distance: Typical choice for semantic similarity
- IDF modifier: Inverse Document Frequency weighting for BM25 sparse vectors
- Same collection: Both vectors exist on the same points for hybrid search
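To double-check the configuration, you can fetch the collection metadata right after creating it (a quick sanity check; attribute names follow recent qdrant-client versions and may differ slightly in older ones):

# Confirm both vector configurations exist on the collection
info = client.get_collection(collection_name)
print(info.status)                        # e.g. green
print(info.config.params.vectors)         # the "dense" 384-dim cosine config
print(info.config.params.sparse_vectors)  # the "sparse" config with the IDF modifier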
Step 3: Upload the Cheese Dataset
We’re using a small dataset of 10 documents describing different types of cheese and cheese-based dishes. This simple dataset makes it easy to observe the behavior of different search methods:
documents = [
    "Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation.",
    "Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging.",
    "Brie cheese features a soft, creamy interior surrounded by an edible white rind.",
    "This French cheese has a flowing, buttery center encased in a bloomy white crust.",
    "Fresh mozzarella pairs beautifully with ripe tomatoes and basil leaves.",
    "Classic Margherita pizza topped with tomato sauce, mozzarella, and fresh basil.",
    "Parmesan requires at least 12 months of cave aging to develop its signature sharp taste.",
    "Parmigiano-Reggiano's distinctive piquant flavor comes from extended maturation in controlled environments.",
    "Grilled cheese sandwiches are the ultimate American comfort food for cold winter days.",
    "Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich.",
]
Now upload with both dense and sparse embeddings:
import uuid
client.upsert(
    collection_name=collection_name,
    points=[
        models.PointStruct(
            id=uuid.uuid4().hex,
            vector={
                "dense": models.Document(
                    text=doc,
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
                "sparse": models.Document(
                    text=doc,
                    model="Qdrant/bm25",
                ),
            },
            payload={"text": doc},
        )
        for doc in documents
    ],
)
About this approach:
- Document model: Automatically generates embeddings using specified models
- Dual embedding: Each point gets both dense and sparse representations
- Small dataset: 10 documents is perfect for observing search behavior differences
- No batching needed: For production with larger datasets, implement batching and retry logic
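If you later scale past these 10 documents, a minimal batching sketch could look like the following (it reuses client, models, collection_name, and uuid from the steps above; batch_size is an arbitrary placeholder, and retry logic is left out for brevity):

def upsert_in_batches(docs: list[str], batch_size: int = 64) -> None:
    """Upload documents in fixed-size batches instead of one large request."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        client.upsert(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=uuid.uuid4().hex,
                    vector={
                        "dense": models.Document(
                            text=doc,
                            model="sentence-transformers/all-MiniLM-L6-v2",
                        ),
                        "sparse": models.Document(text=doc, model="Qdrant/bm25"),
                    },
                    payload={"text": doc},
                )
                for doc in batch
            ],
        )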
Step 4: Compare Dense vs. Sparse Search
Let’s create helper functions to test each search method independently. First, dense search:
def dense_search(query: str) -> list[models.ScoredPoint]:
    response = client.query_points(
        collection_name=collection_name,
        query=models.Document(
            text=query,
            model="sentence-transformers/all-MiniLM-L6-v2",
        ),
        using="dense",
        limit=3,
    )
    return response.points
Now sparse search:
def sparse_search(query: str) -> list[models.ScoredPoint]:
    response = client.query_points(
        collection_name=collection_name,
        query=models.Document(
            text=query,
            model="Qdrant/bm25",
        ),
        using="sparse",
        limit=3,
    )
    return response.points
Test Queries Across Both Methods
Now let’s run both methods on different query types:
queries = [
    "nutty aged cheese",
    "soft French cheese",
    "pizza ingredients",
    "a good lunch",
]
for query in queries:
    print("Query:", query)
    dense_results = dense_search(query)
    print("Dense Results:")
    for result in dense_results:
        print("\t-", result.payload["text"], result.score)
    sparse_results = sparse_search(query)
    print("Sparse Results:")
    for result in sparse_results:
        print("\t-", result.payload["text"], result.score)
    print()
Key Observations from Results
- Dense and sparse produce different rankings: For “nutty aged cheese”, sparse correctly identifies the exact match as #1, while dense ranks a semantically similar document higher
- Sometimes rankings match: For “soft French cheese”, both methods agree on the top results, but with different confidence scores
- Dense always returns results: Dense search returns the full 3 requested results for every query, because any two vectors have some similarity, even if extremely low
- Sparse can return fewer results: For “pizza ingredients”, sparse only returns 1 result. For “a good lunch”, sparse returns 0 results due to vocabulary mismatch
- Vocabulary mismatch problem: When query terms don’t appear in documents, sparse search fails completely, while dense understands the semantic intent
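You can confirm the result-count differences directly by counting hits per query with the helper functions defined above:

# How many results does each method return for each query?
for query in queries:
    print(query, "-> dense:", len(dense_search(query)), "| sparse:", len(sparse_search(query)))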
Step 5: Hybrid Search with Reciprocal Rank Fusion
Now let’s combine both methods using RRF. This fusion algorithm never compares the incompatible scores directly; it only uses each method’s ranking order:
def rrf_search(query: str) -> list[models.ScoredPoint]:
    response = client.query_points(
        collection_name=collection_name,
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="Qdrant/bm25",
                ),
                using="sparse",
                limit=3,
            ),
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
                using="dense",
                limit=3,
            ),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=3,
    )
    return response.points
How RRF works in this code:
- Prefetch from both: Retrieve top 3 from sparse AND dense search
- FusionQuery: Applies RRF algorithm to combine rankings
- Single API call: The entire hybrid pipeline executes in one request
- Result: Documents that perform well in both methods rank higher
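For intuition, this is roughly what RRF computes under the hood. The sketch below uses the classic formula from the literature, score(d) = sum over rankings of 1 / (k + rank(d)), with the conventional k = 60; it is a conceptual illustration, not a reproduction of Qdrant’s internal implementation:

def manual_rrf(ranked_lists: list[list[str]], k: int = 60) -> dict[str, float]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))

# A document ranked well by both lists ends up with the highest fused score
print(manual_rrf([["doc_a", "doc_b", "doc_c"], ["doc_a", "doc_c", "doc_d"]]))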
RRF Results Analysis
for query in queries:
    print("Query:", query)
    rrf_results = rrf_search(query)
    print("RRF Results:")
    for result in rrf_results:
        print("\t-", result.payload["text"], result.score)
    print()
What to notice:
- Best of both worlds: Results include documents from both dense and sparse searches
- Ranking preservation: When both methods agree (like “soft French cheese”), RRF maintains the consensus
- Handles sparse gaps: When sparse returns fewer results (or none), dense search fills the gaps
- Balanced scoring: Documents ranked highly by both methods get boosted in RRF scores
Step 6: Distribution-Based Score Fusion (DBSF)
RRF isn’t the only fusion method available. DBSF normalizes the scores from each prefetch based on their distribution and then sums them across the different retrievers:
def dbsf_search(query: str) -> list[models.ScoredPoint]:
    response = client.query_points(
        collection_name=collection_name,
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="Qdrant/bm25",
                ),
                using="sparse",
                limit=3,
            ),
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
                using="dense",
                limit=3,
            ),
        ],
        query=models.FusionQuery(fusion=models.Fusion.DBSF),
        limit=3,
    )
    return response.points
DBSF Results
for query in queries:
    print("Query:", query)
    dbsf_results = dbsf_search(query)
    print("DBSF Results:")
    for result in dbsf_results:
        print("\t-", result.payload["text"], result.score)
    print()
Comparing DBSF to RRF:
- In this simple example, DBSF and RRF produce identical rankings for all queries
- This is NOT a general rule - different fusion methods can produce different results
- With larger datasets and more complex queries, the differences become more apparent
- DBSF considers score distributions, while RRF only uses rank positions
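For intuition, here’s a conceptual sketch of the normalization step. Qdrant’s documentation describes DBSF as rescaling each result list using the mean plus or minus three standard deviations as limits; the snippet below only mimics that idea and makes no claim about matching the exact implementation:

import statistics

def normalize_by_distribution(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] using mean +/- 3 standard deviations as the limits."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    lower, upper = mean - 3 * std, mean + 3 * std
    if upper == lower:  # all scores identical
        return [0.5 for _ in scores]
    return [min(max((s - lower) / (upper - lower), 0.0), 1.0) for s in scores]

# Normalize each retriever's scores separately, then sum per document
print(normalize_by_distribution([14.7, 9.3, 2.1]))    # e.g. sparse/BM25 scores
print(normalize_by_distribution([0.82, 0.74, 0.55]))  # e.g. dense/cosine scores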
Evaluation Considerations
Did fusion improve search quality? We can’t definitively say without proper evaluation. Here’s why:
The Challenge of Search Quality
- Subjective relevance: “Best results” depend on unknown user intentions
- No ground truth: We don’t have a reference dataset defining expected outputs
- Context matters: Different users might prefer different results for the same query
Proper Evaluation Requires
- Ground truth dataset: Define expected results for each query
- Metrics: Use precision, recall, NDCG, or other relevance metrics
- User feedback: Collect real user satisfaction data
- A/B testing: Compare different strategies in production
For this demo: We’re “eyeballing” results to understand behavior, but production systems need rigorous evaluation frameworks.
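As a small starting point, a precision@k check against a hand-labeled ground-truth set could look like this (the relevant_texts mapping below is entirely hypothetical and stands in for labels you’d create for your own queries):

# Hypothetical ground truth: which document texts count as relevant per query
relevant_texts = {
    "nutty aged cheese": {
        "Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation.",
        "Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging.",
    },
    # ... label the remaining queries the same way
}

def precision_at_k(query: str, k: int = 3) -> float:
    """Fraction of the top-k hybrid results that are labeled relevant."""
    results = rrf_search(query)[:k]
    hits = sum(1 for r in results if r.payload["text"] in relevant_texts.get(query, set()))
    return hits / k

print(precision_at_k("nutty aged cheese"))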
Summary & Key Takeaways
What you’ve built: A complete hybrid search pipeline using Qdrant’s Universal Query API that combines dense semantic search with sparse keyword search in a single call.
Key insights:
- Dense vs. Sparse behavior: Dense always returns results (semantic), sparse can return none (keyword match)
- Fusion solves incompatibility: RRF and DBSF combine rankings without comparing incompatible scores
- Single API call: The Universal Query API makes complex pipelines simple
- Complementary strengths: Dense handles vague queries, sparse handles exact matches
- Evaluation matters: Proper testing requires ground truth datasets and metrics
Production recommendations:
- Start with RRF - it’s simple and effective
- Test DBSF if you need score-distribution awareness
- Build evaluation datasets for your specific domain
- Monitor user satisfaction metrics
- Consider adding specialized rerankers for even better quality
Next Steps & Resources
What’s next:
- Experiment with parameters: Adjust prefetch limits, try different embedding models
- Add reranking: Explore heavier models, such as cross-encoders, for a final reranking stage (a minimal sketch follows this list)
- Build ground truth: Create evaluation datasets for your use case
- Test on your data: Apply these techniques to domain-specific datasets
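For the reranking idea above, a minimal cross-encoder sketch with the sentence-transformers library might look like this (it assumes pip install sentence-transformers; the model name is just a commonly used public checkpoint, not one prescribed by this lesson):

from sentence_transformers import CrossEncoder

# Score (query, document) pairs jointly, then reorder the hybrid results
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, k: int = 3) -> list[str]:
    candidates = [point.payload["text"] for point in rrf_search(query)]
    scores = reranker.predict([(query, text) for text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]

print(rerank("nutty aged cheese"))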
Additional resources:
- Qdrant Documentation: Hybrid Search - Complete technical reference
- Universal Query API Guide - Advanced usage patterns
Ready for the next challenge? You’ve mastered hybrid search fundamentals. These same techniques scale to millions of documents and power production search systems!