Qdrant Hybrid Search with Reranking

Time: 40 minLevel: Intermediate

Reranking is a powerful technique for improving search precision: rather than running an expensive model over your entire corpus, you apply it to a smaller set of candidates already retrieved by a faster method. This keeps latency low while surfacing the most relevant results.

Reranking pairs especially well with hybrid search, which casts a wide retrieval net, maximizing recall across several retrieval paths. Reranking can sort the hybrid search results with a deeper relevance signal. A late interaction model, for instance, represents both query and document as multiple vectors, enabling more nuanced term-level comparisons than a single embedding can capture.

In this tutorial, you’ll learn how to build a hybrid search engine that uses dense embeddings for semantic search, sparse embeddings for keyword search, and late interaction embeddings for reranking. The result is a powerful search engine that delivers highly relevant results by combining the strengths of different embedding types.

You’ll use Qdrant Cloud Inference to generate vector embeddings. The three embedding models used in this tutorial (dense, sparse, and late interaction) are available free of charge on Qdrant Cloud. If you prefer to manage your own embedding infrastructure, you can apply the same principles, but you will need to adapt the code examples to use your embedding service.

Overview

Let’s start by breaking down the architecture:

Ingestion Stage

Processing dense, sparse, and late interaction embeddings in Qdrant

You’ll start by ingesting a CSV file containing information about science fiction books. Each row is a document, corresponding to a book, with fields for the title, author, and description. Each book description will be processed to generate three types of embeddings:

  • Dense embeddings capture the deeper, semantic meanings behind the text.
  • Sparse embeddings support more traditional, keyword-based methods. Specifically, you’ll use BM25, a probabilistic retrieval model. BM25 ranks documents based on how relevant their terms are to a given query, taking into account how often terms appear, document length, and how common the term is across all documents. It’s perfect for keyword-heavy searches.
  • Late interaction embeddings capture the nuanced interactions between query and document terms. You’ll use a ColBERT model, which uses a two-stage approach. First, it generates contextualized embeddings for both queries and documents using BERT, and then it performs late interaction, matching those embeddings efficiently to fine-tune relevance. Learn more about late interaction models in the Multivector Representations for Reranking in Qdrant tutorial and the Multi-Vector Search course.

The data, including all the embeddings, is stored in Qdrant, a vector search engine. This enables you to efficiently search, retrieve, and rerank your documents based on multiple layers of relevance.

Retrieval Stage

Query retrieval and reranking process in Qdrant

When a user submits a query, it is, just like documents, transformed into each of the types of embeddings: dense for semantic search, sparse for keyword search, and late interaction for precise reranking.

Next, hybrid search uses dense and sparse embeddings to find the most relevant documents. The dense embeddings are used for semantic search, while the sparse embeddings are used for keyword search. The resulting sets of documents are then reranked using late interaction embeddings, giving results that are not only relevant but also tuned to your query by prioritizing the documents that truly meet the user’s intent.

Implementation

Install and Initialize the Qdrant Client

First, install the Qdrant client:

pythontypescriptrustjavacsharpgo
qdrant-client
qdrant/js-client-rest
qdrant-client
io.qdrant:client
Qdrant.Client
github.com/qdrant/go-client

Next, initialize the client:

pythontypescriptrustjavacsharpgo
from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://xyz-example.eu-central.aws.cloud.qdrant.io:6333",
    api_key="<your-api-key>",
    cloud_inference=True,
)
const client = new QdrantClient({
    url: QDRANT_URL,
    apiKey: QDRANT_API_KEY,
});
let client = Qdrant::from_url(qdrant_url)
    .api_key(qdrant_api_key)
    .build()?;
QdrantClient client =
    new QdrantClient(
        QdrantGrpcClient.newBuilder(QDRANT_URL, 6334, true)
            .withApiKey(QDRANT_API_KEY)
            .build());
var client = new QdrantClient(
	host: QDRANT_URL,
	https: true,
	apiKey: QDRANT_API_KEY
);
client, err := qdrant.NewClient(&qdrant.Config{
	Host:   QDRANT_URL,
	APIKey: QDRANT_API_KEY,
	UseTLS: true,
})

Models

Next, define the three embedding models. You’ll use the 384-dimensional sentence-transformers/all-MiniLM-L6-v2 model for dense embeddings, the qdrant/bm25 model for sparse embeddings, and the 96-dimensional answerdotai/answerai-colbert-small-v1 multivector model for late interaction embeddings.

pythontypescriptrustjavacsharpgo
dense_embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
sparse_embedding_model = "qdrant/bm25"
late_interaction_embedding_model = "answerdotai/answerai-colbert-small-v1"
const denseEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
const sparseEmbeddingModel = "qdrant/bm25";
const lateInteractionEmbeddingModel = "answerdotai/answerai-colbert-small-v1";
let dense_embedding_model = "sentence-transformers/all-MiniLM-L6-v2";
let sparse_embedding_model = "qdrant/bm25";
let late_interaction_embedding_model = "answerdotai/answerai-colbert-small-v1";
String denseEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
String sparseEmbeddingModel = "qdrant/bm25";
String lateInteractionEmbeddingModel = "answerdotai/answerai-colbert-small-v1";
string denseEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
string sparseEmbeddingModel = "qdrant/bm25";
string lateInteractionEmbeddingModel = "answerdotai/answerai-colbert-small-v1";
denseEmbeddingModel := "sentence-transformers/all-MiniLM-L6-v2"
sparseEmbeddingModel := "qdrant/bm25"
lateInteractionEmbeddingModel := "answerdotai/answerai-colbert-small-v1"

Create Collection

Create a new collection called hybrid-search, configured to handle the three vector types:

  • Dense embeddings (dense) using cosine distance for semantic comparisons.
  • Late interaction embeddings (multi) using cosine distance, with a multivector configuration using the maximum similarity comparator. Note the m=0 configuration to disable HNSW indexing. These embeddings are used for reranking, not ANN retrieval, so an HNSW index is not needed.
  • Sparse embeddings (sparse) for keyword-based searches using the IDF modifier.
pythontypescriptrustjavacsharpgo
from qdrant_client.models import Distance, VectorParams, models

collection_name = "hybrid-search"

if client.collection_exists(collection_name=collection_name):
    client.delete_collection(collection_name=collection_name)

client.create_collection(
    collection_name,
    vectors_config={
        "dense": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE,
        ),
        "multi": models.VectorParams(
            size=96,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM,
            ),
            hnsw_config=models.HnswConfigDiff(m=0)  #  Disable HNSW for reranking
        ),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(modifier=models.Modifier.IDF)
    }
)
const collectionName = "hybrid-search";

if (await client.collectionExists(collectionName)) {
    await client.deleteCollection(collectionName);
}

await client.createCollection(collectionName, {
    vectors: {
        dense: {
            size: 384,
            distance: "Cosine",
        },
        multi: {
            size: 96,
            distance: "Cosine",
            multivector_config: { comparator: "max_sim" },
            hnsw_config: { m: 0 }, // Disable HNSW for reranking
        },
    },
    sparse_vectors: {
        sparse: { modifier: "idf" },
    },
});
let collection_name = "hybrid-search";

if client.collection_exists(collection_name).await? {
    client.delete_collection(collection_name).await?;
}

let mut vectors = VectorsConfigBuilder::default();
vectors.add_named_vector_params(
    "dense",
    VectorParamsBuilder::new(384, Distance::Cosine),
);
vectors.add_named_vector_params(
    "multi",
    VectorParamsBuilder::new(96, Distance::Cosine)
        .multivector_config(MultiVectorConfigBuilder::new(MultiVectorComparator::MaxSim))
        .hnsw_config(HnswConfigDiffBuilder::default().m(0)), // Disable HNSW for reranking
);

let mut sparse = SparseVectorsConfigBuilder::default();
sparse.add_named_vector_params(
    "sparse",
    SparseVectorParamsBuilder::default().modifier(Modifier::Idf),
);

client
    .create_collection(
        CreateCollectionBuilder::new(collection_name)
            .vectors_config(vectors)
            .sparse_vectors_config(sparse),
    )
    .await?;
String collectionName = "hybrid-search";

if (client.collectionExistsAsync(collectionName).get()) {
    client.deleteCollectionAsync(collectionName).get();
}

client.createCollectionAsync(
    CreateCollection.newBuilder()
        .setCollectionName(collectionName)
        .setVectorsConfig(
            VectorsConfig.newBuilder()
                .setParamsMap(
                    VectorParamsMap.newBuilder()
                        .putMap(
                            "dense",
                            VectorParams.newBuilder()
                                .setSize(384)
                                .setDistance(Distance.Cosine)
                                .build())
                        .putMap(
                            "multi",
                            VectorParams.newBuilder()
                                .setSize(96)
                                .setDistance(Distance.Cosine)
                                .setMultivectorConfig(
                                    MultiVectorConfig.newBuilder()
                                        .setComparator(MultiVectorComparator.MaxSim)
                                        .build())
                                .setHnswConfig(
                                    HnswConfigDiff.newBuilder()
                                        .setM(0) // Disable HNSW for reranking
                                        .build())
                                .build())
                        .build()))
        .setSparseVectorsConfig(
            SparseVectorConfig.newBuilder()
                .putMap(
                    "sparse",
                    SparseVectorParams.newBuilder()
                        .setModifier(Modifier.Idf)
                        .build())
                .build())
        .build()
).get();
string collectionName = "hybrid-search";

if (await client.CollectionExistsAsync(collectionName))
	await client.DeleteCollectionAsync(collectionName);

await client.CreateCollectionAsync(
	collectionName: collectionName,
	vectorsConfig: new VectorParamsMap
	{
		Map =
		{
			["dense"] = new VectorParams
			{
				Size = 384,
				Distance = Distance.Cosine,
			},
			["multi"] = new VectorParams
			{
				Size = 96,
				Distance = Distance.Cosine,
				MultivectorConfig = new() { Comparator = MultiVectorComparator.MaxSim },
				HnswConfig = new HnswConfigDiff { M = 0 }, // Disable HNSW for reranking
			},
		}
	},
	sparseVectorsConfig: new SparseVectorConfig
	{
		Map =
		{
			["sparse"] = new SparseVectorParams { Modifier = Modifier.Idf }
		}
	}
);
collectionName := "hybrid-search"

exists, err := client.CollectionExists(context.Background(), collectionName)
if exists {
	client.DeleteCollection(context.Background(), collectionName)
}

client.CreateCollection(context.Background(), &qdrant.CreateCollection{
	CollectionName: collectionName,
	VectorsConfig: qdrant.NewVectorsConfigMap(
		map[string]*qdrant.VectorParams{
			"dense": {
				Size:     384,
				Distance: qdrant.Distance_Cosine,
			},
			"multi": {
				Size:     96,
				Distance: qdrant.Distance_Cosine,
				MultivectorConfig: &qdrant.MultiVectorConfig{
					Comparator: qdrant.MultiVectorComparator_MaxSim,
				},
				HnswConfig: &qdrant.HnswConfigDiff{M: qdrant.PtrOf(uint64(0))}, // Disable HNSW for reranking
			},
		},
	),
	SparseVectorsConfig: qdrant.NewSparseVectorsConfig(
		map[string]*qdrant.SparseVectorParams{
			"sparse": {Modifier: qdrant.Modifier_Idf.Enum()},
		},
	),
})

Ingest Data

Now you can load the sci-fi book descriptions from a CSV and insert them into the hybrid-search collection. With Cloud Inference, embeddings are computed server-side by wrapping the text in a Document object.

pythontypescriptrustjavacsharpgo
from qdrant_client.models import Document, PointStruct

csv_url = 'https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv'

points = (
    PointStruct(
        id=idx,
        vector={
            "dense": Document(text=row['Description'], model=dense_embedding_model),
            "sparse": Document(text=row['Description'], model=sparse_embedding_model),
            "multi": Document(text=row['Description'], model=late_interaction_embedding_model),
        },
        payload={"title": row['Title'], "author": row['Author'], "description": row['Description']}
    )
    for idx, row in enumerate(parse_csv(csv_url))
)
client.upload_points(
    collection_name=collection_name,
    points=points,
    batch_size=25
)
const csvUrl = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

const batchSize = 25;
let idx = 0;
let buffer: Schemas["PointStruct"][] = [];

for await (const { title, author, description } of parseCSV(csvUrl)) {
    buffer.push({
        id: idx++,
        vector: {
            dense: { text: description, model: denseEmbeddingModel },
            sparse: { text: description, model: sparseEmbeddingModel },
            multi: { text: description, model: lateInteractionEmbeddingModel },
        },
        payload: { title, author, description },
    });

    if (buffer.length >= batchSize) {
        await client.upsert(collectionName, { points: buffer });
        buffer = [];
    }
}

if (buffer.length > 0) {
    await client.upsert(collectionName, { points: buffer });
}
let csv_url = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

let batch_size = 25;
let mut idx: u64 = 0;
let mut buffer: Vec<PointStruct> = Vec::new();

for row in parse_csv(csv_url)? {
    let row = row?;
    let title = row.title;
    let author = row.author;
    let description = row.description;

    let vectors = NamedVectors::default()
        .add_vector("dense", Document::new(&description, dense_embedding_model))
        .add_vector("sparse", Document::new(&description, sparse_embedding_model))
        .add_vector("multi", Document::new(&description, late_interaction_embedding_model));

    buffer.push(PointStruct::new(
        idx,
        vectors,
        [
            ("title", title.into()),
            ("author", author.into()),
            ("description", description.into()),
        ],
    ));
    idx += 1;

    if buffer.len() >= batch_size {
        client
            .upsert_points(UpsertPointsBuilder::new(
                collection_name,
                std::mem::take(&mut buffer),
            ))
            .await?;
    }
}

if !buffer.is_empty() {
    client
        .upsert_points(UpsertPointsBuilder::new(collection_name, buffer))
        .await?;
}
String csvUrl = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

int batchSize = 25;
long idx = 0;
List<PointStruct> buffer = new ArrayList<>();

try (var stream = parseCSV(csvUrl)) {
    for (var row : (Iterable<CsvRow>) stream::iterator) {
        String title = row.title;
        String author = row.author;
        String description = row.description;

        buffer.add(
            PointStruct.newBuilder()
                .setId(io.qdrant.client.PointIdFactory.id(idx++))
                .setVectors(
                    namedVectors(
                        Map.of(
                            "dense",
                            vector(
                                Document.newBuilder()
                                    .setText(description)
                                    .setModel(denseEmbeddingModel)
                                    .build()),
                            "sparse",
                            vector(
                                Document.newBuilder()
                                    .setText(description)
                                    .setModel(sparseEmbeddingModel)
                                    .build()),
                            "multi",
                            vector(
                                Document.newBuilder()
                                    .setText(description)
                                    .setModel(lateInteractionEmbeddingModel)
                                    .build()))))
                .putAllPayload(
                    Map.of(
                        "title", value(title),
                        "author", value(author),
                        "description", value(description)))
                .build());

        if (buffer.size() >= batchSize) {
            client.upsertAsync(collectionName, buffer).get();
            buffer.clear();
        }
    }
}

if (!buffer.isEmpty()) {
    client.upsertAsync(collectionName, buffer).get();
}
string csvUrl = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

int batchSize = 25;
ulong idx = 0;
var buffer = new List<PointStruct>();

await foreach (var (title, author, description) in ParseCsv(csvUrl))
{
	buffer.Add(new PointStruct
	{
		Id = idx++,
		Vectors = new Dictionary<string, Vector>
		{
			["dense"] = new Document { Text = description, Model = denseEmbeddingModel },
			["sparse"] = new Document { Text = description, Model = sparseEmbeddingModel },
			["multi"] = new Document { Text = description, Model = lateInteractionEmbeddingModel },
		},
		Payload = { ["title"] = title, ["author"] = author, ["description"] = description }
	});

	if (buffer.Count >= batchSize)
	{
		await client.UpsertAsync(collectionName: collectionName, points: buffer);
		buffer.Clear();
	}
}

if (buffer.Count > 0)
	await client.UpsertAsync(collectionName: collectionName, points: buffer);
csvUrl := "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv"

batchSize := 25
var idx uint64
var buffer []*qdrant.PointStruct

err = parseCSV(csvUrl, func(row CSVRow) {
	title := row.Title
	author := row.Author
	description := row.Description

	buffer = append(buffer, &qdrant.PointStruct{
		Id: qdrant.NewIDNum(idx),
		Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{
			"dense":  qdrant.NewVectorDocument(&qdrant.Document{Text: description, Model: denseEmbeddingModel}),
			"sparse": qdrant.NewVectorDocument(&qdrant.Document{Text: description, Model: sparseEmbeddingModel}),
			"multi":  qdrant.NewVectorDocument(&qdrant.Document{Text: description, Model: lateInteractionEmbeddingModel}),
		}),
		Payload: qdrant.NewValueMap(map[string]any{
			"title":       title,
			"author":      author,
			"description": description,
		}),
	})
	idx++

	if len(buffer) >= batchSize {
		client.Upsert(context.Background(), &qdrant.UpsertPoints{
			CollectionName: collectionName,
			Points:         buffer,
		})
		buffer = nil
	}
})

if len(buffer) > 0 {
	client.Upsert(context.Background(), &qdrant.UpsertPoints{
		CollectionName: collectionName,
		Points:         buffer,
	})
}

This code creates a point for each book, with three vector types and a payload containing the title, author, and description. Documents are uploaded to Qdrant in batches of 25, with Cloud Inference generating all three embeddings on the fly. In Production, the optimal batch size depends on your data and cluster, so you may want to experiment with different sizes for best performance.

This code uses a helper function to stream and parse the CSV file:

Details
pythontypescriptrustjavacsharpgo
import csv
import urllib.request

def parse_csv(url):
    with urllib.request.urlopen(url) as response:
        reader = csv.DictReader(line.decode('utf-8') for line in response)
        yield from reader
function parseCsvLine(line: string): string[] {
    const fields: string[] = [];
    let i = 0;
    while (i < line.length) {
        if (line[i] === '"') {
            i++;
            let field = "";
            while (i < line.length) {
                if (line[i] === '"' && line[i + 1] === '"') { field += '"'; i += 2; }
                else if (line[i] === '"') { i++; break; }
                else { field += line[i++]; }
            }
            fields.push(field);
            if (line[i] === ",") i++;
        } else {
            const start = i;
            while (i < line.length && line[i] !== ",") i++;
            fields.push(line.slice(start, i));
            if (i < line.length) i++;
        }
    }
    return fields;
}

async function* parseCSV(url: string): AsyncGenerator<{ text: string; datetime: string }> {
    const response = await fetch(url);
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let remainder = "";
    let headers: string[] | null = null;
    let textIdx = -1;
    let datetimeIdx = -1;

    while (true) {
        const { done, value } = await reader.read();
        const chunk = done ? "" : decoder.decode(value, { stream: true });
        const lines = (remainder + chunk).split("\n");
        remainder = done ? "" : lines.pop()!;

        for (const line of lines) {
            if (!line.trim()) continue;
            if (headers === null) {
                headers = line.split(",");
                textIdx = headers.indexOf("text");
                datetimeIdx = headers.indexOf("datetime");
                continue;
            }
            const fields = parseCsvLine(line);
            yield { text: fields[textIdx], datetime: fields[datetimeIdx] };
        }

        if (done) break;
    }
}
struct CsvRow {
    text: String,
    datetime: String,
}

fn parse_csv(url: &str) -> anyhow::Result<impl Iterator<Item = anyhow::Result<CsvRow>>> {
    let reader = ureq::get(url).call()?.into_body().into_reader();
    let mut rdr = csv::Reader::from_reader(reader);
    let headers = rdr.headers()?.clone();
    let text_idx = headers.iter().position(|h| h == "text").unwrap();
    let datetime_idx = headers.iter().position(|h| h == "datetime").unwrap();
    let iter = rdr.into_records().map(move |result| {
        let record = result?;
        Ok(CsvRow {
            text: record[text_idx].to_string(),
            datetime: record[datetime_idx].to_string(),
        })
    });
    Ok(iter)
}
static class CsvRow {
    final String text;
    final String datetime;
    CsvRow(String text, String datetime) { this.text = text; this.datetime = datetime; }
}

static Stream<CsvRow> parseCSV(String url) throws Exception {
    Function<String, List<String>> parseCsvLine = line -> {
        List<String> fields = new ArrayList<>();
        boolean inQuotes = false;
        var sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(sb.toString());
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        fields.add(sb.toString());
        return fields;
    };

    var reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
    String headerLine = reader.readLine();
    List<String> headers = List.of(headerLine.split(","));
    int textIdx = headers.indexOf("text");
    int datetimeIdx = headers.indexOf("datetime");

    return reader.lines()
        .map(line -> {
            List<String> fields = parseCsvLine.apply(line);
            return new CsvRow(fields.get(textIdx), fields.get(datetimeIdx));
        })
        .onClose(() -> { try { reader.close(); } catch (Exception ignored) {} });
}
async IAsyncEnumerable<(string text, string datetime)> ParseCsv(string url)
{
	using var httpClient = new HttpClient();
	using var stream = await httpClient.GetStreamAsync(url);
	using var parser = new TextFieldParser(new StreamReader(stream));
	parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
	parser.SetDelimiters(",");
	string[]? headers = parser.ReadFields();
	int textIdx = Array.IndexOf(headers!, "text");
	int datetimeIdx = Array.IndexOf(headers!, "datetime");
	while (!parser.EndOfData)
	{
		var fields = parser.ReadFields()!;
		yield return (fields[textIdx], fields[datetimeIdx]);
	}
}
type CSVRow struct {
	Text     string
	Datetime string
}

func parseCSV(url string, fn func(CSVRow)) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	csvReader := csv.NewReader(resp.Body)
	headers, err := csvReader.Read()
	if err != nil {
		return err
	}

	textIdx, datetimeIdx := -1, -1
	for i, h := range headers {
		switch h {
		case "text":
			textIdx = i
		case "datetime":
			datetimeIdx = i
		}
	}

	for {
		row, err := csvReader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		fn(CSVRow{Text: row[textIdx], Datetime: row[datetimeIdx]})
	}
	return nil
}

Retrieval

Before combining results, let’s see how dense and sparse retrieval perform individually.

For retrieval, wrap the query in a Document object so Cloud Inference computes the appropriate embeddings server-side.

Dense retrieval captures semantic meaning:

pythontypescriptrustjavacsharpgo
import pprint

query = "time travel"

results = client.query_points(
    collection_name,
    query=models.Document(text=query, model=dense_embedding_model),
    using="dense",
    limit=10,
)

pprint.pp(results.points)
const query = "time travel";

const denseResults = await client.query(collectionName, {
    query: { text: query, model: denseEmbeddingModel },
    using: "dense",
    limit: 10,
});

console.log(denseResults.points);
let query = "time travel";

let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .query(Query::new_nearest(Document::new(query, dense_embedding_model)))
            .using("dense")
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
String query = "time travel";

var results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText(query)
                    .setModel(denseEmbeddingModel)
                    .build()))
        .setUsing("dense")
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
string query = "time travel";

var results = await client.QueryAsync(
	collectionName: collectionName,
	query: new Document { Text = query, Model = denseEmbeddingModel },
	usingVector: "dense",
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
query := "time travel"

results, err := client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  query,
		Model: denseEmbeddingModel,
	}),
	Using: qdrant.PtrOf("dense"),
	Limit: qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

Let’s take a look at the top 5 results:

PositionTitleDescription
1The Time MachineA Victorian scientist travels far into the future to witness civilization’s fate.
2Slaughterhouse-FiveA nonlinear, time-tripping reflection on war and fate.
3The PeripheralTwo timelines intersect through telepresence technology.
4The Space Between WorldsA multiverse traveler uncovers dangerous secrets across parallel Earths.
5The Forever WarA soldier experiences extreme time dilation while fighting an interstellar war.

Each of these books has a strong semantic connection to the concept of time travel, even if the exact phrase doesn’t appear in the description.

Sparse retrieval focuses on keyword matches:

pythontypescriptrustjavacsharpgo
results = client.query_points(
    collection_name,
    query=models.Document(text=query, model=sparse_embedding_model),
    using="sparse",
    limit=10,
)

pprint.pp(results.points)
const sparseResults = await client.query(collectionName, {
    query: { text: query, model: sparseEmbeddingModel },
    using: "sparse",
    limit: 10,
});

console.log(sparseResults.points);
let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .query(Query::new_nearest(Document::new(query, sparse_embedding_model)))
            .using("sparse")
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText(query)
                    .setModel(sparseEmbeddingModel)
                    .build()))
        .setUsing("sparse")
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
results = await client.QueryAsync(
	collectionName: collectionName,
	query: new Document { Text = query, Model = sparseEmbeddingModel },
	usingVector: "sparse",
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
results, err = client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  query,
		Model: sparseEmbeddingModel,
	}),
	Using: qdrant.PtrOf("sparse"),
	Limit: qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

The top 5 results are:

PositionTitleDescription
1Station ElevenA traveling symphony roams a post-pandemic North America.
2HyperionTravelers share haunting tales on a pilgrimage to confront the mysterious Shrike.
3The Space Between WorldsA multiverse traveler uncovers dangerous secrets across parallel Earths.
4The Time MachineA Victorian scientist travels far into the future to witness civilization’s fate.
5Slaughterhouse-FiveA nonlinear, time-tripping reflection on war and fate.

The sparse BM25 model performs keyword matching with stemming. As a result, it returns books whose descriptions contain variants of the words “time” and “travel”. For instance, “Station Eleven” and “Hyperion” mention “traveling” and “travelers” but aren’t primarily about time travel.

Hybrid search can be used to prefetch the dense and sparse results and next merge them using Reciprocal Rank Fusion (RRF):

pythontypescriptrustjavacsharpgo
prefetch = [
    models.Prefetch(
        query=models.Document(text=query, model=dense_embedding_model),
        using="dense",
        limit=20,
    ),
    models.Prefetch(
        query=models.Document(text=query, model=sparse_embedding_model),
        using="sparse",
        limit=20,
    ),
]

results = client.query_points(
    collection_name,
    prefetch=prefetch,
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    with_payload=True,
    limit=10,
)

pprint.pp(results.points)
const hybridResults = await client.query(collectionName, {
    prefetch: [
        {
            query: { text: query, model: denseEmbeddingModel },
            using: "dense",
            limit: 20,
        },
        {
            query: { text: query, model: sparseEmbeddingModel },
            using: "sparse",
            limit: 20,
        },
    ],
    query: { fusion: "rrf" },
    with_payload: true,
    limit: 10,
});

console.log(hybridResults.points);
let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, dense_embedding_model)))
                    .using("dense")
                    .limit(20u64),
            )
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, sparse_embedding_model)))
                    .using("sparse")
                    .limit(20u64),
            )
            .query(Query::new_fusion(Fusion::Rrf))
            .with_payload(true)
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(denseEmbeddingModel)
                            .build()))
                .setUsing("dense")
                .setLimit(20)
                .build())
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(sparseEmbeddingModel)
                            .build()))
                .setUsing("sparse")
                .setLimit(20)
                .build())
        .setQuery(Query.newBuilder().setFusion(Fusion.RRF).build())
        .setWithPayload(enable(true))
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
results = await client.QueryAsync(
	collectionName: collectionName,
	prefetch: new List<PrefetchQuery>
	{
		new()
		{
			Query = new Document { Text = query, Model = denseEmbeddingModel },
			Using = "dense",
			Limit = 20,
		},
		new()
		{
			Query = new Document { Text = query, Model = sparseEmbeddingModel },
			Using = "sparse",
			Limit = 20,
		},
	},
	query: Fusion.Rrf,
	payloadSelector: true,
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
results, err = client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Prefetch: []*qdrant.PrefetchQuery{
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: denseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("dense"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: sparseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("sparse"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
	},
	Query:       qdrant.NewQueryFusion(qdrant.Fusion_RRF),
	WithPayload: qdrant.NewWithPayload(true),
	Limit:       qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

This runs two sub-queries in parallel: one using dense embeddings for semantic meaning, the other using sparse BM25 embeddings for keyword matching. The prefetch step retrieves the top 20 candidates from each sub-query (dense and sparse) and fuses the ranked lists into a single result using RRF.

The results are a mix of books that are semantically relevant to time travel and those that contain the keywords, giving you a broader set of relevant documents. However, the ranking may not be optimal since, by default, RRF treats both signals equally and doesn’t capture the nuanced interactions between query and document terms. For example, “Station Eleven” ranks highly because it has stronger keyword matches, even though it is not about time travel.

PositionTitleDescription
1The Time MachineA Victorian scientist travels far into the future to witness civilization’s fate.
2Station ElevenA traveling symphony roams a post-pandemic North America.
3Slaughterhouse-FiveA nonlinear, time-tripping reflection on war and fate.
4The Space Between WorldsA multiverse traveler uncovers dangerous secrets across parallel Earths.
5HyperionTravelers share haunting tales on a pilgrimage to confront the mysterious Shrike.

Rerank

The hybrid search results can be reranked using late interaction embeddings for maximum precision. Instead of fusing with RRF, use the ColBERT multi-vector as the final ranking signal:

pythontypescriptrustjavacsharpgo
prefetch = [
    models.Prefetch(
        query=models.Document(text=query, model=dense_embedding_model),
        using="dense",
        limit=20,
    ),
    models.Prefetch(
        query=models.Document(text=query, model=sparse_embedding_model),
        using="sparse",
        limit=20,
    ),
]

results = client.query_points(
    collection_name,
    prefetch=prefetch,
    query=models.Document(text=query, model=late_interaction_embedding_model),
    using="multi",
    with_payload=True,
    limit=10,
)

pprint.pp(results.points)
const rerankedResults = await client.query(collectionName, {
    prefetch: [
        {
            query: { text: query, model: denseEmbeddingModel },
            using: "dense",
            limit: 20,
        },
        {
            query: { text: query, model: sparseEmbeddingModel },
            using: "sparse",
            limit: 20,
        },
    ],
    query: { text: query, model: lateInteractionEmbeddingModel },
    using: "multi",
    with_payload: true,
    limit: 10,
});

console.log(rerankedResults.points);
let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, dense_embedding_model)))
                    .using("dense")
                    .limit(20u64),
            )
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, sparse_embedding_model)))
                    .using("sparse")
                    .limit(20u64),
            )
            .query(Query::new_nearest(Document::new(query, late_interaction_embedding_model)))
            .using("multi")
            .with_payload(true)
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(denseEmbeddingModel)
                            .build()))
                .setUsing("dense")
                .setLimit(20)
                .build())
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(sparseEmbeddingModel)
                            .build()))
                .setUsing("sparse")
                .setLimit(20)
                .build())
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText(query)
                    .setModel(lateInteractionEmbeddingModel)
                    .build()))
        .setUsing("multi")
        .setWithPayload(enable(true))
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
results = await client.QueryAsync(
	collectionName: collectionName,
	prefetch: new List<PrefetchQuery>
	{
		new()
		{
			Query = new Document { Text = query, Model = denseEmbeddingModel },
			Using = "dense",
			Limit = 20,
		},
		new()
		{
			Query = new Document { Text = query, Model = sparseEmbeddingModel },
			Using = "sparse",
			Limit = 20,
		},
	},
	query: new Document { Text = query, Model = lateInteractionEmbeddingModel },
	usingVector: "multi",
	payloadSelector: true,
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
results, err = client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Prefetch: []*qdrant.PrefetchQuery{
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: denseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("dense"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: sparseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("sparse"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
	},
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  query,
		Model: lateInteractionEmbeddingModel,
	}),
	Using:       qdrant.PtrOf("multi"),
	WithPayload: qdrant.NewWithPayload(true),
	Limit:       qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

The prefetch step retrieves the top 20 candidates from each sub-query (dense and sparse), and the ColBERT late interaction model reranks the combined candidates to surface the most relevant results.

Compare results

Let’s compare the top 10 results of hybrid search with and without reranking. Notice how some documents shift in rank based on their relevance according to the late interaction embeddings.

TitleDescriptionRerankedRRF rankRank Change
Slaughterhouse-FiveA nonlinear, time-tripping reflection on war and fate.13Moved up
The Forever WarA soldier experiences extreme time dilation while fighting an interstellar war.28Moved up
KindredA modern Black woman is pulled back in time to the antebellum South.37Moved up
SpinEarth is enclosed in a time-distorting barrier by unknown forces.46Moved up
The Light BrigadeSoldiers are turned into light to fight a war across space-time.510Moved up

Best Practices in Reranking

Reranking with late interaction models can dramatically improve the relevance of search results, especially when combined with hybrid search. Here are some best practices to keep in mind:

  • Continuous testing and monitoring: regularly evaluate your hybrid search pipelines to avoid overfitting and make timely adjustments to maintain performance.
  • Balance relevance and cost: Reranking can be computationally expensive, and late interaction embeddings require significant storage. Aim for a balance between relevance and cost. Simple fusion methods like RRF can be effective for many use cases, while late interaction models can be reserved for queries where precision is critical.

Conclusion

Reranking with late interaction models is a powerful tool that boosts the relevance of search results, especially when combined with hybrid search methods. While it can add some latency due to its complexity, applying it to a smaller, pre-filtered subset of results ensures both speed and relevance.

Was this page useful?

Thank you for your feedback! 🙏

We are sorry to hear that. 😔 You can edit this page on GitHub, or create a GitHub issue.