# Multi-Representation Search
# Multi-Representation Search Across Titles, Abstracts, and Chunks

| Time: 45 min | Level: Intermediate | Output: [GitHub](https://github.com/qdrant/examples/blob/master/multi-representation-search/multi-representation-search.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/qdrant/examples/blob/master/multi-representation-search/multi-representation-search.ipynb) |
| --- | ----------- | ----------- | ----------- |

A document is rarely well-represented by a single embedding. A paper has a title, an abstract, body chunks, and category tags, each carrying a different signal. Treat all four as one dense vector and the title gets averaged out; chunk-level grounding for downstream reasoning disappears.

This tutorial builds retrieval that uses each representation deliberately: named vectors per representation, fused via the Query API and grouped back to the document level for presentation.

This tutorial assumes you've built hybrid (dense plus sparse) search before, that you're comfortable with [named vectors](/documentation/manage-data/vectors/index.md#named-vectors), the [Query API](/documentation/search/hybrid-queries/index.md), Reciprocal Rank Fusion (RRF), and Best Matching 25 (BM25). If hybrid search is new, start with the [hybrid search section of the Text Search guide](/documentation/search/text-search/index.md#combining-semantic-and-lexical-search-with-hybrid-search) first.

If you'd rather read the code, the [accompanying notebook](https://githubtocolab.com/qdrant/examples/blob/master/multi-representation-search/multi-representation-search.ipynb) walks through this pipeline step by step with eval numbers at each stage.

## Setup

Install the Python packages used throughout the tutorial:

```bash
pip install qdrant-client datasets
```

<aside role="status">
This tutorial uses <a href="/documentation/inference/#qdrant-cloud-inference">Qdrant Cloud Inference</a> to generate embeddings server-side. The free tier covers this tutorial's footprint. Core BM25 runs on any Qdrant instance, but dense Cloud Inference is Cloud-only. To self-host, generate dense vectors on the client with a library like <a href="/documentation/fastembed/">FastEmbed</a> and pass them as raw vectors instead of <code>models.Document</code>.
</aside>

## Dataset

You'll work with 20 000 ML/CS arXiv papers (2018 and later) from the [`gfissore/arxiv-abstracts-2021`](https://huggingface.co/datasets/gfissore/arxiv-abstracts-2021) dataset. Each paper has a title, an abstract, and category tags, giving you three text representations once the abstract is chunked, plus categories as filterable metadata.

arXiv abstracts fit any dense model's context window, so chunking isn't strictly required for this dataset. We chunk anyway because the pipeline shape (chunk-level retrieval, document-level grouping) is what you'd use on full paper bodies in production. A fixed-length sentence chunker keeps the example simple; the right strategy depends on your document structure.

```python
from datasets import load_dataset

ML_CATEGORIES = {"cs.LG", "cs.CV", "cs.CL", "cs.AI", "stat.ML"}

# Non-streaming so HF caches the parquet locally; first run downloads ~2.5 GB, re-runs are instant.
dataset = load_dataset("gfissore/arxiv-abstracts-2021", split="train")

papers = []
# IDs are roughly chronological; iterate from the end to land on 2021/2020/2019 papers first.
for i in range(len(dataset) - 1, -1, -1):
    if len(papers) >= 20000:
        break
    row = dataset[i]
    if not row["abstract"] or not row["title"]:
        continue
    # categories arrive as space-joined strings (e.g. ["cs.LG cs.CV"]); split each entry.
    cats = [tok for entry in row["categories"] for tok in entry.split()]
    if not any(c in ML_CATEGORIES for c in cats):
        continue
    # Year lives in the YYMM prefix of new-format arXiv IDs ("2104.01234" -> 2021).
    arxiv_id = row["id"]
    if "/" in arxiv_id or "." not in arxiv_id:
        continue  # skip pre-2007 IDs like "math/0506001"
    if 2000 + int(arxiv_id[:2]) < 2018:
        continue
    papers.append({
        "arxiv_id": arxiv_id,
        "title": row["title"].strip(),
        "abstract": row["abstract"].strip(),
        "tags": cats,
    })
```

## Collection Schema

Design the collection before writing any queries. The point granularity is the chunk: every chunk of every paper becomes one point. Title and abstract embeddings are stored on every chunk so the Query API can fuse across them in a single request without an extra lookup.

```python
from qdrant_client import QdrantClient, models

# Replace url and api_key with your own from https://cloud.qdrant.io
client = QdrantClient(
    url="https://xyz-example.qdrant.io:6333",
    api_key="<your-api-key>",
    cloud_inference=True,
)

# 384 is the output dimension of sentence-transformers/all-minilm-l6-v2, used below for every dense vector.
client.create_collection(
    collection_name="arxiv_multi_repr",
    vectors_config={
        "dense_chunk":   models.VectorParams(size=384, distance=models.Distance.COSINE),
        "dense_title":   models.VectorParams(size=384, distance=models.Distance.COSINE),
        "dense_abstract": models.VectorParams(size=384, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse_title": models.SparseVectorParams(
            modifier=models.Modifier.IDF,
        ),
    },
)

# Index 'document_id' so the Query API can group by it; index 'tags' so we can filter on category.
client.create_payload_index(
    collection_name="arxiv_multi_repr",
    field_name="document_id",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
client.create_payload_index(
    collection_name="arxiv_multi_repr",
    field_name="tags",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```

Each vector covers a different signal: `dense_chunk` for chunk content, `dense_title` and `sparse_title` for the title (dense and lexical respectively), `dense_abstract` for the abstract as a whole.

Categories live in the `tags` payload with a [keyword index](/documentation/manage-data/indexing/index.md#payload-index), so queries can pre-filter by category.

Title and abstract vectors are duplicated across every chunk of the same document. That trades storage for query simplicity: one collection, one Query API call, every representation reachable from any point. For 20 000 documents at 24 chunks each, the duplicated vectors add ~1.4 GB; storing them once per document in a sidecar collection would be ~60 MB. See [Calculating RAM size](/documentation/capacity-planning/index.md#calculating-ram-size) for how to price this on your own corpus, and [Lookup in groups](/documentation/search/search/index.md#lookup-in-groups) to split heavy fields out and rejoin at grouping time.

Ingestion produces one point per chunk and reuses the title and abstract embeddings.

```python
DENSE_MODEL = "sentence-transformers/all-minilm-l6-v2"
BM25_MODEL = "qdrant/bm25"

def chunk_sentences(text, target_len=2):
    """Split text into ~2-sentence chunks; fall back to the full text if it doesn't split cleanly."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    return [". ".join(sentences[i:i + target_len])
            for i in range(0, len(sentences), target_len)] or [text]

points = []
for paper in papers:
    chunks = chunk_sentences(paper["abstract"])

    # Title, abstract, and sparse docs are reused across every chunk of this paper; only the chunk text varies.
    # Cloud Inference embeds each Document on the server, so you don't need a client-side embedding library.
    title_doc   = models.Document(text=paper["title"],    model=DENSE_MODEL)
    abstract_doc = models.Document(text=paper["abstract"], model=DENSE_MODEL)
    # avg_len is the average word count of the indexed text.
    # Default is 256 (document-length); setting it to the actual field length (~10 here) improves BM25 scoring accuracy.
    sparse_doc  = models.Document(
        text=paper["title"],
        model=BM25_MODEL,
        options={"avg_len": 10.0},
    )

    for i, chunk in enumerate(chunks):
        points.append(models.PointStruct(
            id=len(points),
            vector={
                "dense_chunk":     models.Document(text=chunk, model=DENSE_MODEL),
                "dense_title":     title_doc,
                "dense_abstract":   abstract_doc,
                "sparse_title": sparse_doc,
            },
            payload={
                "document_id": paper["arxiv_id"],
                "title":       paper["title"],
                "tags":        paper["tags"],
                "chunk_index": i,
                "chunk_text":  chunk,
            },
        ))

client.upload_points(collection_name="arxiv_multi_repr", points=points, batch_size=256, parallel=2)
```

After the upload completes, opening any point in the Qdrant Cloud UI shows all four named vectors attached to one chunk. `dense_chunk` carries the chunk's own embedding, while `dense_title`, `dense_abstract`, and `sparse_title` are the same across every chunk of this paper.

![A point in the arxiv_multi_repr collection showing all four named vectors](/documentation/tutorials/multi-representation-search/point.png)

## Retrieval

The recommended pipeline fuses four prefetches with Reciprocal Rank Fusion and groups the results by document. One Query API call covers retrieval, fusion, and grouping:

```python
def retrieve(query, limit=10, group_size=3, tags=None):
    dense_query  = models.Document(text=query, model=DENSE_MODEL)
    sparse_query = models.Document(text=query, model=BM25_MODEL)
    # Optional category filter. When tags is provided, Qdrant pre-filters candidates
    # to points whose 'tags' payload includes any of the given values.
    query_filter = (
        models.Filter(must=[models.FieldCondition(key="tags", match=models.MatchAny(any=tags))])
        if tags else None
    )
    # query_points_groups runs the prefetches, fuses with RRF, applies the filter, and groups results by document_id.
    # You may need to adjust the per-prefetch limit based on the number of chunks per document; grouping only sees what the prefetch returns.
    return client.query_points_groups(
        collection_name="arxiv_multi_repr",
        prefetch=[
            models.Prefetch(query=dense_query,  using="dense_chunk",    limit=100),
            models.Prefetch(query=dense_query,  using="dense_title",    limit=100),
            models.Prefetch(query=dense_query,  using="dense_abstract", limit=100),
            models.Prefetch(query=sparse_query, using="sparse_title",   limit=100),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        query_filter=query_filter,
        group_by="document_id",
        group_size=group_size,
        limit=limit,
    ).groups
```

### What to Prefetch

A representation only earns its own prefetch if it carries signal independent of the others. The four above are deliberately complementary:

- `dense_chunk` carries the body content.
- `dense_title` carries the topical naming. The title isn't part of the chunk or abstract text in this schema, so a query matching title vocabulary that the body never repeats needs this prefetch to surface the document.
- `dense_abstract` carries the abstract as a complete semantic unit, while `dense_chunk` indexes smaller sections of the document itself.
- `sparse_title` carries lexical hits on the title that the dense embedding averages out: rare entity names, jargon, specific model or paper names.

### Which Fusion to Use

[Reciprocal Rank Fusion](/documentation/search/hybrid-queries/index.md#reciprocal-rank-fusion-rrf) combines prefetches using document rank rather than raw scores, avoiding issues caused by dense and sparse scores operating on different scales. It’s the default configuration for this tutorial and a reasonable place to begin. In some scenarios, the variants below may produce stronger results.

For when RRF isn't enough:

- **[Weighted RRF](/documentation/search/hybrid-queries/index.md#weighted-rrf).** Per-prefetch weights in the rank-fusion formula. Use when one path is reliably stronger on your data; tune weights against a validation set, since a weight that helps one query class often hurts another.
- **[Distribution-Based Score Fusion (DBSF)](/documentation/search/hybrid-queries/index.md#distribution-based-score-fusion-dbsf).** Normalizes each prefetch's scores into a common range before summing. Preserves score magnitude (which RRF discards) when prefetches share a model family and score distributions are stable.
- **Custom formula via `FormulaQuery`.** Full control over how scores combine:

```python
query=models.FormulaQuery(
    formula=models.SumExpression(sum=[
        models.MultExpression(mult=[1.0, "$score[0]"]),
        models.MultExpression(mult=[0.5, "$score[1]"]),
        models.MultExpression(mult=[0.4, "$score[2]"]),
        models.MultExpression(mult=[0.3, "$score[3]"]),
    ]),
    defaults={"$score[1]": 0.0, "$score[2]": 0.0, "$score[3]": 0.0},
),
```

In this formula, `$score[i]` is the score from prefetch `i`, so the order of your `prefetch=` list matters. The `defaults` map provides fallback values (here `0.0`) for candidates that didn't appear in every prefetch, so the formula still evaluates.

<aside role="alert">
Linear combinations of raw scores break down when prefetches use different scoring scales — for example, dense scores in [0, 1] alongside unbounded BM25 scores. See <a href="/articles/hybrid-search/#why-not-a-linear-combination">Why not a linear combination?</a> for the full argument.
</aside>

The other two fusion strategies handle this for you: RRF discards scores entirely, and DBSF normalizes each prefetch before summing. With a custom formula, you have to normalize the scores yourself, typically using [decay functions](/documentation/search/search-relevance/index.md#decay-functions). The full FormulaQuery syntax lives in the [Score Boosting](/documentation/search/search-relevance/index.md#score-boosting) reference.

For RRF vs. DBSF guidance, see the [hybrid-search FAQ](/documentation/faq/qdrant-fundamentals/index.md#when-should-i-use-reciprocal-rank-fusion-rrf-vs-distribution-based-score-fusion-dbsf-for-hybrid-search).

### When to Boost, When to Rerank

Score boosting expresses ranking preferences that aren't captured by retrieval scores alone: recency, source authority, geographic proximity, content type. Use a FormulaQuery whose terms reference payload fields. For time-based decay on a `published_at` field, use an `exp_decay` expression from the [decay functions](/documentation/search/search-relevance/index.md#decay-functions) reference.

Use a reranker when the preference is "this is more relevant than that" but you can't easily express why in a closed form. Formulas are cheap and deterministic; rerankers are expensive but learn what you can't articulate.

---

For the step-by-step build-up that produced this design (dense baseline, plus sparse, plus title prefetch, plus grouping, plus boosting) with eval numbers showing the lift at each step, see the [accompanying notebook](https://githubtocolab.com/qdrant/examples/blob/master/multi-representation-search/multi-representation-search.ipynb). For a complementary view that varies the *query* across three representations against a single document representation, see the [Universal Query for Hybrid Retrieval demo](/course/essentials/day-5/universal-query-demo/index.md).

## Wrapping Up

Multi-representation retrieval is a schema decision, not a model decision. Once each representation has its own named vector, the Query API composes them at query time: prefetch per representation, fuse with RRF, and group by document for presentation. Use a `FormulaQuery` or weighted fusion when you need finer control over scoring.

Real corpora rarely have clean metadata across every document. If yours has gaps and you're unsure how to adapt this pipeline, ask in [Discord](https://discord.com/invite/qdrant). We want to hear the shape of your data.

Related reading:

- [Hybrid Search Revamped](/articles/hybrid-search/index.md) for the why behind RRF over linear weighting.
- [Hybrid Queries reference](/documentation/search/hybrid-queries/index.md) for the full Query API surface, including grouping.
- [Search Relevance reference](/documentation/search/search-relevance/index.md) for the formula and decay function syntax.