Superlinked

Superlinked is a self-hosted inference engine (SIE) that serves 85+ embedding models (dense, sparse, and multivector / ColBERT) from a single endpoint. The sie-qdrant package lets you use SIE as the embedding provider for Qdrant collections. SIE encodes your text into vectors, and you store and search them in Qdrant.

sie-qdrant is currently Python only. TypeScript support is not yet available.

Installation

pip install sie-qdrant

This installs sie-sdk and qdrant-client (v1.7+) as dependencies. You also need a running SIE instance; see the Superlinked quickstart for deployment options (Docker, GPU).

Vectorizer

SIEVectorizer calls SIE and returns dense vectors as list[float], ready to pass into Qdrant’s PointStruct(vector=...) and query_points():

from sie_qdrant import SIEVectorizer

vectorizer = SIEVectorizer(
    base_url="http://localhost:8080",
    model="NovaSearch/stella_en_400M_v5",
)

Any model SIE supports for dense embeddings works; just change the model parameter:

# Nomic MoE (768-dim, multilingual)
vectorizer = SIEVectorizer(model="nomic-ai/nomic-embed-text-v2-moe")

# E5 (1024-dim, instruction-tuned; SIE handles query vs. document encoding automatically)
vectorizer = SIEVectorizer(model="intfloat/e5-large-v2")

# BGE-M3 (1024-dim, also supports sparse output for hybrid search)
vectorizer = SIEVectorizer(model="BAAI/bge-m3")

See the Model Catalog for all supported models.

Full example

Create a Qdrant collection (the vector size must match the model's output dimension, 1024 for stella_en_400M_v5), embed documents with SIE, and search:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sie_qdrant import SIEVectorizer

vectorizer = SIEVectorizer(
    base_url="http://localhost:8080",
    model="NovaSearch/stella_en_400M_v5",
)

client = QdrantClient("http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

texts = [
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
    "Python is popular for machine learning development.",
]

vectors = vectorizer.embed_documents(texts)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(id=i, vector=v, payload={"text": t})
        for i, (t, v) in enumerate(zip(texts, vectors))
    ],
)

query_vec = vectorizer.embed_query("What is deep learning?")

results = client.query_points(
    collection_name="documents",
    query=query_vec,
    limit=2,
)

for point in results.points:
    print(point.payload["text"])

Named vectors (dense + sparse)

For hybrid search, SIENamedVectorizer produces multiple vector types in a single SIE call. The model must support all requested output types: BAAI/bge-m3 supports dense and sparse; jinaai/jina-colbert-v2 supports dense and multivector.

SIE sparse vectors (from SPLADE or BGE-M3) are learned sparse representations that capture semantic similarity, not just term overlap. Qdrant stores them natively in its compact indices + values format.
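To make the indices + values format concrete, here is a hedged pure-Python sketch (the token ids and weights are made up, not real BGE-M3 output): a sparse vector is a pair of parallel arrays, and relevance between two sparse vectors is the dot product over their shared indices.

```python
# Illustrative sketch of the sparse indices + values format,
# not Qdrant's internal scoring code.

def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors over their shared indices."""
    b_weights = dict(zip(b["indices"], b["values"]))
    return sum(
        v * b_weights[i]
        for i, v in zip(a["indices"], a["values"])
        if i in b_weights
    )

# Hypothetical vocabulary indices and learned weights
doc = {"indices": [12, 405, 9031], "values": [0.8, 0.3, 1.1]}
query = {"indices": [77, 405, 9031], "values": [0.2, 0.5, 0.9]}

# Only indices 405 and 9031 overlap: 0.3*0.5 + 1.1*0.9
print(round(sparse_dot(query, doc), 4))  # 1.14
```

Because the weights are learned by the model rather than raw term counts, two texts can score highly even without exact word overlap, as long as they activate the same vocabulary dimensions.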

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    SparseVectorParams, SparseVector,
)
from sie_qdrant import SIENamedVectorizer

# One SIE call produces both dense and sparse vectors
vectorizer = SIENamedVectorizer(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
    output_types=["dense", "sparse"],
)

client = QdrantClient("http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config={"dense": VectorParams(size=1024, distance=Distance.COSINE)},
    sparse_vectors_config={"sparse": SparseVectorParams()},
)

texts = ["First document", "Second document"]
named_vectors = vectorizer.embed_documents(texts)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=i,
            vector={
                "dense": v["dense"],
                "sparse": SparseVector(**v["sparse"]),
            },
            payload={"text": t},
        )
        for i, (t, v) in enumerate(zip(texts, named_vectors))
    ],
)

Hybrid search with Reciprocal Rank Fusion

Combine dense and sparse results via Qdrant’s prefetch + RRF fusion:

from qdrant_client.models import Prefetch, FusionQuery, Fusion, SparseVector

query = vectorizer.embed_query("search text")

results = client.query_points(
    collection_name="documents",
    prefetch=[
        Prefetch(query=query["dense"], using="dense", limit=20),
        Prefetch(query=SparseVector(**query["sparse"]), using="sparse", limit=20),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=5,
)

Multivector (ColBERT) and late interaction

Qdrant supports native MaxSim retrieval for ColBERT-style late-interaction models via MultiVectorConfig. Combined with SIENamedVectorizer, this enables true late-interaction retrieval without client-side scoring:

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    MultiVectorConfig, MultiVectorComparator,
)
from sie_qdrant import SIENamedVectorizer

vectorizer = SIENamedVectorizer(
    base_url="http://localhost:8080",
    model="jinaai/jina-colbert-v2",
    output_types=["dense", "multivector"],
)

client = QdrantClient("http://localhost:6333")
client.create_collection(
    collection_name="documents",
    vectors_config={
        "dense": VectorParams(size=768, distance=Distance.COSINE),
        "multivector": VectorParams(
            size=128,
            distance=Distance.COSINE,
            multivector_config=MultiVectorConfig(
                comparator=MultiVectorComparator.MAX_SIM,
            ),
        ),
    },
)

texts = ["First document", "Second document"]
named_vectors = vectorizer.embed_documents(texts)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=i,
            vector={"dense": v["dense"], "multivector": v["multivector"]},
            payload={"text": t},
        )
        for i, (t, v) in enumerate(zip(texts, named_vectors))
    ],
)

query = vectorizer.embed_query("search text")
results = client.query_points(
    collection_name="documents",
    query=query["multivector"],
    using="multivector",
    limit=5,
)

Configuration

| Parameter    | Type  | Default               | Description                                                 |
|--------------|-------|-----------------------|-------------------------------------------------------------|
| base_url     | str   | http://localhost:8080 | SIE server URL                                              |
| model        | str   | BAAI/bge-m3           | Model to use for embeddings (catalog)                       |
| instruction  | str   | None                  | Instruction prefix for instruction-tuned models (e.g. E5)   |
| output_dtype | str   | None                  | Output dtype: float32, float16, int8, binary                |
| gpu          | str   | None                  | Target GPU type for routing                                 |
| options      | dict  | None                  | Model-specific options                                      |
| timeout_s    | float | 180.0                 | Request timeout in seconds                                  |
