Multitenancy with LlamaIndex

If you are building a service that serves vectors for many independent users, and you want to isolate their data, the best practice is to use a single collection with payload-based partitioning. This approach is called multitenancy. Our guide on the Separate Partitions describes how to set it up in general, but if you use LlamaIndex as a backend, you may prefer reading a more specific instruction. So here it is!

Prerequisites

This tutorial assumes that you have already installed Qdrant and LlamaIndex. If you haven’t, please run the following commands:

pip install qdrant-client llama-index

We are going to use a local Docker-based instance of Qdrant. If you want to use a remote instance, please adjust the code accordingly. Here is how we can start a local instance:

docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest

Setting up LlamaIndex pipeline

We are going to implement an end-to-end example of multitenant application using LlamaIndex. We’ll be indexing the documentation of different Python libraries, and we definitely don’t want any users to see the results coming from a library they are not interested in. In real case scenarios, this is even more dangerous, as the documents may contain sensitive information.

Creating vector store

QdrantVectorStore is a wrapper around Qdrant that provides all the necessary methods to work with your vector database in LlamaIndex. Let’s create a vector store for our collection. It requires setting a collection name and passing an instance of QdrantClient.

from qdrant_client import QdrantClient
from llama_index.vector_stores import QdrantVectorStore

client = QdrantClient("http://localhost:6333")

vector_store = QdrantVectorStore(
    collection_name="my_collection",
    client=client,
)

Defining chunking strategy and embedding model

Any semantic search application requires a way to convert text queries into vectors - an embedding model. ServiceContext is a bundle of commonly used resources used during the indexing and querying stage in any LlamaIndex application. We can also use it to set up an embedding model - in our case, a local BAAI/bge-small-en-v1.5. set up

from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    embed_model="local:BAAI/bge-small-en-v1.5",
)

We can also control how our documents are split into chunks, or nodes using LLamaIndex’s terminology. The SimpleNodeParser splits documents into fixed length chunks with an overlap. The defaults are reasonable, but we can also adjust them if we want to. Both values are defined in tokens.

from llama_index.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=32)

Now we also need to inform the ServiceContext about our choices:

service_context = ServiceContext.from_defaults(
    embed_model="local:BAAI/bge-large-en-v1.5",
    node_parser=node_parser,
)

Both embedding model and selected node parser will be implicitly used during the indexing and querying.

Combining everything together

The last missing piece, before we can start indexing, is the VectorStoreIndex. It is a wrapper around VectorStore that provides a convenient interface for indexing and querying. It also requires a ServiceContext to be initialized.

from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, service_context=service_context
)

Indexing documents

No matter how our documents are generated, LlamaIndex will automatically split them into nodes, if required, encode using selected embedding model, and then store in the vector store. Let’s define some documents manually and insert them into Qdrant collection. Our documents are going to have a single metadata attribute - a library name they belong to.

from llama_index.schema import Document

documents = [
    Document(
        text="LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models.",
        metadata={
            "library": "llama-index",
        },
    ),
    Document(
        text="Qdrant is a vector database & vector similarity search engine.",
        metadata={
            "library": "qdrant",
        },
    ),
]

Now we can index them using our VectorStoreIndex:

for document in documents:
    index.insert(document)

Performance considerations

Our documents have been split into nodes, encoded using the embedding model, and stored in the vector store. However, we don’t want to allow our users to search for all the documents in the collection, but only for the documents that belong to a library they are interested in. For that reason, we need to set up the Qdrant payload index, so the search is more efficient.

from qdrant_client import models

client.create_payload_index(
    collection_name="my_collection",
    field_name="metadata.library",
    field_type=models.PayloadSchemaType.KEYWORD,
)

The payload index is not the only thing we want to change. Since none of the search queries will be executed on the whole collection, we can also change its configuration, so the HNSW graph is not built globally. This is also done due to performance reasons. You should not be changing these parameters, if you know there will be some global search operations done on the collection.

client.update_collection(
    collection_name="my_collection",
    hnsw_config=models.HnswConfigDiff(payload_m=16, m=0),
)

Once both operations are completed, we can start searching for our documents.

Querying documents with constraints

Let’s assume we are searching for some information about large language models, but are only allowed to use Qdrant documentation. LlamaIndex has a concept of retrievers, responsible for finding the most relevant nodes for a given query. Our VectorStoreIndex can be used as a retriever, with some additional constraints - in our case value of the library metadata attribute.

from llama_index.vector_stores.types import MetadataFilters, ExactMatchFilter

qdrant_retriever = index.as_retriever(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="library",
                value="qdrant",
            )
        ]
    )
)

nodes_with_scores = qdrant_retriever.retrieve("large language models")
for node in nodes_with_scores:
    print(node.text, node.score)
# Output: Qdrant is a vector database & vector similarity search engine. 0.60551536

The description of Qdrant was the best match, even though it didn’t mention large language models at all. However, it was the only document that belonged to the qdrant library, so there was no other choice. Let’s try to search for something that is not present in the collection.

Let’s define another retrieve, this time for the llama-index library:

llama_index_retriever = index.as_retriever(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="library",
                value="llama-index",
            )
        ]
    )
)

nodes_with_scores = llama_index_retriever.retrieve("large language models")
for node in nodes_with_scores:
    print(node.text, node.score)
# Output: LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. 0.63576734

The results returned by both retrievers are different, due to the different constraints, so we implemented a real multitenant search application!