Fondant

Fondant is an open-source framework that aims to simplify and speed up large-scale data processing by making containerized components reusable across pipelines and execution environments. Benefit from built-in features such as autoscaling, data lineage, and pipeline caching, and deploy to (managed) platforms such as Vertex AI, Sagemaker, and Kubeflow Pipelines.

Fondant comes with a library of reusable components that you can leverage to compose your own pipeline, including a Qdrant component for writing embeddings to Qdrant.

Usage

A data load pipeline for RAG using Qdrant.

A simple ingestion pipeline could look like the following:

import pyarrow as pa
from fondant.pipeline import Pipeline

indexing_pipeline = Pipeline(
    name="ingestion-pipeline",
    description="Pipeline to prepare and process data for building a RAG solution",
    base_path="./fondant-artifacts",
)

# An custom implemenation of a read component. 
text = indexing_pipeline.read(
    "path/to/data-source-component",
    arguments={
        # your custom arguments 
    }
)

chunks = text.apply(
    "chunk_text",
    arguments={
        "chunk_size": 512,
        "chunk_overlap": 32,
    },
)

embeddings = chunks.apply(
    "embed_text",
    arguments={
        "model_provider": "huggingface",
        "model": "all-MiniLM-L6-v2",
    },
)

embeddings.write(
    "index_qdrant",
    arguments={
        "url": "http:localhost:6333",
        "collection_name": "some-collection-name",
    },
    cache=False,
)

Once you have a pipeline, you can easily run it using the built-in CLI. Fondant allows you to run the pipeline in production across different clouds.

The first component is a custom read module that needs to be implemented and cannot be used off the shelf. A detailed tutorial on how to rebuild this pipeline is provided on GitHub.

Next steps

More information about creating your own pipelines and components can be found in the Fondant documentation.