Chonkie
Chonkie is a no-nonsense, ultra-light, and lightning-fast chunking library designed for RAG (Retrieval-Augmented Generation) applications.
Chonkie integrates seamlessly with Qdrant through the QdrantHandshake class, allowing you to chunk, embed, and store text data without ever leaving the Chonkie SDK.
Setup
Install Chonkie with Qdrant support:
pip install "chonkie[qdrant]"
Basic Usage
The QdrantHandshake provides a simple interface for storing and searching chunks:
from chonkie import QdrantHandshake, SemanticChunker
# Initialize handshake with custom embedding model
handshake = QdrantHandshake(
url="http://localhost:6333",
collection_name="my_documents",
embedding_model="sentence-transformers/all-MiniLM-L6-v2"
)
# Create and write chunks
chunker = SemanticChunker()
chunks = chunker.chunk("Your text content here...")
handshake.write(chunks)
# Search using natural language
results = handshake.search(query="your search query", limit=5)
for result in results:
print(f"{result['score']}: {result['text']}")
Qdrant Cloud
handshake = QdrantHandshake(
url="https://your-cluster.qdrant.io",
api_key="your-api-key",
collection_name="my_collection",
embedding_model="BAAI/bge-small-en-v1.5" # Change to your preferred model
)
Complete RAG Pipeline
Build end-to-end RAG pipelines using Chonkie’s fluent Pipeline API:
from chonkie import Pipeline
# Process documents and store in Qdrant with custom embedding model
docs = (Pipeline()
.fetch_from("file", dir="./knowledge_base", ext=[".txt", ".md"])
.process_with("text")
.chunk_with("semantic", chunk_size=512)
.store_in("qdrant",
collection_name="knowledge",
url="http://localhost:6333",
embedding_model="sentence-transformers/all-MiniLM-L6-v2")
.run())
print(f"Ingested {len(docs)} documents into Qdrant")
Pipeline with Refinements
from chonkie import Pipeline
# Advanced pipeline with overlapping context and custom embeddings
docs = (Pipeline()
.fetch_from("file", dir="./docs")
.process_with("text")
.chunk_with("semantic", threshold=0.8)
.refine_with("overlap", context_size=100)
.store_in("qdrant",
url="https://your-cluster.qdrant.io",
api_key="your-api-key",
collection_name="knowledge_base",
embedding_model="BAAI/bge-small-en-v1.5")
.run())
Next steps
- Chonkie GitHub Repository
- Chonkie Documentation
- QdrantHandshake API Reference
- Chonkie Chunking Strategies
- Qdrant Python Client Documentation
