Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 5: From Research to Product

Thierry Damiba

March 09, 2026

Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 5: From Research to Product

This is Part 5 of a series on fine-tuning sparse embeddings for e-commerce search. Parts 1–4 built the pipeline from scratch. This article packages it into a tool anyone can use.

Series:

Part 1: Why Sparse Embeddings Beat BM25
Part 2: Training SPLADE on Modal
Part 3: Evaluation & Hard Negatives
Part 4: Specialization vs Generalization
Part 5: From Research to Product (here)

In Parts 1 through 4, we built a SPLADE fine-tuning pipeline piece by piece: data loading, Modal GPU training, Qdrant evaluation, ANCE hard negative mining, cross-domain experiments. The code worked. The results were strong: 28% over BM25 on Amazon ESCI.

Using it required reading four articles, cloning a repo, understanding the training loop internals, wiring up Modal volumes, and configuring Qdrant connections manually. That’s fine for a series walkthrough. It’s not fine for someone who has a product catalog and wants a better search model by end of day.

So we packaged everything into qdrant-sparse-finetune: an open-source CLI and web dashboard that runs the entire pipeline (synthetic query generation, SPLADE training with ANCE, evaluation, and HuggingFace publishing) with a single command.

The Problem We’re Solving

From research repo to production CLI

The series repo (finetune-ecommerce-search) is research code. It demonstrates how sparse embedding fine-tuning works. Actually using it on your data means you need to:

Format your product data to match the expected schema
Either provide labeled queries or set up an LLM API for synthetic generation
Configure Modal volumes and GPU settings
Wire up Qdrant credentials for indexing and mining
Run training, manually trigger ANCE iterations
Evaluate, interpret metrics
Publish to HuggingFace if you want to share the model

Each step has its own configuration, its own failure modes, and its own set of assumptions about how the previous step ran. It’s a pipeline with no orchestration.

qdrant-sparse-finetune handles all of that:

pip install git+https://github.com/qdrant/sparse-finetune.git
qdrant-finetune setup
qdrant-finetune pipeline --data products.csv --gpu modal

Three commands. The setup wizard configures Qdrant, your LLM provider, and your GPU backend (Modal or Vultr). The pipeline command runs everything end-to-end. When training finishes, it asks if you want to publish to HuggingFace and gives you the link.

Two Interfaces, Same Pipeline

The CLI

For developers and automation:

# End-to-end on Modal (serverless A10G)
qdrant-finetune pipeline --data products.csv --gpu modal

# End-to-end on Vultr (dedicated cloud GPU)
qdrant-finetune pipeline --data products.csv --gpu vultr

# Or step by step
qdrant-finetune generate-queries --data products.csv --synth-model gpt-4o-mini
qdrant-finetune train --data products.csv --queries queries.jsonl --gpu modal
qdrant-finetune evaluate --model output/finetune/final --queries test_queries.jsonl
qdrant-finetune publish --model output/finetune/final --repo your-name/your-model

The pipeline command chains all four steps. If you already have labeled queries, pass --queries to skip generation. If you pass --repo, it publishes automatically. If you don’t, it asks after training completes:

Training complete! Publish to HuggingFace Hub? [Y/n]: y
  Repo name (e.g. your-username/my-splade-model): acme/product-search-splade
  Private repo? [y/N]: n

Published → https://huggingface.co/acme/product-search-splade

No buried flags. The most valuable output, a shareable model URL, is the natural endpoint of the workflow.

The Dashboard

For teams who prefer a visual interface:

qdrant-finetune studio

This launches a web dashboard with tabs for each stage of the pipeline:

Train. Configure the base model, ANCE iterations, batch size, and GPU backend. Submit a job and watch live logs with a loss chart that updates as training progresses.

Evaluate. Point at a trained model and test queries. Get metric cards for nDCG@10, MRR@10, Recall, and Precision.

Collections. Browse your Qdrant collections, check point counts, and run test searches against indexed products. Useful for sanity-checking that indexing worked before evaluation.

Publish. Enter a model path and HuggingFace repo name. Click publish.

Jobs. Full history of every training, evaluation, and publish job. Expand any job to see its configuration, logs, and results. Jobs are tracked whether they were launched from the CLI or the dashboard.

Settings. Manage API keys for Qdrant, OpenAI, Anthropic, OpenRouter, Vultr, and HuggingFace. All stored in .env, all masked in the UI.

The dashboard is the recommended starting point. You can see everything the tool can do without reading documentation, and the job history gives you a record of every experiment.

What Changed From the Series Code

Production architecture: CLI, dashboard, and GPU backends

The pipeline from Parts 2-4 is the same underneath. The toolkit wraps it with:

Automatic data handling. Pass a CSV, JSON, JSONL, or HuggingFace dataset path. The loader auto-detects columns (title, description, name, product_id) and builds product text using the same bracket-brand, pipe-separated format from Part 2. No manual formatting.

Synthetic query generation. If you don’t have labeled queries, the toolkit generates them using any litellm-supported model. Pass --synth-model gpt-4o-mini or --synth-model ollama/llama3 for local generation. It reads your product catalog and generates realistic search queries with relevance labels.

# OpenAI
qdrant-finetune generate-queries --data products.csv --synth-model gpt-4o-mini

# Local Ollama
qdrant-finetune generate-queries --data products.csv --synth-model ollama/llama3

# Anthropic
qdrant-finetune generate-queries --data products.csv --synth-model anthropic/claude-sonnet-4-20250514

Multi-backend GPU support. Modal and Vultr are both supported. The --gpu modal flag handles volume creation, data upload, and job launching. --gpu vultr does the same for Vultr Cloud GPU. Both download the trained model back to your local machine when done.

Interactive publishing. The original series code required you to manually load the model and call push_to_hub. The toolkit prompts you after training completes, asks for the repo name, handles authentication, and prints the HuggingFace URL.

Job tracking. Every operation (train, evaluate, publish) is logged with configuration, timestamps, status, and output. The dashboard surfaces this as a job history. The CLI stores it in a local SQLite database.

One-Line Python API

For notebooks and scripts, the same pipeline is available as a function call:

from qdrant_finetune import finetune

model_path = finetune("products.csv")

That single line:

Loads your product data
Generates synthetic queries via LLM
Creates a SPLADE encoder
Runs 3 rounds of ANCE hard negative mining against Qdrant
Saves the fine-tuned model

For more control:

from qdrant_finetune import Trainer, FinetuneConfig

config = FinetuneConfig(
    base_model="distilbert/distilbert-base-uncased",
    ance_iterations=5,
    batch_size=64,
    mining_top_k=20,
    num_negatives=3,
)

trainer = Trainer(config)
trainer.fit(data="products.csv", queries="queries.csv")
trainer.index(data="products.csv", collection_name="my_products")
metrics = trainer.evaluate(queries="test_queries.csv")
print(metrics)  # {'ndcg@10': 0.389, 'mrr@10': 0.387, ...}

Same ANCE loop from Part 3, same evaluation metrics. The Trainer class wraps the training loop, Qdrant indexing, hard negative mining, and evaluation into a coherent API.

Getting Started

# Install from GitHub
pip install git+https://github.com/qdrant/sparse-finetune.git

# Configure (interactive wizard)
qdrant-finetune setup

# Run the full pipeline
qdrant-finetune pipeline --data your-products.csv --gpu modal  # or --gpu vultr

The setup wizard validates your Qdrant connection, configures your LLM provider for synthetic queries, and sets up your GPU backend: Modal for serverless pay-per-second GPUs, or Vultr for dedicated cloud instances. Everything gets written to a .env file.

Or skip the CLI and launch the dashboard:

qdrant-finetune studio

The source code is on GitHub. File issues, submit PRs, or fork it for your own use case.

What’s Actually Different

The series taught the concepts. The toolkit removes the friction. Specifically:

No pipeline assembly. You don’t wire up Modal → training → Qdrant → mining → retraining. One command does it.
No data formatting. Auto-detection handles CSV columns. The text builder applies the formatting conventions from Part 2 automatically.
No query labeling requirement. Synthetic generation means you can start with just a product CSV. No click logs, no relevance labels needed.
No credential juggling. Setup once, stored in .env, used everywhere.
No manual publishing. Interactive prompt after training. HuggingFace link as the final output.

The 28% improvement over BM25 from Part 3 isn’t locked behind a research repo anymore. It’s a pip install away.

Series Summary

Part 1: Why sparse embeddings for e-commerce: SPLADE combines keyword precision with learned expansion
Part 2: Training pipeline on Modal: A100 training with persistent checkpoints
Part 3: Evaluation and hard negatives: +28% vs BM25, ANCE mining with Qdrant
Part 4: Specialization vs generalization: Domain-specific vs multi-domain tradeoffs
Part 5: From research to product: CLI + dashboard that runs the full pipeline

Articles

Articles

Fine-Tuning Sparse Embeddings for E-Commerce Search | Part 5: From Research to Product

The Problem We’re Solving

Two Interfaces, Same Pipeline

The CLI

The Dashboard

What Changed From the Series Code

One-Line Python API

Getting Started

What’s Actually Different

Series Summary

Was this page useful?

About cookies on this site

Targeting Cookies

Functional Cookies

Strictly Necessary Cookies

Performance Cookies