
Qdrant Cloud Inference

Run Inference Natively in Qdrant Cloud

Qdrant Cloud Inference lets you generate and store text and image embeddings directly within your managed Qdrant Cloud cluster, eliminating external pipelines and supporting multimodal and hybrid search from a single API.
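As a sketch of what "embeddings from a single API" looks like in practice, a point can be upserted with raw text instead of a precomputed vector, and the embedding is computed server-side. The payload below is illustrative (field names follow Qdrant's points API; the exact inference-object schema and model identifier should be checked against the Cloud Inference documentation):

```python
import json

# Illustrative REST request body for upserting a point whose vector is an
# inference document: the text is embedded inside the cluster by the named
# model, so no external embedding pipeline is involved.
upsert_body = {
    "points": [
        {
            "id": 1,
            "payload": {"title": "Qdrant Cloud Inference"},
            "vector": {
                # Inference object in place of a float vector; the model
                # name is an assumption for this sketch.
                "text": "Run inference natively in Qdrant Cloud",
                "model": "sentence-transformers/all-MiniLM-L6-v2",
            },
        }
    ]
}

print(json.dumps(upsert_body, indent=2))
```

Queries can carry the same kind of inference object, so search requests are embedded in-cluster as well.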


Embed faster. Query faster. Go hybrid or multimodal.


Vector search with built-in embeddings

Generate embeddings inside the network of your Qdrant Cloud cluster. No separate model server or pipeline needed.


In-cluster inference, lower latency

Generate embeddings and run search in-region on AWS, Azure, or GCP (US only). No external hops, no extra egress. Ideal for real-time apps that can’t afford delays or data transfer overhead.


Supports Dense, Sparse & Image Models

Build vector search the way you need. Use dense models like all-MiniLM-L6-v2 for fast semantic matching, sparse models like splade-pp-en-v1 or bm25 for keyword recall, or CLIP-style models for image and text. Need hybrid or multimodal search? Covered.
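A hybrid search along these lines can be expressed with Qdrant's Query API by prefetching a dense branch and a sparse branch and fusing them with reciprocal rank fusion. The request body below is a sketch: the named vectors ("dense", "sparse") and model identifiers are assumptions that must match the collection's configuration.

```python
import json

# Illustrative hybrid-query body: the same text is embedded server-side
# twice (dense semantic model + sparse keyword model), each branch fetches
# candidates, and reciprocal rank fusion (RRF) combines the two rankings.
hybrid_query = {
    "prefetch": [
        {
            # Dense branch: semantic similarity.
            "query": {
                "text": "vector databases",
                "model": "sentence-transformers/all-MiniLM-L6-v2",
            },
            "using": "dense",
            "limit": 20,
        },
        {
            # Sparse branch: keyword recall.
            "query": {"text": "vector databases", "model": "bm25"},
            "using": "sparse",
            "limit": 20,
        },
    ],
    "query": {"fusion": "rrf"},
    "limit": 10,
}

print(json.dumps(hybrid_query, indent=2))
```

Because both branches are embedded in-cluster, a hybrid query is still a single API call from the application's point of view.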

Get started with up to 5 million free tokens

*Per model, renewed monthly

Get Started Today

Qdrant Cloud Inference Documentation

Read the Documentation

FAQs

Is Qdrant Cloud Inference available on free clusters?
Yes, free models and external model providers can be used in free Qdrant Cloud clusters. Paid models require a paid cluster.
What kinds of data can I embed?
You can embed both text and image data using the currently available models.
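For image data, the point's vector can be an image inference object, embedded server-side by a CLIP-style vision model. This is a sketch: the URL is a placeholder and the model name is an assumption, not a confirmed identifier.

```python
import json

# Illustrative point whose vector is an image inference object: the image
# at the given URL is fetched and embedded in-cluster by a CLIP-style
# vision model (model name assumed for this sketch).
image_point = {
    "id": 2,
    "vector": {
        "image": "https://example.com/photo.jpg",
        "model": "qdrant/clip-vit-b-32-vision",
    },
}

print(json.dumps(image_point, indent=2))
```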
Where are the embeddings generated?
For Qdrant-hosted models, embeddings are generated inside the network of your cluster, which removes external API overhead. If you use external model providers, embeddings are generated by that provider.
How much does it cost?
Inference is billed per token, and costs depend on the model. Each month, paid Qdrant Cloud users get up to 5 million free tokens, depending on the model. Several models are completely free, with no token limits. For more details, refer to the Inference section on your cluster detail page.
How do I get started?
For new clusters, Cloud Inference is enabled by default. For older clusters that were created before the release of Cloud Inference, you can enable it from the cluster detail page in the Qdrant Cloud Console.
Will there be options for other embedding models?
We plan to add models incrementally based on customer feedback.

Run Inference Natively in Qdrant Cloud

Get Started