# Overview
<aside role="status">
  Qdrant offers a bi-weekly "Getting Started" live webinar with experts. 
  Register for an upcoming session <a href="https://tinyurl.com/qdrant-onboarding" target="_blank" rel="noopener noreferrer">here</a>!
</aside>

# Qdrant Overview

## Welcome! {#welcome}

Whether you’re getting started with Qdrant Open-Source or Cloud, this brief primer will help you with understanding an overview of the platform. **It’s highly recommended you read this overview before starting your development with Qdrant\!**

## Retrieval Process {#retrieval-process}

Vector search is a transformative information retrieval technique that goes beyond keyword matching to find data based on semantic meaning. It begins with **embedding models**, which convert unstructured data (text, images, audio) into **dense vector embeddings**, fixed-length lists of numbers that represent the data's conceptual essence. These vectors are mapped into a high-dimensional **vector space**, where items with similar meanings are positioned closely together. This spatial organization allows a search for "climate change" to retrieve documents about "global warming," even if the exact words differ.

![Workflow Overview](/docs/gettingstarted/Orientation-Guide-Diagram-1.png)

While dense vectors excel at capturing context, they can sometimes miss specific technical terms or unique identifiers. To bridge this gap, Qdrant also utilizes **sparse vectors** designed to capture precise **lexical matches** for specific keywords. Learn more in [this guide](/documentation/search/text-search/index.md). 

The process of generating embeddings from unstructured data is called [inference](/documentation/inference/index.md). On Qdrant Cloud, you can use [Cloud Inference](/documentation/cloud/inference/index.md) to let Qdrant generate embeddings on the server side. Alternatively, you can use a library like [FastEmbed](/documentation/fastembed/index.md) to generate embeddings on the client side.

The search process itself revolves into the concept of **Top-K** retrieval. When a user submits a request, it is instantly transformed into a **query vector**. The engine then calculates the similarity between this query vector and document vectors, returning the "Top-K" closest matches, where K is a user-defined number representing the desired volume of results. This allows developers to fine-tune the balance between the breadth of the search and the precision of the answers.

![Retrieval Process](/docs/gettingstarted/Orientation-Guide-Diagram-2.png)

To deliver the most robust search experience, Qdrant enables **Hybrid Retrieval** with semantic and lexical search, which you can learn more about [here](/documentation/search/hybrid-queries/index.md).

## Architecture {#architecture}

Qdrants operates in a client-server architecture, providing official [client libraries](/documentation/interfaces/index.md#client-libraries) for Python, JavaScript/TypeScript, Rust, Go, .NET, and Java. However, Qdrant exposes HTTP and gRPC [interfaces](/documentation/interfaces/index.md#client-libraries) to facilitate integration with virtually any programming language.

## Data Structure {#data-structure}

![Qdrant organizes data around collections - named sets of points that you search within. Each point consists of a vector (numerical representation of your data) and optional payload metadata. Points are identified by 64-bit integers or UUIDs. Collections support multiple vector types per point, dense or sparse, and named vectors are used for storing different embedding types in a single point. When creating a collection, you specify vector dimensionality and distance metric for each of the named vectors you want to store. The HNSW index enables fast similarity search by building a graph structure that efficiently traverses similar vectors. Payload indexes can be created on specific fields to enable filtering during search, extending the HNSW graph for combined vector similarity and metadata filtering in a single pass. Data is organized into segments - storage units containing vectors and indexes - which are automatically optimized in the background. For distributed deployments, collections are split into shards, each containing its own segments. Strict mode prevents performance issues by enforcing constraints like blocking queries on unindexed fields.](/docs/gettingstarted/Orientation-Guide-Diagram-3.png)

Qdrant collections are designed for horizontal and vertical scaling. You can learn about the details in the above diagram from links below:

* [Collections](/documentation/manage-data/collections/index.md)  
* [Points](/documentation/manage-data/points/index.md)  
* [Indexing](/documentation/manage-data/indexing/index.md)  
* [Storage](/documentation/manage-data/storage/index.md)  
* [Distributed Deployment](/documentation/operations/distributed_deployment/index.md)  
* [Strict Mode](/documentation/operations/administration/index.md#strict-mode)

## Deployments {#deployments}

Qdrant supports multiple deployment models to match different infrastructure and operational needs. The right option depends on your security constraints and operational model: Qdrant-managed infrastructure ([Managed Cloud](/documentation/cloud/index.md)), shared responsibility with your own clusters ([Hybrid Cloud](/documentation/hybrid-cloud/index.md)), or full ownership and independence ([Private Cloud](/documentation/private-cloud/index.md) or [Open Source](https://github.com/qdrant/qdrant)).

| Feature | Benefits | OSS | Managed | Hybrid | Private |
| :---- | :---- | :---: | :---: | :---: | :---: |
| Deployment | Choose how and where to deploy your Qdrant vector database based on your infrastructure needs. | ✅ | ✅ | ✅ | ✅ |
| High Availability | Automatic failover and replication to ensure your vector search is always available. | ❌ | ✅ | ✅ | ✅ |
| Zero-Downtime Upgrades | Upgrade your Qdrant database without any service interruption using replication. | ❌ | ✅ | ✅ | ✅ |
| Monitoring & Alerting | Built-in monitoring and alerting to observe the health and performance of your clusters. | ❌ | ✅ | ✅ | ❌ |
| Central Management UI | A unified console to create, configure, and manage all your Qdrant database clusters. | ❌ | ✅ | ✅ | ❌ |
| Horizontal & Vertical Scaling | Scale your clusters up, down, or out with automatic shard rebalancing and resharding support. | ❌ | ✅ | ✅ | ✅ |
| Backups & Disaster Recovery | Automated backups and restore functionality to ensure data durability and graceful recovery. | ❌ | ✅ | ✅ | ✅ |
| Data Privacy & Control | Keep all user data within your own infrastructure and network, not accessible by external parties. | ✅ | ❌ | ✅ | ✅ |
| Multi-Cloud & On-Premises | Deploy on AWS, GCP, Azure, on-premises, or edge locations based on your requirements. | ✅ | ❌ | ✅ | ✅ |
| Enterprise Support | Access to Qdrant's enterprise support services for production deployments. | ❌ | ✅ | ✅ | ✅ |
| No Infrastructure Management | Qdrant fully manages your infrastructure, so you can focus on building your application. | ❌ | ✅ | ❌ | ❌ |

## Scaling Considerations {#scaling-considerations}

The default configuration of Qdrant is sensible when you are starting to work on a POC or your side project. However, when transitioning to production and experiencing the growth of data size and concurrent users, your expectations regarding high availability, latency, or throughput will change. If you foresee scaling the service, you should build your system ready for these kinds of challenges from the outset. There are a few common scenarios you should be aware of, especially if you are taking your first steps with Qdrant, anticipate rapid growth soon, and want to make your system future-proof.

### Memory Requirements {#memory-requirements}

Memory is a critical resource when scaling vector search. By default, Qdrant stores vectors in RAM for maximum search performance, but as collections grow to millions of vectors, keeping everything in memory becomes expensive. Qdrant lets you control the memory usage by offloading data to disk, and you can enable that mechanism at any time, even on an existing collection:

* Frequently accessed vectors naturally stay cached, while others are read from disk only when needed, if you store vectors on disk  
* Graph traversal may require IO operations if you store the HNSW index on disk 

Put both on disk only when RAM is severely constrained, and ensure you have fast NVMe storage.

### Filtering {#filtering}

Vector search alone can provide a decent search experience to your users; however, semantic similarity is rarely the only factor you have to consider. Embeddings won’t capture attributes such as price, and typically, a filter on a specific payload attribute has to be applied. To make that filtering effective, there are some specific Qdrant mechanisms you should be aware of, including with **payload indexes**.

### Payload Indexes {#payload-indexes}

The payload index is a helper data structure that enables effective filtering on a particular payload attribute. It’s a concept familiar from relational databases, where we create an index on a column that we often filter by. Similarly, in Qdrant, you should also make a payload index on a field used for filtering. 

A unique aspect of the payload index is that it extends the HNSW graph, allowing filtering criteria to be applied during the semantic search phase. That means it’s a single-pass graph traversal, rather than pre- or post-filtering, which both have some drawbacks.

*![HNSW Graph with Filtering](/docs/gettingstarted/Orientation-Guide-Diagram-4.png)*

The fact that a payload index extends the HNSW graph means it’s more efficient to create it before indexing the data, as the optimizer will need to build the graph once. However, in some cases, you may already have a collection with a lot of vectors and recognize a need to filter by a specific attribute. In such cases, you can still create a payload index, yet **it won't immediately affect the HNSW graph**. 

<aside role="status">
The HNSW graph will only get created once the optimizer will run the segment reconstruction, and it might be triggered by  <a href="/documentation/manage-data/collections/#update-collection-parameters">modifying the parameters of HNSW</a>, such as temporarily setting up m=0 and then back to the original value (m=16 by default).
</aside>

[ACORN](/documentation/search/search/index.md#acorn-search-algorithm) is an additional mechanism that can improve the search accuracy if you have multiple high cardinality filters in your search operations.

### Scaling {#scaling}

Vertical scaling has natural limits \- eventually, you'll hit the maximum capacity of available hardware, and single-node deployments lack redundancy. Optimize scaling with **sharding**, **replication**, and **segment configuration** options.

#### Sharding {#sharding}

Qdrant uses sharding to split collections across multiple nodes, where each shard is an independent store of points. A common recommendation is to start with 12 shards, which provides flexibility to scale from 1 node up to 2, 3, 6, or 12 nodes without resharding. However, this approach can limit throughput on small clusters since each node manages multiple shards.

For optimal throughput, set `shard_number` equal to your node count (read more here). If you want to have better control over sharding, Qdrant supports [custom shards](/documentation/operations/distributed_deployment/index.md#user-defined-sharding).

#### Replication {#replication}

The replication factor determines how many copies of each shard exist. **For production systems, a replication factor of at least 2 is strongly recommended**.

![Choosing a Replication Factor: RF=1 causes operational problems: node restarts make parts of your collection unavailable, and data loss is permanent without backups. Use RF=1 only for non-production workloads, development environments, or when data can be easily regenerated. RF=2 provides the optimal balance for most production deployments: data remains available during single-node failures, read operations benefit from load balancing, and rolling updates work without downtime. RF>2 is a throughput optimization for read-heavy workloads. More replicas distribute read operations across more nodes, increasing cluster throughput. The cost is proportionally increased storage and higher write overhead.](/docs/gettingstarted/Orientation-Guide-Diagram-5.png)

#### Segment Configuration {#segment-configuration}

Each shard stores data in multiple [segments](/documentation/manage-data/storage/index.md). A segment stores all the data structures of a subset of the points in a shard. Fewer segments create larger segments with better search throughput, as larger HNSW indexes require fewer comparisons. However, larger segments take longer to build and recreate, slowing writes and optimization. More segments mean faster indexing but lower search performance since queries scan more segments. Read more on segment configuration.

<aside role="status">
In Qdrant Cloud, replication factor changes are applied automatically, and shard rebalancing is available. In self-hosted deployments, you must manually create or drop replicas and move shards between nodes as you scale.
</aside>

### Safety {#safety}

Some of the collection-level operations may degrade performance of the Qdrant cluster. Qdrant's [strict mode](/documentation/operations/administration/index.md#strict-mode) prevents inefficient usage patterns through multiple controls: it may block filtering and updates on non-indexed payload fields, limit query result sizes and timeout durations, restrict the complexity and number of filter conditions, cap payload index counts, constrain batch upsert sizes, enforce maximum collection storage limits (for vectors, payloads, and point counts), and implement rate limiting for read and write operations to prevent system overload. 

<aside role="status">
Qdrant Cloud disables filtering and updating by a non-indexed payload attribute by default, and also restricts the maximum number of payload indexes to 100\. You may consider disabling it temporarily if you want to execute some one-time queries on unindexed payload attributes, but in general you should need to do that.
</aside>

The OSS version does not enforce anything, but please consider enabling and configuring strict mode settings according to the application needs. Otherwise, some of the API calls may impact the performance of your cluster by using Qdrant in a suboptimal way.

## Getting Help {#getting-help}

If you're new to Qdrant, start with the free [**Essentials Course**](https://qdrant.tech/course/essentials/), which covers core concepts and best practices. For questions, troubleshooting, and community support, join the [**Discord Community**](https://qdrant.to/discord) \- it's the best place to get help from both Qdrant users and the core team. Paid customers have access to the [**Support Portal**](https://qdrant.to/cloud) through the Qdrant Cloud Console, for direct technical assistance and priority response times.
