Running Qdrant with GPU Support

Starting from version v1.13.0, Qdrant offers support for GPU acceleration.

However, GPU support is not included in the default Qdrant binary due to additional dependencies and libraries. Instead, you will need to use dedicated Docker images with GPU support.

Configuration

Qdrant includes a number of configuration options to control GPU usage. The following options are available:

gpu:
    # Enable GPU indexing.
    indexing: false
    # Force half precision for `f32` values while indexing.
    # `f16` conversion will take place
    # only inside GPU memory and won't affect storage type.
    force_half_precision: false
    # Number of Vulkan "groups" used on the GPU.
    # In other words, how many points the GPU can index in parallel.
    # The optimal value may depend on the GPU model.
    # Proportional to, but not necessarily equal to,
    # the physical number of warps.
    # Do not change this value unless you know what you are doing.
    # Default: 512
    groups_count: 512
    # Filter GPU devices by hardware name. Case insensitive.
    # Comma-separated list of substrings to match
    # against the GPU device name.
    # Example: "nvidia"
    # Default: "" - all devices are accepted.
    device_filter: ""
    # List of explicit GPU devices to use.
    # If the host has multiple GPUs, this option allows you to select specific devices
    # by their index in the list of found devices.
    # If `device_filter` is set, the indices are applied after filtering.
    # By default, all devices are accepted.
    devices: null
    # How many parallel indexing processes are allowed to run.
    # Default: 1
    parallel_indexes: 1
    # Allow the use of integrated GPUs.
    # Default: false
    allow_integrated: false
    # Allow the use of emulated GPUs like LLVMpipe. Useful for CI.
    # Default: false
    allow_emulated: false

It is not recommended to change these options unless you are familiar with the Qdrant internals and the Vulkan API.
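In practice, the main switch is gpu.indexing; the other options can usually stay at their defaults. As a minimal sketch (assuming Qdrant's usual mapping of nested config keys to QDRANT__-prefixed, double-underscore-separated environment variables, which the Docker examples below also rely on), you can enable it without editing the config file:

# Equivalent to setting `indexing: true` under the `gpu` section above.
export QDRANT__GPU__indexing=1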

Standalone GPU Support

For standalone usage, you can build Qdrant with GPU support by running the following command:

cargo build --release --features gpu

Ensure your device supports Vulkan API v1.3. Because Qdrant uses Vulkan for GPU acceleration, this build is also compatible with Apple Silicon, Intel GPUs, and CPU emulators.
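Once the build finishes, a minimal way to start the standalone binary with GPU indexing enabled (assuming the default cargo output path) is:

# Run the locally built binary with GPU indexing switched on via the environment.
QDRANT__GPU__indexing=1 ./target/release/qdrant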

NVIDIA GPUs

Prerequisites

To use Docker with NVIDIA GPU support, ensure the following are installed on your host:

  • An up-to-date NVIDIA driver

  • The NVIDIA container toolkit (nvidia-container-toolkit)

Most AI or CUDA images on Amazon/GCP come pre-configured with the NVIDIA container toolkit.
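Before pulling the Qdrant image, you can sanity-check the toolkit by running a throwaway container with GPU access (the base image is arbitrary; the container toolkit injects nvidia-smi and the driver libraries into the container):

# If the driver and container toolkit are configured correctly,
# this prints the usual nvidia-smi device table.
docker run --rm --gpus=all ubuntu nvidia-smi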

Docker images with NVidia GPU support

Docker images with NVIDIA GPU support use the tag suffix gpu-nvidia, e.g., qdrant/qdrant:v1.13.0-gpu-nvidia. These images include all necessary dependencies.

To enable GPU support, pass the --gpus=all flag to Docker when starting the container. Example:

# The `--gpus=all` flag tells Docker to expose all GPUs to the container.
# The `-e QDRANT__GPU__indexing=1` flag tells Qdrant to use GPUs for indexing.
docker run \
	--rm \
	--gpus=all \
	-p 6333:6333 \
	-p 6334:6334 \
	-e QDRANT__GPU__indexing=1 \
	qdrant/qdrant:gpu-nvidia-latest

To ensure that the GPU was initialized correctly, check the logs. Qdrant first prints all discovered GPU devices (before filtering), and then prints the list of devices it actually created:

2025-01-13T11:58:29.124087Z  INFO gpu::instance: Foung GPU device: NVIDIA GeForce RTX 3090    
2025-01-13T11:58:29.124118Z  INFO gpu::instance: Foung GPU device: llvmpipe (LLVM 15.0.7, 256 bits)    
2025-01-13T11:58:29.124138Z  INFO gpu::device: Create GPU device NVIDIA GeForce RTX 3090    

Here you can see that two devices were found: the RTX 3090 and llvmpipe (a CPU-emulated GPU that is included in the Docker image). The last line shows that only the RTX 3090 was initialized.

This concludes the setup. Now, you can start using this Qdrant instance.
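From here the instance behaves like any other Qdrant deployment; the GPU is used when HNSW indexes are built. As an illustrative sketch (the collection name and vector size are arbitrary), you could create a collection over the REST API and upload vectors as usual:

# Create a collection; GPU acceleration is used once enough vectors
# have been uploaded and the HNSW index is built.
curl -X PUT "http://localhost:6333/collections/gpu_test" \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 1536, "distance": "Cosine"}}'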

Troubleshooting NVIDIA GPUs

If your GPU is not detected in Docker, make sure your driver and nvidia-container-toolkit are up-to-date.

Verify Vulkan API visibility in the Docker container using:

docker run --rm --gpus=all qdrant/qdrant:gpu-nvidia-latest vulkaninfo --summary

The output may include an error message explaining why the NVIDIA device is not visible. In particular, if your NVIDIA GPU is not visible inside Docker, the container cannot load libGLX_nvidia.so.0 from your host. Such an error can look like this:

ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get `vkCreateInstance` via `vk_icdGetInstanceProcAddr` for ICD libGLX_nvidia.so.0
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 0. Skipping ICD.

To resolve errors, update your NVIDIA container runtime configuration:

sudo nano /etc/nvidia-container-runtime/config.toml
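The relevant line in that file looks like this (only this setting needs to change; the rest of the file can stay as shipped):

# /etc/nvidia-container-runtime/config.toml
no-cgroups = false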

Set no-cgroups=false, save the configuration, and restart Docker:

sudo systemctl restart docker

AMD GPUs

Prerequisites

Running Qdrant with AMD GPUs requires ROCm to be installed on your host.
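A quick way to confirm that the GPU device nodes the container will need are present on the host (these are the same paths passed to Docker below) is:

# Both device nodes must exist for the AMD GPU to be usable inside Docker.
ls -l /dev/kfd /dev/dri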

Docker images with AMD GPU support

Docker images for AMD GPUs use the tag suffix gpu-amd, e.g., qdrant/qdrant:v1.13.0-gpu-amd. These images include all required dependencies.

To expose the GPU to Docker, you need the additional --device /dev/kfd --device /dev/dri flags. To enable GPU indexing in Qdrant, set the indexing flag as before. Here is an example:

# The `--device /dev/kfd --device /dev/dri` flags tell Docker to pass the GPU through to the container.
# The `-e QDRANT__GPU__indexing=1` flag tells Qdrant to use GPUs for indexing.
docker run \
	--rm \
	--device /dev/kfd --device /dev/dri \
	-e QDRANT__log_level=debug \
	-p 6333:6333 \
	-p 6334:6334 \
	-e QDRANT__GPU__indexing=1 \
	qdrant/qdrant:gpu-amd-latest

Check the logs to confirm GPU initialization. Example log output:

2025-01-10T11:56:55.926466Z  INFO gpu::instance: Foung GPU device: AMD Radeon Graphics (RADV GFX1103_R1)
2025-01-10T11:56:55.926485Z  INFO gpu::instance: Foung GPU device: llvmpipe (LLVM 17.0.6, 256 bits) 
2025-01-10T11:56:55.926504Z  INFO gpu::device: Create GPU device AMD Radeon Graphics (RADV GFX1103_R1)

This concludes the setup. In a basic scenario, you won’t need to configure anything else.

Known limitations

  • Platform Support: Docker images are only available for Linux x86_64. Windows, macOS, ARM, and other platforms are not supported.

  • Memory Limits: Each GPU can process up to 16GB of vector data per indexing iteration.

Due to this limitation, you should not create segments in which either the original vectors or the quantized vectors exceed 16GB.

For example, a collection with 1536-dimensional vectors and scalar quantization (1 byte per dimension) can hold at most

16GB / 1536 bytes ≈ 11 million vectors per segment

Without quantization (4 bytes per f32 dimension), the limit is

16GB / (1536 * 4 bytes) ≈ 2.7 million vectors per segment

The maximum segment size can be configured in the collection settings:

PATCH collections/{collection_name}
{
  "optimizers_config": {
    "max_segment_size": 1000000
  }
}

Note that max_segment_size is specified in kilobytes.
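As a rough, illustrative calculation tying this back to the GPU limit above: 16GB is about

16 * 1024 * 1024 ≈ 16,777,216 kilobytes

so any max_segment_size at or below that value keeps a segment's vector data within a single GPU indexing iteration; the 1,000,000 in the example above (roughly 1GB of vectors per segment) is comfortably inside the limit.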
