# Backup & Restore Qdrant with Snapshots

| Time: 20 min | Level: Beginner |
|--------------|-----------------|

A collection is a basic unit of data storage in Qdrant. It contains vectors, their IDs, and payloads. However, keeping the search efficient requires additional data structures to be built on top of the data. Building these data structures may take a while, especially for large collections.
That's why using snapshots is the best way to export and import Qdrant collections, as they contain all the bits and pieces required to restore the entire collection efficiently.

This tutorial will show you how to create a snapshot of a collection and restore it. Working with snapshots in a distributed environment takes a few extra steps, so we will use a 3-node Qdrant cluster. However, the same approach applies to a single-node setup.

<aside role="status">Snapshots cannot be created in the local mode of the Python SDK. You need to spin up a Qdrant Docker container or use Qdrant Cloud.</aside>

You can also use the techniques described on this page to migrate a cluster. Follow the instructions
in this tutorial to create and download snapshots. Then, in the [Restore from snapshot](#restore-from-snapshot) step, restore your data to the new cluster instead of the original one.

## Prerequisites

Let's assume you already have a running Qdrant instance or a cluster. If not, you can follow the [installation guide](/documentation/operations/installation/index.md) to set up a local Qdrant instance or use [Qdrant Cloud](https://cloud.qdrant.io/) to create a cluster in a few clicks.

Once the cluster is running, let's install the required dependencies:

```shell
pip install qdrant-client datasets
```

### Establish a connection to Qdrant

We are going to use the Python SDK and raw HTTP calls to interact with Qdrant. Since we are going to use a 3-node cluster, we need to know the URLs of all the nodes. For simplicity, let's keep them all in constants, along with the API key, so we can refer to them later:

```python
QDRANT_MAIN_URL = "https://my-cluster.com:6333"
QDRANT_NODES = (
    "https://node-0.my-cluster.com:6333",
    "https://node-1.my-cluster.com:6333",
    "https://node-2.my-cluster.com:6333",
)
QDRANT_API_KEY = "my-api-key"
```
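If you would rather not hard-code the API key, the same values can be read from the environment. The following is a minimal sketch, not part of the original setup: the helper `load_qdrant_settings` and the variable names `QDRANT_MAIN_URL`, `QDRANT_NODES`, and `QDRANT_API_KEY` are hypothetical conventions chosen for illustration.

```python
import os


def load_qdrant_settings(env=None):
    """Read connection settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    api_key = env.get("QDRANT_API_KEY")
    if not api_key:
        raise RuntimeError("QDRANT_API_KEY is not set")
    # Fall back to the placeholder URL used in this tutorial
    main_url = env.get("QDRANT_MAIN_URL", "https://my-cluster.com:6333")
    # QDRANT_NODES is assumed to be a comma-separated list of node URLs
    nodes = tuple(
        url.strip() for url in env.get("QDRANT_NODES", "").split(",") if url.strip()
    )
    return main_url, nodes, api_key


# Example usage (requires the variables to be exported in your shell):
# QDRANT_MAIN_URL, QDRANT_NODES, QDRANT_API_KEY = load_qdrant_settings()
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.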

<aside role="status">If you are using Qdrant Cloud, you can find the URL and API key in the <a href="https://cloud.qdrant.io/">Qdrant Cloud dashboard</a>.</aside>

We can now create a client instance:

```python
from qdrant_client import QdrantClient

client = QdrantClient(QDRANT_MAIN_URL, api_key=QDRANT_API_KEY)
```

First of all, we are going to create a collection from a precomputed dataset. If you already have a collection, you can skip this step and start by [creating a snapshot](#create-and-download-snapshots).

<details>
    <summary>(Optional) Create collection and import data</summary>

### Load the dataset

We are going to use a dataset with precomputed embeddings, available on Hugging Face Hub. The dataset is called [Qdrant/arxiv-titles-instructorxl-embeddings](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings) and was created using the [InstructorXL](https://huggingface.co/hkunlp/instructor-xl) model. It contains 2.25M embeddings for the titles of the papers from the [arXiv](https://arxiv.org/) dataset.

Loading the dataset is as simple as:

```python
from datasets import load_dataset

dataset = load_dataset(
    "Qdrant/arxiv-titles-instructorxl-embeddings", split="train", streaming=True
)
```

We used the streaming mode, so the dataset is not loaded into memory. Instead, we can iterate through it and extract the id and vector embedding:

```python
for payload in dataset:
    id_ = payload.pop("id")
    vector = payload.pop("vector")
    print(id_, vector, payload)
```

A single payload looks like this:

```json
{
  "title": "Dynamics of partially localized brane systems",
  "DOI": "1109.1415"
}
```


### Create a collection

First things first, we need to create our collection. We are going to keep the default configuration, but if you want to customize it, now is the time to do so,
because the configuration is also a part of the collection snapshot.

```python
from qdrant_client import models

if not client.collection_exists("test_collection"):
    client.create_collection(
        collection_name="test_collection",
        vectors_config=models.VectorParams(
            size=768,  # Size of the embedding vector generated by the InstructorXL model
            distance=models.Distance.COSINE
        ),
    )
```

### Upload the dataset

Calculating the embeddings is usually a bottleneck of the vector search pipelines, but we are happy to have them in place already. Since the goal of this tutorial is to show how to create a snapshot, **we are going to upload only a small part of the dataset**.

```python
ids, vectors, payloads = [], [], []
for payload in dataset:
    id_ = payload.pop("id")
    vector = payload.pop("vector")

    ids.append(id_)
    vectors.append(vector)
    payloads.append(payload)

    # We are going to upload only 1000 vectors
    if len(ids) == 1000:
        break

client.upsert(
    collection_name="test_collection",
    points=models.Batch(
        ids=ids,
        vectors=vectors,
        payloads=payloads,
    ),
)
```
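For larger uploads, sending everything in a single request becomes impractical. As a rough sketch of one way to split the stream into smaller batches, the hypothetical helper `iter_batches` below groups records into `(ids, vectors, payloads)` tuples. The helper name, the batch size of 256, and the `limit` parameter are assumptions made for illustration, not part of the Qdrant SDK.

```python
from itertools import islice


def iter_batches(records, batch_size=256, limit=None):
    """Yield (ids, vectors, payloads) tuples holding at most `batch_size` records."""
    # Optionally cap the total number of records taken from the stream
    stream = iter(records) if limit is None else islice(records, limit)
    while True:
        chunk = list(islice(stream, batch_size))
        if not chunk:
            return
        ids, vectors, payloads = [], [], []
        for record in chunk:
            record = dict(record)  # copy, so the caller's data is not mutated
            ids.append(record.pop("id"))
            vectors.append(record.pop("vector"))
            payloads.append(record)  # whatever remains is the payload
        yield ids, vectors, payloads


# Example usage with the client and dataset defined earlier:
# for ids, vectors, payloads in iter_batches(dataset, batch_size=256, limit=1000):
#     client.upsert(
#         collection_name="test_collection",
#         points=models.Batch(ids=ids, vectors=vectors, payloads=payloads),
#     )
```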

Our collection is now ready to be used for search. Let's create a snapshot of it.

</details>


## Create and download snapshots

Qdrant exposes an HTTP endpoint to request creating a snapshot, but we can also call it with the Python SDK.
Our setup consists of 3 nodes, so we need to call the endpoint **on each of them** and create a snapshot on each node. With the Python SDK, that means creating a separate client instance for each node.


<aside role="status">You may get a timeout error if the collection is large. You can trigger the snapshot process in the background, without waiting for the result, by using the <code>wait=false</code> parameter. You can always <a href="/documentation/operations/snapshots/#list-snapshot">list all the snapshots through the API</a> later on.</aside>


```python
snapshot_urls = []
for node_url in QDRANT_NODES:
    node_client = QdrantClient(node_url, api_key=QDRANT_API_KEY)
    snapshot_info = node_client.create_snapshot(collection_name="test_collection")

    snapshot_url = f"{node_url}/collections/test_collection/snapshots/{snapshot_info.name}"
    snapshot_urls.append(snapshot_url)
```

```http
// for `https://node-0.my-cluster.com:6333`
POST /collections/test_collection/snapshots

// for `https://node-1.my-cluster.com:6333`
POST /collections/test_collection/snapshots

// for `https://node-2.my-cluster.com:6333`
POST /collections/test_collection/snapshots
```

<details>
    <summary>Response</summary>

```json
{
  "result": {
    "name": "test_collection-559032209313046-2024-01-03-13-20-11.snapshot",
    "creation_time": "2024-01-03T13:20:11",
    "size": 18956800
  },
  "status": "ok",
  "time": 0.307644965
}
```
</details>
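If you trigger the snapshot with `wait=false`, you need some way to find out when it is finished. The sketch below polls a listing function until a snapshot name appears that was not there before. The helper `wait_for_new_snapshot` and its `timeout` and `poll_interval` defaults are hypothetical, and the sketch assumes that a snapshot shows up in the listing only once it is complete, which you should verify for your Qdrant version.

```python
import time


def wait_for_new_snapshot(list_snapshots, known_names, timeout=600.0, poll_interval=5.0):
    """Poll until a snapshot that is not in `known_names` appears; return its name."""
    known = set(known_names)
    deadline = time.monotonic() + timeout
    while True:
        for snapshot in list_snapshots():
            if snapshot.name not in known:
                return snapshot.name
        if time.monotonic() >= deadline:
            raise TimeoutError("no new snapshot appeared before the timeout")
        time.sleep(poll_interval)


# Example usage with a per-node client from the loop above:
# known = {s.name for s in node_client.list_snapshots("test_collection")}
# node_client.create_snapshot(collection_name="test_collection", wait=False)
# name = wait_for_new_snapshot(
#     lambda: node_client.list_snapshots("test_collection"), known
# )
```

The listing function is passed in as a callable, so the same helper works with the Python SDK or with a thin wrapper around the HTTP API.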


Once we have the snapshot URLs, we can download them. Please make sure to include the API key in the request headers.
Downloading the snapshot **can be done only through the HTTP API**, so we are going to use the `requests` library.

```python
import requests
import os

# Create a directory to store snapshots
os.makedirs("snapshots", exist_ok=True)

local_snapshot_paths = []
for snapshot_url in snapshot_urls:
    snapshot_name = os.path.basename(snapshot_url)
    local_snapshot_path = os.path.join("snapshots", snapshot_name)

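    # Stream the response so large snapshots are not held in memory,
    # and check the status before creating the local file
    with requests.get(
        snapshot_url, headers={"api-key": QDRANT_API_KEY}, stream=True
    ) as response:
        response.raise_for_status()
        with open(local_snapshot_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)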

    local_snapshot_paths.append(local_snapshot_path)
```
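Before relying on a downloaded file as a backup, it can be worth checking that it arrived intact. The snapshot creation response reports a `size` in bytes, so one simple check is to compare it with the size of the local file. The sketch below does that and also returns a SHA-256 digest that you can store next to the backup to detect later corruption. The helper name `check_snapshot_file` is hypothetical.

```python
import hashlib
import os


def check_snapshot_file(path, expected_size=None, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of `path`, optionally checking its size first."""
    actual_size = os.path.getsize(path)
    if expected_size is not None and actual_size != expected_size:
        raise ValueError(f"{path}: expected {expected_size} bytes, found {actual_size}")
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large snapshots are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage, assuming you kept `snapshot_info.size` for each node:
# digest = check_snapshot_file(local_snapshot_path, expected_size=snapshot_info.size)
```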

Alternatively, you can use the `wget` command:

```bash
wget https://node-0.my-cluster.com:6333/collections/test_collection/snapshots/test_collection-559032209313046-2024-01-03-13-20-11.snapshot \
    --header="api-key: ${QDRANT_API_KEY}" \
    -O node-0-snapshot.snapshot

wget https://node-1.my-cluster.com:6333/collections/test_collection/snapshots/test_collection-559032209313047-2024-01-03-13-20-12.snapshot \
    --header="api-key: ${QDRANT_API_KEY}" \
    -O node-1-snapshot.snapshot

wget https://node-2.my-cluster.com:6333/collections/test_collection/snapshots/test_collection-559032209313048-2024-01-03-13-20-13.snapshot \
    --header="api-key: ${QDRANT_API_KEY}" \
    -O node-2-snapshot.snapshot
```

The snapshots are now stored locally. We can use them to restore the collection to a different Qdrant instance, or treat them as a backup. We will create another collection using the same data on the same cluster.

## Restore from snapshot

Our brand-new snapshot is ready to be restored. Typically, it is used to move a collection to a different Qdrant instance, but we are going to use it to create a new collection on the same cluster.
It is just going to have a different name, `test_collection_import`. We do not need to create a collection first, as it is going to be created automatically.

Restoring a collection is also done separately on each node, but our Python SDK does not support it yet. We are going to use the HTTP API instead,
and send a request to each node using the `requests` library.

```python
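for node_url, snapshot_path in zip(QDRANT_NODES, local_snapshot_paths):
    snapshot_name = os.path.basename(snapshot_path)
    # Open the file in a context manager so the handle is closed after the upload
    with open(snapshot_path, "rb") as snapshot_file:
        response = requests.post(
            f"{node_url}/collections/test_collection_import/snapshots/upload?priority=snapshot",
            headers={
                "api-key": QDRANT_API_KEY,
            },
            files={"snapshot": (snapshot_name, snapshot_file)},
        )
    # Fail loudly if a node rejected the upload
    response.raise_for_status()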
```

Alternatively, you can use the `curl` command:

```bash
curl -X POST 'https://node-0.my-cluster.com:6333/collections/test_collection_import/snapshots/upload?priority=snapshot' \
    -H "api-key: ${QDRANT_API_KEY}" \
    -H 'Content-Type:multipart/form-data' \
    -F 'snapshot=@node-0-snapshot.snapshot'

curl -X POST 'https://node-1.my-cluster.com:6333/collections/test_collection_import/snapshots/upload?priority=snapshot' \
    -H "api-key: ${QDRANT_API_KEY}" \
    -H 'Content-Type:multipart/form-data' \
    -F 'snapshot=@node-1-snapshot.snapshot'

curl -X POST 'https://node-2.my-cluster.com:6333/collections/test_collection_import/snapshots/upload?priority=snapshot' \
    -H "api-key: ${QDRANT_API_KEY}" \
    -H 'Content-Type:multipart/form-data' \
    -F 'snapshot=@node-2-snapshot.snapshot'
```


**Important:** We selected `priority=snapshot` to make sure that the snapshot is preferred over the data stored on the node. You can read more about the priority in the [documentation](/documentation/operations/snapshots/index.md#snapshot-priority).
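Once every node has accepted its snapshot, a quick sanity check is to compare the number of points in the original and the restored collection. The following is a minimal sketch, assuming the `client` created earlier. The helper name `counts_match` is hypothetical, and matching counts only suggest, rather than prove, that the restore is complete.

```python
def counts_match(client, source_collection, target_collection):
    """Compare exact point counts of two collections.

    Returns a tuple: (counts are equal, source count, target count).
    """
    source = client.count(collection_name=source_collection, exact=True).count
    target = client.count(collection_name=target_collection, exact=True).count
    return source == target, source, target


# Example usage with the client defined earlier:
# ok, source, target = counts_match(client, "test_collection", "test_collection_import")
# print(f"source={source} target={target} match={ok}")
```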

Apart from snapshots, Qdrant also provides the [Qdrant Migration Tool](https://github.com/qdrant/migration) that supports:
- Migration between Qdrant Cloud instances.
- Migrating vectors from other providers into Qdrant.
- Migrating from Qdrant OSS to Qdrant Cloud.

Follow our [migration guide](/documentation/tutorials-operations/migration/index.md) to learn how to use the Qdrant Migration Tool effectively.