Filtering

Qdrant allows you to set the conditions to be used when searching or retrieving points. You can impose conditions both on the payload and on, for example, the id of the point.

The use of additional conditions is important when, for example, it is impossible to express all the features of the object in the embedding. Examples include a variety of business requirements: stock availability, user location, or desired price range.

Filtering causes

Qdrant allows you to combine conditions in causes. Clauses are different logical operations, such as OR, AND, and NOT. Clauses can be recursively nested into each other so that you can reproduce an arbitrary boolean expression.

Let’s take a look at the clauses implemented in Qdrant.

Suppose we have a set of points with the following payload:

[
    {"id": 1, "city": "London", "color": "green"},
    {"id": 2, "city": "London", "color": "red"},
    {"id": 3, "city": "London", "color": "blue"},
    {"id": 4, "city": "Berlin", "color": "red"},
    {"id": 5, "city": "Moscow", "color": "green"},
    {"id": 6, "city": "Moscow", "color": "blue"}
]

Must

Example:

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "must": [
            { "key": "city", "match": { "value": "London" } },
            { "key": "color", "match": { "value": "red" } }
        ]
    }
  ...
}
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(host="localhost", port=6333)

client.scroll(
    collection_name="{collection_name}", 
    scroll_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="city", 
                match=models.MatchValue(value="London"),
            ),
            models.FieldCondition(
                key="color",
                match=models.MatchValue(value="red"),
            ),
        ]
    ),
)

Filtered points would be:

[
    {"id": 2, "city": "London", "color": "red"}
]

When using must, the clause becomes true only if every condition listed inside must is satisfied. In this sense, must is equivalent to the operator AND.

Should

Example:

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "should": [
            { "key": "city", "match": { "value": "London" } },
            { "key": "color", "match": { "value": "red" } }
        ]
    }
  ...
}
client.scroll(
    collection_name="{collection_name}", 
    scroll_filter=models.Filter(
        should=[
            models.FieldCondition(
                key="city", 
                match=models.MatchValue(value="London"),
            ),
            models.FieldCondition(
                key="color", 
                match=models.MatchValue(value="red"),
            ),
        ]
    ),
)

Filtered points would be:

[
    {"id": 1, "city": "London", "color": "green"},
    {"id": 2, "city": "London", "color": "red"},
    {"id": 3, "city": "London", "color": "blue"},
    {"id": 4, "city": "Berlin", "color": "red"}
]

When using should, the clause becomes true if at least one condition listed inside should is satisfied. In this sense, should is equivalent to the operator OR.

Must Not

Example:

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "must_not": [
            { "key": "city", "match": { "value": "London" } },
            { "key": "color", "match": { "value": "red" } }
        ]
    }
  ...
}
client.scroll(
    collection_name="{collection_name}", 
    scroll_filter=models.Filter(
        must_not=[
            models.FieldCondition(
                key="city", 
                match=models.MatchValue(value="London")
            ),
            models.FieldCondition(
                key="color", 
                match=models.MatchValue(value="red")
            ),
        ]
    ),
)

Filtered points would be:

[
    {"id": 5, "city": "Moscow", "color": "green"},
    {"id": 6, "city": "Moscow", "color": "blue"}
]

When using must_not, the clause becomes true if none if the conditions listed inside should is satisfied. In this sense, must_not is equivalent to the expression (NOT A) AND (NOT B) AND (NOT C).

Clauses combination

It is also possible to use several clauses simultaneously:

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "must": [
            { "key": "city", "match": { "value": "London" } }
        ],
        "must_not": [
            { "key": "color", "match": { "value": "red" } }
        ]
    }
  ...
}
client.scroll(
    collection_name="{collection_name}", 
    scroll_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="city", 
                match=models.MatchValue(value="London")
            ),
        ],
        must_not=[
            models.FieldCondition(
                key="color", 
                match=models.MatchValue(value="red")
            ),
        ],
    ),
)

Filtered points would be:

[
    {"id": 1, "city": "London", "color": "green"},
    {"id": 3, "city": "London", "color": "blue"},
]

In this case, the conditions are combined by AND.

Also, the conditions could be recursively nested. Example:

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "must_not": [
            {
                "must": [
                    { "key": "city", "match": { "value": "London" } },
                    { "key": "color", "match": { "value": "red" } }
                ]
            }
        ]
    }
  ...
}
client.scroll(
    collection_name="{collection_name}", 
    scroll_filter=models.Filter(
        must_not=[
            models.Filter(
                must=[
                    models.FieldCondition(
                        key="city", 
                        match=models.MatchValue(value="London")
                    ),
                    models.FieldCondition(
                        key="color", 
                        match=models.MatchValue(value="red")
                    ),
                ],
            ),
        ],
    ),
)

Filtered points would be:

[
    {"id": 1, "city": "London", "color": "green"},
    {"id": 3, "city": "London", "color": "blue"},
    {"id": 4, "city": "Berlin", "color": "red"},
    {"id": 5, "city": "Moscow", "color": "green"},
    {"id": 6, "city": "Moscow", "color": "blue"}
]

Filtering conditions

Different types of values in payload correspond to different kinds of queries that we can apply to them. Let’s look at the existing condition variants and what types of data they apply to.

Match

{ 
    "key": "color",
    "match": {
        "value": "red" 
    }
}
models.FieldCondition(
    key="color",
    match=models.MatchValue(value="red"),
)

For the other types, the match condition will look exactly the same, except for the type used:

{ 
    "key": "count",
    "match": {
        "value": 0 
    }
}
models.FieldCondition(
    key="count",
    match=models.MatchValue(value=0),
)

The simplest kind of condition is one that checks if the stored value equals the given one. If several values are stored, at least one of them should match the condition. You can apply it to keyword, integer and bool payloads.

Match Any

Available since version 1.1.0

In case you want to check if the stored value is one of multiple values, you can use the Match Any condition. Match Any works as a logical OR for the given values. It can also be described as a IN operator.

You can apply it to keyword and integer payloads.

Example:

{ 
    "key": "color",
    "match": {
        "any": ["black", "yellow"] 
    }
}
FieldCondition(
    key="color",
    match=models.MatchAny(any=["black", "yellow"]),
)

In this example, the condition will be satisfied if the stored value is either black or yellow.

Nested key

Available since version 1.1.0

Payloads being arbitrary JSON object, it is likely that you will need to filter on a nested field.

For convenience, we use a syntax similar to what can be found in the Jq project.

Suppose we have a set of points with the following payload:

[
    {
        "id": 1,
        "country": {
            "name": "Germany",
            "cities": [
                {
                    "name": "Berlin",
                    "population": 3.7,
                    "sightseeing": ["Brandenburg Gate", "Reichstag"]
                },
                {
                    "name": "Munich",
                    "population": 1.5,
                    "sightseeing": ["Marienplatz", "Olympiapark"]
                }
            ]
        }
    },
    {
        "id": 2,
        "country": {
            "name": "Japan",
            "cities": [
                {
                    "name": "Tokyo",
                    "population": 9.3,
                    "sightseeing": ["Tokyo Tower", "Tokyo Skytree"]
                },
                {
                    "name": "Osaka",
                    "population": 2.7,
                    "sightseeing": ["Osaka Castle", "Universal Studios Japan"]
                }
            ]
        }
    }
]

You can search on a nested field using a dot notation.

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "should": [
            {
                "key": "country.name",
                "match": {
                    "value": "Germany"
                }
            }
        ]
    }
}
client.scroll(
    collection_name="{collection_name}",
    scroll_filter=models.Filter(
        should=[
            models.FieldCondition(
                key="country.name",
                match=models.MatchValue(value="Germany")
            ),
        ],
    ),
)

You can also search through arrays by projecting inner values using the [] syntax.

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "should": [
            {
                "key": "country.cities[].population",
                "range": {
                    "gte": 9.0,
                }
            }
        ]
    }
}
client.scroll(
    collection_name="{collection_name}",
    scroll_filter=models.Filter(
        should=[
            models.FieldCondition(
                key="country.cities[].population",
                range=models.Range(
                    gt=None,
                    gte=9.0,
                    lt=None,
                    lte=None,
                ),
            ),
        ],
    ),
)

This query would only output the point with id 2 as only Japan has a city with population greater than 9.0.

And the leaf nested field can also be an array.

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "should": [
            {
                "key": "country.cities[].sightseeing",
                "match": {
                    "value": "Osaka Castle"
                }
            }
        ]
    }
}
client.scroll(
    collection_name="{collection_name}",
    scroll_filter=models.Filter(
        should=[
            models.FieldCondition(
                key="country.cities[].sightseeing",
                match=models.MatchValue(value="Osaka Castle")
            ),
        ],
    ),
)

This query would only output the point with id 2 as only Japan has a city with the “Osaka castke” as part of the sightseeing.

Full Text Match

Available since version 0.10.0

A special case of the match condition is the text match condition. It allows you to search for a specific substring, token or phrase within the text field.

Exact texts that will match the condition depend on full-text index configuration. Configuration is defined during the index creation and describe at full-text index.

If there is no full-text index for the field, the condition will work as exact substring match.

{ 
    "key": "description",
    "match": {
        "text": "good cheap" 
    }
}
models.FieldCondition(
    key="description",
    match=models.MatchText(text="good cheap"),
)

If the query has several words, then the condition will be satisfied only if all of them are present in the text.

Range

{
    "key": "price",
    "range": {
        "gt": null,
        "gte": 100.0,
        "lt": null,
        "lte": 450.0
    }
}
models.FieldCondition(
    key="price",
    range=models.Range(
        gt=None, 
        gte=100.0, 
        lt=None, 
        lte=450.0,
    ),
)

The range condition sets the range of possible values for stored payload values. If several values are stored, at least one of them should match the condition.

Comparisons that can be used:

  • gt - greater than
  • gte - greater than or equal
  • lt - less than
  • lte - less than or equal

Can be applied to float and integer payloads.

Geo

Geo Bounding Box

{
    "key": "location",
    "geo_bounding_box": {
        "bottom_right": {
            "lat": 52.495862,
            "lon": 13.455868
        },
        "top_left": {
            "lat": 52.520711,
            "lon": 13.403683
        }
    }
}
models.FieldCondition(
    key="location",
    geo_bounding_box=models.GeoBoundingBox(
        bottom_right=models.GeoPoint(
            lat=52.495862,
            lon=13.455868,
        ),
        top_left=models.GeoPoint(
            lat=52.520711,
            lon=13.403683,
        ),
    ),
)

It matches with locations inside a rectangle with the coordinates of the upper left corner in bottom_right and the coordinates of the lower right corner in top_left.

Geo Radius

{
    "key": "location",
    "geo_radius": {
        "center": {
            "lat": 52.520711,
            "lon": 13.403683
        },
        "radius": 1000.0
    }
}
models.FieldCondition(
    key="location",
    geo_radius=models.GeoRadius(
        center=models.GeoPoint(
            lat=52.520711,
            lon=13.403683,
        ),
        radius=1000.0,
    ),
)

It matches with locations inside a circle with the center at the center and a radius of radius meters.

If several values are stored, at least one of them should match the condition. These conditions can only be applied to payloads that match the geo-data format.

Values count

In addition to the direct value comparison, it is also possible to filter by the amount of values.

For example, given the data:

[
    {"id": 1, "name": "product A", "comments": ["Very good!", "Excellent"]},
    {"id": 2, "name": "product B", "comments": ["meh", "expected more", "ok"]},
]

We can perform the search only among the items with more than two comments:

{
    "key": "comments",
    "values_count": {
        "gt": 2
    }
}
models.FieldCondition(
    key="comments",
    values_count=models.ValuesCount(gt=2),
)

The result would be:

[
    {"id": 2, "name": "product B", "comments": ["meh", "expected more", "ok"]},
]

If stored value is not an array - it is assumed that the amount of values is equals to 1.

Is Empty

Sometimes it is also useful to filter out records that are missing some value. The IsEmpty condition may help you with that:

{
    "is_empty": {
        "key": "reports"
    }
}
models.IsEmptyCondition(
    is_empty=models.PayloadField(key="reports"),
)

This condition will match all records where the field reports either does not exist, or have NULL or [] value.

Has id

This type of query is not related to payload, but can be very useful in some situations. For example, the user could mark some specific search results as irrelevant, or we want to search only among the specified points.

POST /collections/{collection_name}/points/scroll

{
    "filter": {
        "must": [
            { "has_id": [1,3,5,7,9,11] }
        ]
    }
  ...
}
client.scroll(
    collection_name="{collection_name}", 
    scroll_filter=models.Filter(
        must=[
            models.HasIdCondition(has_id=[1, 3, 5, 7, 9, 11]),
        ],
    ),
)

Filtered points would be:

[
    {"id": 1, "city": "London", "color": "green"},
    {"id": 3, "city": "London", "color": "blue"},
    {"id": 5, "city": "Moscow", "color": "green"},
]