Create a Neural Search Service with Fastembed

Time: 20 min | Level: Beginner | Output: GitHub

This tutorial shows you how to build and deploy your own neural search service that looks through descriptions of companies and picks the ones most similar to your query. The source website provides the company names, descriptions, locations, and a picture for each entry.

Alternatively, you can use data sources such as Crunchbase, but that would require obtaining an API key from them.

Our neural search service will use the Fastembed package to generate embeddings of text descriptions and FastAPI to serve the search API. Fastembed integrates natively with the Qdrant client, so you can easily upload the data into Qdrant and perform search queries.


To create a neural search service, you will need to transform your raw data and then create a search function to manipulate it. First, you will 1) download and prepare a sample dataset using a modified version of the BERT ML model. Then, you will 2) load the data into Qdrant, 3) create a neural search API and 4) serve it using FastAPI.

Neural Search Workflow

Note: The code for this tutorial can be found here: Step 2: Full Code for Neural Search.


To complete this tutorial, you will need:

  • Docker - The easiest way to use Qdrant is to run a pre-built Docker image.
  • Raw parsed data from
  • Python version >=3.8

Prepare sample dataset

To conduct a neural search on startup descriptions, you must first encode the description data into vectors. Fastembed integration into qdrant client combines encoding and uploading into a single step.

It also takes care of batching and parallelization, so you don’t have to worry about it.

Let’s start by downloading the data and installing the necessary packages.

  1. First you need to download the dataset.

Run Qdrant in Docker

Next, you need to manage all of your data using a vector engine. Qdrant lets you store, update or delete created vectors. Most importantly, it lets you search for the nearest vectors via a convenient API.

Note: Before you begin, create a project directory and a Python virtual environment in it.

  1. Download the Qdrant image from DockerHub.
docker pull qdrant/qdrant
  1. Start Qdrant inside of Docker.
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

You should see output like this:

[2021-02-05T00:08:51Z INFO  actix_server::builder] Starting 12 workers
[2021-02-05T00:08:51Z INFO  actix_server::builder] Starting "actix-web-service-" service on

Test the service by going to http://localhost:6333/. You should see the Qdrant version info in your browser.

All data uploaded to Qdrant is saved inside the ./qdrant_storage directory and will be persisted even if you recreate the container.
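
The browser check above can also be scripted. The following is a minimal sketch that assumes the root URL answers with a JSON document; the helper name fetch_qdrant_info and its opener parameter are hypothetical and exist only so the function can be tried without a running server.

```python
import json
from urllib.request import urlopen


def fetch_qdrant_info(base_url="http://localhost:6333/", opener=urlopen):
    """Fetch the JSON document served at the Qdrant root URL and return it as a dict.

    `opener` is a hypothetical injection point (not part of Qdrant) that makes
    the helper easy to exercise without a running server.
    """
    with opener(base_url) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, calling fetch_qdrant_info() should return the same version information that the browser shows.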

Upload data to Qdrant

  1. Install the official Python client to best interact with Qdrant.
pip install qdrant-client[fastembed]

Note that you need to install the fastembed extra to enable the Fastembed integration. At this point, you should have startup records in the startups_demo.json file and Qdrant running on a local machine.

Now you need to write a script to upload all startup data and vectors into the search engine.

  1. Create a client object for Qdrant.
# Import client library
from qdrant_client import QdrantClient

qdrant_client = QdrantClient("http://localhost:6333")
  1. Select model to encode your data.

You will be using a pre-trained model called sentence-transformers/all-MiniLM-L6-v2.

  1. Related vectors need to be added to a collection. Create a new collection for your startup vectors.

Note that we use get_fastembed_vector_params to get the vector size and distance function from the model. This method automatically generates a configuration compatible with the model you are using. Without the Fastembed integration, you would need to specify the vector size and distance function manually. Read more about it here.

Additionally, you can specify extended configuration for our vectors, like quantization_config or hnsw_config.
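
The model selection and collection setup described in the last two steps could look roughly like the sketch below. It assumes a qdrant-client release that ships the Fastembed helpers set_model and get_fastembed_vector_params; the wrapper function prepare_collection and its extra_config passthrough are hypothetical names used for illustration, not part of the library.

```python
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def prepare_collection(client, collection_name="startups", **extra_config):
    """Select the embedding model and create a collection sized for it.

    `client` is expected to be a QdrantClient installed with the fastembed extra.
    `extra_config` is forwarded unchanged, e.g. hnsw_config or quantization_config.
    """
    client.set_model(MODEL_NAME)
    client.create_collection(
        collection_name=collection_name,
        # Vector size and distance function are derived from the selected model
        vectors_config=client.get_fastembed_vector_params(),
        **extra_config,
    )
```

With the client object created earlier, this would be called as prepare_collection(qdrant_client). Keep in mind that creating a collection that already exists is expected to fail, so drop the old collection first if you rerun the script.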

  1. Read data from the file.
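import json
import os

# DATA_DIR is the directory where startups_demo.json was downloaded
payload_path = os.path.join(DATA_DIR, "startups_demo.json")
metadata = []
documents = []

with open(payload_path) as fd:
    for line in fd:
        obj = json.loads(line)
        # Assumes each record keeps its text under the "description" key
        documents.append(obj.pop("description"))
        metadata.append(obj)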

In this block of code, we read data from the startups_demo.json file and split it into two lists: documents and metadata. Documents are the raw text descriptions of startups. Metadata is the payload associated with each startup, such as the name, location, and picture. We will use documents to encode the data into vectors.
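
To see the split in isolation, here is a small self-contained sketch that runs on two made-up records instead of the real file. The function name split_records and the "description" key are assumptions made for illustration; adjust the key to match your data.

```python
import io
import json


def split_records(lines, text_key="description"):
    """Split JSON-lines records into texts to embed and the remaining payload."""
    documents, metadata = [], []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        documents.append(record.pop(text_key))
        metadata.append(record)
    return documents, metadata


# Two made-up records standing in for startups_demo.json
sample = io.StringIO(
    '{"name": "Acme", "description": "Rockets for coyotes", "city": "Berlin"}\n'
    '{"name": "Globex", "description": "Friendly robots", "city": "Paris"}\n'
)
documents, metadata = split_records(sample)
print(documents)    # ['Rockets for coyotes', 'Friendly robots']
print(metadata[0])  # {'name': 'Acme', 'city': 'Berlin'}
```

Both lists keep the same order, so the payload at index i always belongs to the description at index i.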

  1. Encode and upload data.
    parallel=0,  # Use all available CPU cores to encode data

The add method will encode all documents and upload them to Qdrant. This is one of two Fastembed-specific methods that combine encoding and uploading into a single step.

The parallel parameter controls the number of CPU cores used to encode data.

Additionally, you can specify ids for each document, if you want to use them later to update or delete documents. If you don’t specify ids, they will be generated automatically and returned as a result of the add method.
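
Put together, the upload call might be wrapped as in the sketch below. It assumes the add method accepts the collection_name, documents, metadata, ids, and parallel arguments described in this step; the wrapper upload_startups and its length check are hypothetical additions for illustration.

```python
def upload_startups(client, documents, metadata, ids=None, collection_name="startups"):
    """Encode `documents` and upload them with their payload in one call.

    `client` is expected to be a QdrantClient with the Fastembed model already set.
    Returns whatever `add` returns, which should be the ids of the uploaded points.
    """
    if len(documents) != len(metadata):
        raise ValueError("documents and metadata must have the same length")
    return client.add(
        collection_name=collection_name,
        documents=documents,
        metadata=metadata,
        ids=ids,  # None lets the client generate ids
        parallel=0,  # 0 asks for all available CPU cores
    )
```

Keeping the returned ids around is useful if you plan to update or delete individual startups later.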

You can monitor the progress of the encoding by passing a tqdm progress bar to the add method.

from tqdm import tqdm


Note: See the full code for this step here.

Build the search API

Now that all the preparations are complete, let’s start building a neural search class.

To process incoming requests, the neural search class needs two things: 1) a model to convert the query into a vector and 2) the Qdrant client to perform search queries. The Fastembed integration in the Qdrant client combines encoding the query and searching into a single method call.

  1. Create a file named neural_searcher.py and specify the following.
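from qdrant_client import QdrantClient

class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")
        # use the same model that encoded the uploaded data
        self.qdrant_client.set_model("sentence-transformers/all-MiniLM-L6-v2")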
  1. Write the search function.
    def search(self, text: str):
        search_result = self.qdrant_client.query(
            collection_name=self.collection_name,
            query_text=text,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # The 5 closest results are enough
        )
        # `search_result` contains found vector ids with similarity scores along with the stored payload
        # In this function you are interested in payload only
        metadata = [hit.metadata for hit in search_result]
        return metadata
  1. Add search filters.

With Qdrant, you can also add conditions to the search. For example, if you wanted to search for startups in a certain city, the search query could look like this:

from qdrant_client.models import Filter

    # Inside the search method:

    city_of_interest = "Berlin"

    # Define a filter for cities
    city_filter = Filter(**{
        "must": [{
            "key": "city",  # Store city information in a field of the same name
            "match": {  # This condition checks if payload field has the requested value
                "value": city_of_interest  # pass the variable, not the literal string
            }
        }]
    })

    search_result = self.qdrant_client.query(
        collection_name=self.collection_name,
        query_text=text,
        query_filter=city_filter,
        limit=5
    )
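
If you need filters for several cities, the filter arguments can be built by a small helper. The sketch below only assembles the plain dictionary, so it runs without Qdrant; the function name city_filter_params is hypothetical, and the "city" field name follows the example above.

```python
def city_filter_params(city):
    """Build keyword arguments for a filter that keeps only startups in `city`."""
    return {
        "must": [
            {
                "key": "city",  # payload field holding the city name
                "match": {"value": city},  # the variable's value, not a string literal
            }
        ]
    }


params = city_filter_params("Berlin")
print(params["must"][0]["match"]["value"])  # Berlin
```

The result can then be unpacked into the model class, for example Filter(**city_filter_params("Berlin")), and passed as query_filter.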

You have now created a class for neural search queries. Now wrap it up into a service.

Deploy the search with FastAPI

To build the service you will use the FastAPI framework.

  1. Install FastAPI.

To install it, use the command:

pip install fastapi uvicorn
  1. Implement the service.

Create a new Python file for the service and specify the following.

The service will have only one API endpoint and will look like this:

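from fastapi import FastAPI

# The file where NeuralSearcher is stored
from neural_searcher import NeuralSearcher

app = FastAPI()

# Create a neural searcher instance
neural_searcher = NeuralSearcher(collection_name='startups')

# The route path is an assumption; the decorator was lost from the source text
@app.get("/api/search")
def search_startup(q: str):
    return {
        # The "result" key is an assumption; the original body was truncated
        "result": neural_searcher.search(text=q)
    }

if __name__ == "__main__":
    import uvicorn
    # The host value was lost from the source text; loopback is assumed here
    uvicorn.run(app, host="127.0.0.1", port=8000)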
  1. Run the service.
  1. Open your browser at http://localhost:8000/docs.

You should be able to see a debug interface for your service.

FastAPI Swagger interface

Feel free to play around with it, make queries regarding the companies in our corpus, and check out the results.
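
Besides the debug interface, you can query the service from a script. The sketch below uses only the standard library; the route /api/search is an assumption (use whatever path your service registers), the query parameter q matches the search_startup signature, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, base_url="http://localhost:8000", path="/api/search"):
    """Return the GET URL for a search; `path` is an assumed route name."""
    return f"{base_url}{path}?{urlencode({'q': query})}"


def search_service(query, opener=urlopen):
    """Call the running service and return the decoded JSON response."""
    with opener(build_search_url(query)) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_search_url("solar energy"))
# http://localhost:8000/api/search?q=solar+energy
```

With the service running, search_service("solar energy") should return the same JSON that the debug interface displays for that query.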

Next steps

The code from this tutorial has been used to develop a live online demo. You can try it to get an intuition for cases when the neural search is useful. The demo contains a switch that selects between neural and full-text searches. You can turn the neural search on and off to compare your result with a regular full-text search.

Note: The code for this tutorial can be found here: Full Code for Neural Search.

Join our Discord community, where we talk about vector search and similarity learning and publish other examples of neural networks and neural search applications.