Powering Bloop semantic code search

Founded in early 2021, bloop was one of the first companies to tackle semantic search for codebases. A fast, reliable Vector Search Database is a core component of a semantic search engine, so bloop surveyed the available solutions and even considered building their own. They found Qdrant to be the top contender and now use it in production.

This document is a guide for anyone who wants to introduce semantic search to a new field and find out whether Qdrant is a good fit for their use case.

About bloop

bloop is a fast code-search engine that combines semantic search, regex search and precise code navigation into a single lightweight desktop application that can be run locally. It helps developers understand and navigate large codebases, enabling them to discover internal libraries, reuse code and avoid dependency bloat. bloop’s chat interface explains complex concepts in simple language so that engineers can spend less time crawling through code to understand what it does, and more time shipping features and fixing bugs.

bloop’s mission is to make software engineers autonomous, and semantic code search is the cornerstone of that vision. The project is maintained by a group of Rust and TypeScript engineers and ML researchers. It builds on several prominent emerging technologies, such as Tauri, tantivy, Qdrant and Anthropic.

About Qdrant

Qdrant is an open-source Vector Search Database written in Rust. It deploys as an API service that searches for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending and more, making the most of unstructured data. It is easy to use, deploy and scale, and it is both blazing fast and accurate.

Qdrant was founded in 2021 in Berlin by Andre Zayarni and Andrey Vasnetsov with the mission to power the next generation of AI applications with advanced, high-performance vector similarity search technology. Their flagship product is the vector search database, which is available as open source (https://github.com/qdrant/qdrant) or as a managed cloud solution (https://cloud.qdrant.io/).

The Problem

Firstly, what is semantic search? It’s finding relevant information by comparing meaning, rather than simply measuring the textual overlap between queries and documents. We compare meaning by comparing embeddings - these are vector representations of text that are generated by a neural network. Each document’s embedding denotes a position in a latent space, so to search you embed the query and find its nearest document vectors in that space.
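The "embed the query, then find its nearest document vectors" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional vectors and file names below are made up by hand, whereas real embeddings come from a neural network and have hundreds of dimensions. Cosine similarity is used here as one common way to measure closeness; the source does not say which metric bloop uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical, hand-made "embeddings" (assumption: not produced by any model).
DOC_VECTORS = {
    "charge_card.rs": [0.9, 0.1, 0.0],
    "refund_order.rs": [0.8, 0.3, 0.1],
    "parse_config.rs": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of the query "payment processing".
QUERY_VECTOR = [1.0, 0.2, 0.0]

top_matches = nearest(QUERY_VECTOR, DOC_VECTORS, k=2)
print(top_matches)
```

The sketch scans every document for each query, which is fine for three vectors but not for a large codebase. A vector search database exists to answer the same question without comparing the query against every stored vector.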

Why is semantic search so useful for code? As engineers, we often don’t know - or forget - the precise terms needed to find what we’re looking for. Semantic search enables us to find things without knowing the exact terminology. For example, if an engineer wanted to understand “What library is used for payment processing?” a semantic code search engine would be able to retrieve results containing “Stripe” or “PayPal”. A traditional lexical search engine would not.
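To see why a lexical engine struggles with the payment example, here is a rough sketch that scores documents purely by shared words. The two snippets and their file names are invented for illustration, and counting shared tokens is a deliberately simplified stand-in for real lexical ranking schemes.

```python
import re


def tokens(text):
    """Lowercase a text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(query, document):
    """Count the distinct words that the query and the document share."""
    return len(tokens(query) & tokens(document))


QUERY = "What library is used for payment processing?"

# Hypothetical snippets (assumption: not taken from any real codebase).
SNIPPETS = {
    "billing.rs": "use stripe::Client; let charge = client.create_charge(amount);",
    "README.md": "This library is used for parsing configuration files.",
}

scores = {name: lexical_overlap(QUERY, text) for name, text in SNIPPETS.items()}
print(scores)
```

The snippet that actually handles payments shares no words with the query, while the unrelated README shares four, so a ranking based on word overlap puts the wrong result first. Comparing embeddings instead of words is what lets a semantic engine connect "payment processing" to "Stripe".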

One peculiarity of this problem is that the usefulness of the solution increases with the size of the code base – if you only have one code file, you’ll be able to search it quickly, but you’ll easily get lost in thousands, let alone millions of lines of code. Once a codebase reaches a certain size, it is no longer possible for a single engineer to have read every single line, and so navigating large codebases becomes extremely cumbersome.

In software engineering, we’re always dealing with complexity. Programming languages, frameworks and tools have been developed that allow us to modularize, abstract and compile code into libraries for reuse. Yet we still hit limits: abstractions are still leaky, and while there have been great advances in reducing incidental complexity, there is still plenty of intrinsic complexity¹ in the problems we solve, and with software eating the world, the growth of complexity to tackle has outrun our ability to contain it. Semantic code search helps us navigate these inevitably complex systems.

But semantic search shouldn’t come at the cost of speed. Search should still feel instantaneous, even when searching a codebase as large as Rust (which has over 2.8 million lines of code!). Qdrant gives bloop excellent semantic search performance whilst using a reasonable amount of resources, so they can handle concurrent search requests.

The Upshot

bloop are really happy with how Qdrant has slotted into their semantic code search engine: it’s performant and reliable, even for large codebases. And it’s written in Rust(!) with an easy-to-integrate qdrant-client crate. In short, Qdrant has helped keep bloop’s code search fast, accurate and reliable.

Footnotes:

  1. Incidental complexity is the sort of complexity arising from weaknesses in our processes and tools, whereas intrinsic complexity is the sort that we face when trying to describe, let alone solve, the problem.