Building a Facial Recognition System with Qdrant
David Myriel
December 03, 2024
The Twin Celebrity App
In the era of personalization, combining cutting-edge technology with fun can create engaging applications that resonate with users. One such project is the Twin Celebrity app, a tool that matches users with their celebrity look-alikes using facial recognition embeddings and vector search powered by Qdrant. This blog post dives into the architecture, tools, and practical advice for developers who want to build this app—or something similar.
The Twin Celebrity app identifies which celebrity a user resembles by analyzing a selfie. The app utilizes:
- Face recognition embeddings: Generated by a ResNet-based FaceNet model.
- Vector similarity search: Powered by Qdrant to find the closest match.
- ZenML: For orchestrating data pipelines.
- Streamlit: As the front-end interface.
This project not only demonstrates the capabilities of modern vector databases but also serves as an exciting introduction to embedding-based applications.
Learn From the App’s Creator
We interviewed the engineer behind this project, Miguel Otero Pedrido, who is also the founder of The Neural Maze. Miguel explains in detail how he put the app together, as well as his choice of tools.
Turns out his celebrity twin is…Andy Samberg
Architecture
Search Engine & DB: Qdrant stands out as a high-performance vector database built in Rust, known for its reliability and speed. Its advanced features, such as vector visualization and efficient querying, make it a go-to choice for developers working on embedding-based projects.
ML Framework: ZenML simplifies pipeline creation with a modular, cloud-agnostic framework that ensures clean, scalable, and portable code, ideal for cross-platform workflows.
Facial Recognition: MTCNN ensures consistent face alignment, making the embeddings more reliable.
Embedding Model: FaceNet provides lightweight, pre-trained facial embeddings that balance accuracy and efficiency, which makes it a good fit for tasks like the Twin Celebrity app.
Frontend: Streamlit streamlines UI development and enables rapid prototyping with minimal effort, so developers can focus on core functionality.
Application Workflows
The app is divided into two phases: the Offline Phase, where the celebrity images are vectorized, and the Online Phase, which carries out a live similarity search.
The Offline Phase
The first step is dataset preparation. Celebrity images are fetched from Hugging Face's dataset library to serve as the foundation for embeddings.
Next, MTCNN detects and aligns the celebrities' faces within the images. Then, a pre-trained FaceNet model generates a 512-dimensional embedding for each image. This gives a consistent, high-quality representation of facial features.
Finally, these embeddings, along with metadata, are stored in Qdrant Cloud. This enables efficient retrieval and management of the data for later use.
The Online Phase
In the online phase, user interaction begins with a Streamlit app. The app captures a selfie and converts it into an embedding using the same FaceNet model.
The generated embedding is then queried against Qdrant, which retrieves the top matches based on similarity.
Finally, the results are displayed in an intuitive interface, showing the user their closest celebrity match and making the interaction engaging and seamless.
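To make the similarity lookup in this phase concrete, here is a minimal brute-force sketch of what "retrieving the top matches" means. It assumes cosine similarity as the metric (the article does not name one), uses toy 3-dimensional vectors in place of 512-dimensional FaceNet embeddings, and the names `cosine_similarity`, `top_matches`, and `toy_gallery` are illustrative only. Qdrant performs this kind of lookup with an index rather than by scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(query, gallery, k=3):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for stored celebrity embeddings (assumption).
toy_gallery = {
    "celebrity_a": [1.0, 0.0, 0.0],
    "celebrity_b": [0.0, 1.0, 0.0],
    "celebrity_c": [0.7, 0.7, 0.0],
}
selfie_embedding = [0.9, 0.1, 0.0]
best = top_matches(selfie_embedding, toy_gallery, k=2)
```

Here `best` lists `celebrity_a` first and `celebrity_c` second, because the selfie vector points mostly along the first axis.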
How to Build the App
Miguel recently published a video on his YouTube channel: The Neural Maze.
For detailed steps to build the app, watch Building a Twin Celebrity App.
1. Set Up the Offline Pipeline
Using ZenML, the pipeline consists of:
- Data Loading: Fetch images and labels (e.g., “Brad Pitt”) from Hugging Face.
- Sampling: Reduce dataset size for faster processing, selecting around 3,000 images.
- Embedding Generation: Convert images into embeddings using MTCNN for face detection and FaceNet for embedding creation.
- Storage in Qdrant: Save embeddings into a collection named celebrities.
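The steps listed above can be sketched in plain Python as follows. This is not the project's actual code: the function names, the `name` payload key, the cosine distance setting, and the `url`/`api_key` parameters are assumptions, and `embed_face` is a hypothetical callable standing in for MTCNN plus FaceNet. The ZenML wiring is omitted, and the Qdrant client is imported only inside `store_points`, so the rest runs without it.

```python
import random

EMBEDDING_SIZE = 512             # FaceNet embedding size given in the article
COLLECTION_NAME = "celebrities"  # collection name given in the article


def sample_records(records, sample_size, seed=42):
    """Sampling: return a reproducible random subset of (label, image) pairs."""
    records = list(records)
    if sample_size >= len(records):
        return records
    return random.Random(seed).sample(records, sample_size)


def build_points(records, embed_face):
    """Embedding generation: map each record to an id/vector/payload dict.

    `embed_face` should return a list of floats, or None when no face is found.
    """
    points = []
    for point_id, (label, image) in enumerate(records):
        vector = embed_face(image)
        if vector is None:
            continue  # skip images without a detectable face
        if len(vector) != EMBEDDING_SIZE:
            raise ValueError(f"expected {EMBEDDING_SIZE} values, got {len(vector)}")
        points.append({"id": point_id, "vector": list(vector), "payload": {"name": label}})
    return points


def store_points(points, url, api_key):
    """Storage: upsert the points into Qdrant (needs `pip install qdrant-client`)."""
    from qdrant_client import QdrantClient, models  # lazy import, only needed here

    client = QdrantClient(url=url, api_key=api_key)
    if not client.collection_exists(COLLECTION_NAME):
        client.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=models.VectorParams(
                size=EMBEDDING_SIZE, distance=models.Distance.COSINE
            ),
        )
    client.upsert(
        collection_name=COLLECTION_NAME,
        points=[models.PointStruct(**point) for point in points],
    )


def dummy_embed(image):
    """Placeholder embedder for a dry run: empty input counts as 'no face'."""
    return None if not image else [float(len(image))] * EMBEDDING_SIZE


# Dry run with placeholder data; no Qdrant connection is made here.
demo_records = [("Brad Pitt", b"bytes-1"), ("Brad Pitt", b""), ("Celebrity B", b"bytes-22")]
demo_points = build_points(sample_records(demo_records, 3000), dummy_embed)
```

Keeping point construction separate from the upload makes the embedding step easy to check before anything is written to Qdrant Cloud.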
2. Create the Online Application
The Streamlit app handles:
- Image Capture: Takes a selfie through a webcam or uploaded file.
- Embedding Querying: Sends the embedding to Qdrant, retrieves the top matches, and visualizes the similarity.
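As a rough sketch of the querying step, the snippet below sends a selfie embedding to Qdrant and formats the results for display. It assumes the `celebrities` collection stores each celebrity's name under a `name` payload key; that key, the `find_twins` and `format_matches` names, and the `url`/`api_key` parameters are illustrative rather than taken from the project. The client is imported inside `find_twins`, so `format_matches` can be tried without a Qdrant connection.

```python
def find_twins(selfie_vector, url, api_key, limit=3):
    """Query Qdrant for the stored embeddings closest to the selfie embedding."""
    from qdrant_client import QdrantClient  # lazy import, only needed here

    client = QdrantClient(url=url, api_key=api_key)
    response = client.query_points(
        collection_name="celebrities",
        query=selfie_vector,
        limit=limit,
        with_payload=True,
    )
    # Each returned point carries a similarity score and its stored payload.
    return [((hit.payload or {}).get("name", "unknown"), hit.score) for hit in response.points]


def format_matches(matches):
    """Render (name, score) pairs as display lines, best match first."""
    ordered = sorted(matches, key=lambda match: match[1], reverse=True)
    return [
        f"{rank}. {name} (similarity {score:.2f})"
        for rank, (name, score) in enumerate(ordered, start=1)
    ]


# Example with made-up scores, as they might come back from find_twins().
lines = format_matches([("Celebrity B", 0.5), ("Celebrity A", 0.91)])
```

In a Streamlit app, the selfie could come from `st.camera_input` or `st.file_uploader`, and the formatted lines could be shown with `st.write`.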
3. Deployment Options
Deploy the app on a cloud platform such as Google Cloud, AWS, or Azure.
The application can be containerized using Docker. For hosting, Google Cloud Run is an excellent choice, as it runs containerized applications without requiring extensive infrastructure management.
CI/CD pipelines, such as those provided by Cloud Build or GitHub Actions, streamline the process further by automating the steps for building, testing, and deploying updates.
4. Test the Quality of Your Embeddings
You can use Qdrant's visualization tools to inspect your embeddings and check that clusters align with expectations.
If your data is properly embedded, the visualization tool will group images of the same celebrity into the same cluster.
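A simple numeric check can complement the visual inspection. The sketch below is not part of Qdrant's tooling: it compares the average cosine similarity between embeddings of the same celebrity with the average between different celebrities. It assumes the dataset has several images per celebrity, and the function names and toy 2-dimensional vectors are illustrative.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cluster_separation(labeled_vectors):
    """Return (mean same-label similarity, mean different-label similarity)."""
    same, different = [], []
    for (label_a, vec_a), (label_b, vec_b) in combinations(labeled_vectors, 2):
        bucket = same if label_a == label_b else different
        bucket.append(cosine(vec_a, vec_b))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(same), mean(different)


# Toy embeddings: two images each for two hypothetical celebrities.
toy = [
    ("celebrity_a", [1.0, 0.0]),
    ("celebrity_a", [0.9, 0.1]),
    ("celebrity_b", [0.0, 1.0]),
    ("celebrity_b", [0.1, 0.9]),
]
within, between = cluster_separation(toy)
```

When embeddings are well formed, `within` should be clearly higher than `between`. A small gap suggests the clusters overlap.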
Lessons and Takeaways
Scalability poses challenges when working with large datasets, such as 20,000+ images. Optimizations like quantization, which reduces memory usage, or precomputing an average embedding for each cluster can significantly lower storage and computational costs. These strategies help the system remain performant as the dataset grows.
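Both optimizations can be illustrated in a few lines of plain Python. This is a sketch of the ideas only, under the assumption that a "cluster" means all images of one celebrity. The function names are hypothetical, and in practice Qdrant's built-in quantization options would be configured on the collection instead of hand-rolling the arithmetic.

```python
def average_embeddings(labeled_vectors):
    """Collapse all vectors that share a label into one mean vector per label."""
    sums, counts = {}, {}
    for label, vector in labeled_vectors:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def quantize_int8(vector):
    """Scalar-quantize floats to integer codes in [-127, 127]; returns (codes, scale)."""
    max_abs = max(abs(v) for v in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in vector], scale


def dequantize(codes, scale):
    """Approximate the original floats from the integer codes."""
    return [c * scale for c in codes]


# Three stored vectors shrink to two, one per (hypothetical) celebrity.
centroids = average_embeddings([
    ("celebrity_a", [1.0, 3.0]),
    ("celebrity_a", [3.0, 5.0]),
    ("celebrity_b", [0.0, 2.0]),
])

# Each value becomes a one-byte code instead of a four-byte float32.
original = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)
```

Averaging trades per-image detail for fewer stored points, while quantization keeps every point but stores each value less precisely. Both are worth validating against match quality before adopting them.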
The potential real-world applications of this technology extend far beyond entertainment. Similar systems can be used in security applications for embedding-based facial recognition to secure access to buildings or devices.
In healthcare, they can assist in analyzing features such as moles or skin textures. In retail, they enable personalized recommendations based on user photos, demonstrating the versatility of this approach.
Next Steps for Developers
Start by cloning the project repository to understand the architecture and functionality.
Expand the dataset with more celebrity images for diversity or fine-tune the FaceNet model for improved accuracy.
Consider deploying a mobile-friendly version using frameworks like Flutter or React Native for a seamless user experience.
For scalability, implement multi-GPU setups to speed up embedding generation and optimize storage with techniques like quantization or average embeddings.
To enhance functionality, explore features like video input for real-time matches or add metadata such as celebrity bios to enrich user interaction. Experiment with custom similarity scoring for more tailored results.
More Links
- Miguel’s LinkedIn profile
- Miguel’s Substack blog
- The Neural Maze YouTube channel
- Twin Celebrity GitHub Repository