Metadata automation and optimization - Reece Griffiths | Vector Space Talks

Sabrina Aquino · February 24, 2025

“Metadata is one of the key unlocks to both segmentation and file organization, setting up the right knowledge base, and enriching it to hit that last mile of accuracy and speed.”
— Reece Griffiths

Reece Griffiths is the CEO and co-founder of Deasy Labs, a metadata automation platform that helps companies optimize their vector databases for retrieval accuracy. A Y Combinator alumnus, Deasy Labs focuses on improving metadata extraction, classification, and enrichment at scale.

Top takeaways:

Retrieval-augmented generation (RAG) and vector search are incomplete without high-quality metadata. In this episode of Vector Space Talks, Reece Griffiths explains how metadata automation and optimization can significantly enhance retrieval accuracy, filtering, and indexing efficiency.

Here are some key insights from this episode:

  1. Why Metadata Matters in Vector Search: Traditional approaches often focus on embedding models, but metadata can bridge the gap between mediocre and high-performance search systems.
  2. Metadata for Segmentation vs. Enrichment: Segmentation metadata helps filter and categorize data, while enrichment metadata provides additional context that improves retrieval accuracy.
  3. Optimizing Hybrid Search with Metadata: Reece explains how metadata can be embedded into sparse vectors for hybrid search, enhancing keyword and semantic search combinations.
  4. Scaling Metadata Extraction: Deasy Labs uses LLM-powered extraction to generate metadata dynamically and update taxonomies in real time.
  5. Metadata as an Access Control Layer: Metadata can also be leveraged for role-based access control (RBAC) by defining data slices that different teams or users can access within a knowledge base.
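
To make the segmentation and access-control ideas above concrete, here is a minimal, library-free sketch. It is not Deasy Labs' or Qdrant's implementation: the toy chunks, the `department` and `doc_type` fields, and the `ROLE_SLICES` mapping are all hypothetical, and the vectors are hand-written stand-ins for real embeddings. The idea is simply to restrict the candidate set with a metadata match first, then rank what remains by vector similarity.

```python
import math

# Hypothetical toy knowledge base: each chunk carries a dense vector
# (stand-in for a real embedding) and a metadata payload.
CHUNKS = [
    {"id": 1, "vector": [0.9, 0.1], "meta": {"department": "finance", "doc_type": "policy"}},
    {"id": 2, "vector": [0.8, 0.2], "meta": {"department": "legal", "doc_type": "contract"}},
    {"id": 3, "vector": [0.1, 0.9], "meta": {"department": "finance", "doc_type": "report"}},
]

# Hypothetical role-to-slice mapping: each role sees only chunks whose
# metadata matches its slice.
ROLE_SLICES = {
    "finance_analyst": {"department": "finance"},
    "counsel": {"department": "legal"},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, required_meta, top_k=2):
    """Keep chunks matching every required key/value, then rank by cosine similarity."""
    candidates = [
        c for c in CHUNKS
        if all(c["meta"].get(k) == v for k, v in required_meta.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:top_k]]


def search_as(role, query_vector, top_k=2):
    """Access control as a metadata filter: the role decides the slice."""
    return search(query_vector, ROLE_SLICES[role], top_k)
```

For example, `search_as("counsel", [1.0, 0.0])` only ever considers the legal chunk, however similar the finance chunks are to the query. In a production vector database the same filter would typically be pushed down into the engine rather than applied in application code.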

Fun Fact: Reece and his team at Deasy Labs experimented with pure metadata embeddings (without the original data) and found that hybrid search using metadata alone can yield strong retrieval performance.
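
A rough sketch of what retrieval over metadata alone might look like is shown below. The episode summary does not describe the exact method, so everything here is an assumption for illustration: metadata fields are turned into `key=value` tokens to form a sparse vector, documents are scored by sparse dot product, and reciprocal rank fusion (a common way to merge ranked lists in hybrid search) combines the sparse ranking with a second ranking, such as one from dense vectors. The function names and the example fields are hypothetical.

```python
from collections import Counter


def metadata_to_sparse(meta):
    """Map a metadata dict to a sparse vector of 'key=value' tokens (weight = count)."""
    tokens = []
    for key, value in meta.items():
        values = value if isinstance(value, list) else [value]
        tokens.extend(f"{key}={str(v).lower()}" for v in values)
    return dict(Counter(tokens))


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as token -> weight dicts."""
    return sum(weight * b.get(token, 0) for token, weight in a.items())


def rank_by_metadata(query_meta, docs):
    """Rank document ids by sparse overlap with the query metadata; drop non-matches."""
    q = metadata_to_sparse(query_meta)
    scored = [(doc_id, sparse_dot(q, metadata_to_sparse(meta))) for doc_id, meta in docs.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score > 0]
    return [doc_id for doc_id, _ in sorted(matches, key=lambda p: p[1], reverse=True)]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) for each list a document appears in."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical documents described only by their metadata.
DOCS = {
    "a": {"topic": ["pricing", "contracts"], "region": "EU"},
    "b": {"topic": ["pricing"], "region": "US"},
    "c": {"topic": ["hiring"], "region": "EU"},
}

sparse_ranking = rank_by_metadata({"topic": "pricing", "region": "EU"}, DOCS)
fused = rrf([sparse_ranking, ["a", "c"]])  # second list stands in for a dense ranking
```

Here document `a` matches both query fields and ranks first in the sparse list. A real system would likely weight tokens (for example with IDF-style weights) rather than using raw counts.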

Show notes:

00:00 Introduction to metadata automation and optimization.
05:32 The role of metadata in retrieval-augmented generation (RAG).
10:48 How Deasy Labs structures metadata extraction workflows.
15:35 Implementing hybrid search with sparse metadata vectors.
20:14 Automating metadata classification using LLMs.
25:51 Best practices for maintaining metadata over time.
30:18 Using metadata for segmentation and access control.
35:43 Q&A and closing remarks.

More Quotes from Reece:

“Going from 75% retrieval accuracy to 95%+ is hard. In many cases, 80% accuracy might as well be zero. Metadata is the key to getting that last mile.”
— Reece Griffiths

“Metadata shouldn’t rely on manual tagging by business teams. With LLMs, we can auto-suggest domain-specific metadata dynamically and refine it over time.”
— Reece Griffiths

“In a vector database, segmentation metadata helps you structure your knowledge base, while enrichment metadata boosts retrieval precision—both are critical.”
— Reece Griffiths


Try Deasy Labs 🚀

Want to enhance your vector search performance with automated metadata workflows?

Start now at app.deasylabs.com!

