Nomic
The `nomic-embed-text-v1` model is an open-source text encoder with an 8192-token context length. While you can find it on the Hugging Face Hub, you may find it easier to obtain embeddings through the Nomic Text Embeddings API. Once you have installed the `nomic` Python package, you can generate embeddings with the official Nomic client, with FastEmbed, or through direct HTTP requests.

You can use Nomic embeddings directly in Qdrant client calls. Note that embeddings are obtained differently for documents and for queries.
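If you prefer direct HTTP requests, the sketch below shows how such a request could be built. The endpoint URL and header names are assumptions based on the Nomic Atlas API, so verify them against the official Nomic documentation before relying on them:

```python
import json
import urllib.request

# Assumed endpoint; confirm against the Nomic API documentation.
NOMIC_API_URL = "https://api-atlas.nomic.ai/v1/embedding/text"

def build_embedding_request(texts, task_type, api_key):
    """Build (but do not send) an HTTP request for Nomic text embeddings."""
    payload = {
        "model": "nomic-embed-text-v1",
        "texts": texts,
        "task_type": task_type,  # "search_document" or "search_query"
    }
    return urllib.request.Request(
        NOMIC_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_embedding_request(
    ["Qdrant is the best vector database!"], "search_document", "YOUR_API_KEY"
)
# response = urllib.request.urlopen(request)  # uncomment with a real API key
```

The request is constructed but not sent, so you can inspect the payload before supplying a real API key.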
Upsert using Nomic SDK
The `task_type` parameter defines the kind of embeddings you get. For documents, set `task_type` to `search_document`:
```python
from qdrant_client import QdrantClient, models
from nomic import embed

output = embed.text(
    texts=["Qdrant is the best vector database!"],
    model="nomic-embed-text-v1",
    task_type="search_document",
)

client = QdrantClient()
client.upsert(
    collection_name="my-collection",
    points=models.Batch(
        ids=[1],
        vectors=output["embeddings"],
    ),
)
```
Upsert using FastEmbed
```python
from fastembed import TextEmbedding
from qdrant_client import QdrantClient, models

model = TextEmbedding("nomic-ai/nomic-embed-text-v1")
output = model.embed(["Qdrant is the best vector database!"])

client = QdrantClient()
client.upsert(
    collection_name="my-collection",
    points=models.Batch(
        ids=[1],
        vectors=[embedding.tolist() for embedding in output],
    ),
)
```
Search using Nomic SDK
To query the collection, set `task_type` to `search_query`:
```python
output = embed.text(
    texts=["What is the best vector database?"],
    model="nomic-embed-text-v1",
    task_type="search_query",
)

client.search(
    collection_name="my-collection",
    query_vector=output["embeddings"][0],
)
```
Search using FastEmbed
```python
output = next(model.embed("What is the best vector database?"))

client.search(
    collection_name="my-collection",
    query_vector=output.tolist(),
)
```
For more information, see the Nomic documentation on text embeddings.