Introducing vector database capabilities in Azure Cosmos DB for NoSQL (Public Preview) - Azure Cosmos DB Blog (2024)

We are excited to announce that native vector indexing and search in Azure Cosmos DB for NoSQL is now available in preview! Azure Cosmos DB is the world’s first full-featured serverless database with vector search, and it offers multiple vector index options, including flat (exact), quantized flat, and a new DiskANN-based index. DiskANN is a suite of highly scalable, accurate, and cost-effective approximate nearest neighbor (ANN) algorithms, developed at Microsoft Research, for low-latency vector search at any scale.

You can take advantage of Azure Cosmos DB’s rich features, such as its NoSQL query syntax, to combine vector search with query filters that can increase the relevancy and accuracy of your vector searches. You’ll also get all the benefits of Azure Cosmos DB’s flexibility, instant autoscale, 99.999% SLA, geo-replication, and more! Store your data and vectors together to eliminate the need for a separate vector database, and realize improved consistency, synchronization between vectors and data, and a reduction in the complexity and costs of AI applications.
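As a rough illustration of combining a filter with vector search, the sketch below builds a parameterized NoSQL query that uses the `VectorDistance` system function. The property names (`embedding`, `category`), the parameter names, and both helper functions are assumptions made for this example and do not come from the announcement.

```python
from __future__ import annotations

from typing import Any


def build_filtered_vector_query(vector_path: str, filter_field: str, top_k: int = 10) -> str:
    """Return a NoSQL query that applies a filter and ranks by vector distance."""
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return (
        f"SELECT TOP {top_k} c.id, "
        f"VectorDistance(c.{vector_path}, @queryVector) AS score "
        f"FROM c WHERE c.{filter_field} = @filterValue "
        f"ORDER BY VectorDistance(c.{vector_path}, @queryVector)"
    )


def build_parameters(query_vector: list[float], filter_value: Any) -> list[dict[str, Any]]:
    """Build query parameters as a list of name/value pairs."""
    return [
        {"name": "@queryVector", "value": query_vector},
        {"name": "@filterValue", "value": filter_value},
    ]


# Hypothetical property names and values, for illustration only.
query = build_filtered_vector_query("embedding", "category", top_k=5)
params = build_parameters([0.1, 0.2, 0.3], "laptops")
print(query)
```

The query text and parameter list can then be passed to whichever SDK you use to run parameterized queries against a container. In practice, the query vector would be an embedding with the same number of dimensions as the stored vectors.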

What is DiskANN?

DiskANN is a suite of scalable approximate nearest neighbor search algorithms designed for efficient vector search at any scale. It offers high recall, high queries per second (QPS), and low query latency, even for billion-point datasets. This makes it a powerful tool for handling large volumes of data. Learn more about DiskANN from Microsoft.

  • DiskANN is a graph-based indexing and search system that performs fast and accurate approximate nearest neighbor (ANN) search at any scale.
  • It primarily uses an SSD-based index to scale to an order of magnitude more points compared to in-memory indices, while still retaining high QPS and low latency.
  • Quantized (compressed) vectors are kept in memory, and DiskANN balances interactions between the two to offer low latency and high accuracy.
  • DiskANN is based on a novel graph index called Vamana that is more versatile than existing graph indices by maintaining accuracy despite many insertions, modifications, and deletions, without the need for expensive index rebuilds.
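To give a feel for what graph-based ANN search means, here is a toy best-first search over a small in-memory neighbor graph. It is only a sketch of the general idea: it is not the Vamana or DiskANN implementation, it involves no SSD or quantization, and the graph, vectors, and function names are all invented for the example.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(graph, vectors, query, start, k=2, beam_width=4):
    """Best-first walk over a neighbor graph, keeping at most beam_width results."""
    visited = {start}
    # Min-heap of (distance, node) pairs that have not been expanded yet.
    candidates = [(l2(vectors[start], query), start)]
    # Closest nodes seen so far, kept sorted and trimmed to beam_width.
    best = list(candidates)
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the nearest unexpanded node is worse than the worst kept result.
        if len(best) >= beam_width and dist > best[-1][0]:
            break
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = l2(vectors[neighbor], query)
            heapq.heappush(candidates, (d, neighbor))
            best.append((d, neighbor))
        best.sort()
        del best[beam_width:]
    return [node for _, node in best[:k]]


# Hypothetical six-point dataset on a line, with one long-range edge (0 <-> 3).
vectors = {i: (float(i), 0.0) for i in range(6)}
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4]}
result = greedy_search(graph, vectors, query=(4.2, 0.0), start=0, k=2)
print(result)
```

The search only computes distances for nodes it reaches by following edges, which is why a well-connected graph lets an index answer queries without scanning every vector.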

The DiskANN Advantage

Scalability

  • DiskANN vector indexes are stored on high-speed SSDs, while compressed vectors are stored in memory. This reduces the memory footprint of the vector index, enabling planet-sized scalability for vector search scenarios.

Low Latency

  • The DiskANN graph index construction makes it very efficient during search, minimizing the number of SSD reads to achieve high throughput and low latency.

High Accuracy

  • During index construction, nodes in the graph are connected to diverse neighbors to improve recall. After the search operation, the results are re-ranked using the full-precision vectors providing high accuracy.
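The re-ranking step can be illustrated with a small two-stage search: approximate distances on compressed vectors pick a shortlist, then exact distances on the full-precision vectors reorder it. This is a simplified sketch, not DiskANN itself. It swaps in a coarse scalar quantizer and a brute-force scan for simplicity, and the data and function names are made up for the example.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantize(vec, step=0.5):
    """Coarse scalar quantization: snap each component to a grid of width step."""
    return tuple(round(x / step) * step for x in vec)


def approximate_shortlist(vectors, query, size, step=0.5):
    """Stage 1: rank all items by distance to their compressed vectors."""
    quantized = {key: quantize(vec, step) for key, vec in vectors.items()}
    return sorted(quantized, key=lambda key: l2(quantized[key], query))[:size]


def search_with_rerank(vectors, query, k=2, shortlist_size=3, step=0.5):
    """Stage 2: reorder the shortlist using the full-precision vectors."""
    shortlist = approximate_shortlist(vectors, query, shortlist_size, step)
    return sorted(shortlist, key=lambda key: l2(vectors[key], query))[:k]


# Hypothetical 2-D data; real embeddings have hundreds or thousands of dimensions.
vectors = {
    "a": (1.10, 1.00),
    "b": (0.80, 1.20),
    "c": (1.30, 0.90),
    "d": (3.00, 3.00),
    "e": (0.00, 0.00),
}
query = (1.22, 0.95)

approx_top2 = approximate_shortlist(vectors, query, size=2)
reranked_top2 = search_with_rerank(vectors, query, k=2, shortlist_size=3)
print(approx_top2, reranked_top2)
```

In this toy dataset the compressed vectors alone rank `a` and `b` first, while re-ranking the slightly larger shortlist with full-precision vectors recovers the true nearest neighbors `c` and `a`.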

Low Cost

  • Because the quantized vectors are stored in memory and the full-precision graph is stored on SSDs, it’s much less expensive to maintain and operate DiskANN-based indexes. This results in lower RU costs for your vector search queries.

Robust to Insertions, Deletions, and Modifications

  • The DiskANN graph index is capable of supporting transactional workloads and does not degrade over time with high volumes of inserts, updates, or deletes. This is a differentiator among typical vector databases in the market today, which are built using HNSW and other less robust methods that require computationally expensive full index rebuilds to maintain accuracy.

The benefits of DiskANN, combined with the instant and dynamic autoscale, global replication, and industry-leading 99.999% SLA of Azure Cosmos DB, make for an unparalleled database for managing both your operational and vector data workloads.

What vector index options are available?

There are multiple types of vector index policies that can be defined for an Azure Cosmos DB container. Learn more about vector indexing in Azure Cosmos DB.
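As a hedged sketch of what such policies can look like, the helper below builds a vector embedding policy and an indexing policy as plain Python dictionaries. The field names follow the policy shape in the public Azure Cosmos DB documentation as we understand it at the time of writing, so verify them against the current docs. The path `/embedding`, the 1536 dimensions, and the helper function itself are assumptions for illustration.

```python
ALLOWED_INDEX_TYPES = {"flat", "quantizedFlat", "diskANN"}
ALLOWED_DISTANCES = {"cosine", "dotproduct", "euclidean"}


def build_vector_policies(path="/embedding", dimensions=1536,
                          index_type="diskANN", distance="cosine"):
    """Return (vector_embedding_policy, indexing_policy) dictionaries."""
    if index_type not in ALLOWED_INDEX_TYPES:
        raise ValueError(f"unknown vector index type: {index_type}")
    if distance not in ALLOWED_DISTANCES:
        raise ValueError(f"unknown distance function: {distance}")

    # Describes the vector property itself: where it lives and how to compare it.
    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": path,
                "dataType": "float32",
                "distanceFunction": distance,
                "dimensions": dimensions,
            }
        ]
    }

    # Describes how the vector property is indexed. The vector path is excluded
    # from the regular index here (an assumption that this helps insert
    # performance) and listed under vectorIndexes instead.
    indexing_policy = {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": "/_etag/?"}, {"path": f"{path}/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }
    return vector_embedding_policy, indexing_policy


embedding_policy, indexing_policy = build_vector_policies()
print(indexing_policy["vectorIndexes"])
```

These dictionaries would be supplied when the container is created. The exact mechanism (SDK arguments, portal, or templates) depends on your tooling and SDK version, so consult the documentation for the one you use.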

This table provides a good guide for the different index types and their strengths:

Enroll in the Vector Search Preview

Vector search in Azure Cosmos DB for NoSQL is a preview feature and requires enrollment via the Features page of your Azure Cosmos DB resource. Follow the steps below to enroll:

1. Navigate to your Azure Cosmos DB for NoSQL resource page.

2. Select the “Features” pane under the “Settings” menu.

3. Select “Vector Search in Azure Cosmos DB for NoSQL”.

4. Read the description of the feature and confirm you want to enroll in the preview.

5. Select “Enable” to enroll in the preview.

Next Steps

The integration of vector search capabilities into Azure Cosmos DB for NoSQL marks a significant advancement in database technology, offering unparalleled scalability, efficiency, and accuracy. With the introduction of DiskANN and other vector indexing options, Azure Cosmos DB provides robust solutions for managing large-scale vector data alongside your operational data. Enroll in the Vector Search Preview today and explore the future of AI-driven applications with the powerful features of Azure Cosmos DB.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

To quickly build your first database, watch our Get Started videos on YouTube and explore ways to dev/test free.


