Paper Title
Manu: A Cloud Native Vector Database Management System
Paper Authors
Paper Abstract
With the development of learning-based embedding models, embedding vectors are widely used for analyzing and searching unstructured data. As vector collections exceed billion-scale, fully managed and horizontally scalable vector databases are necessary. In the past three years, through interaction with our 1200+ industry users, we have sketched a vision for the features that next-generation vector databases should have, which include long-term evolvability, tunable consistency, good elasticity, and high performance. We present Manu, a cloud native vector database that implements these features. It is difficult to integrate all these features if we follow traditional DBMS design rules. As most vector data applications do not require complex data models and strong data consistency, our design philosophy is to relax the data model and consistency constraints in exchange for the aforementioned features. Specifically, Manu firstly exposes the write-ahead log (WAL) and binlog as backbone services. Secondly, write components are designed as log publishers while all read-only analytic and search components are designed as independent subscribers to the log services. Finally, we utilize multi-version concurrency control (MVCC) and a delta consistency model to simplify the communication and cooperation among the system components. These designs achieve a low coupling among the system components, which is essential for elasticity and evolution. We also extensively optimize Manu for performance and usability with hardware-aware implementations and support for complex search semantics.
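The log-centric design described in the abstract can be illustrated with a small sketch. This is not Manu's implementation: the names `Log`, `SearchSubscriber`, `poll`, `can_serve`, and `read` are hypothetical, the log lives in memory, and staleness is measured with a logical counter rather than wall-clock time. It only shows the shape of the idea: a writer publishes to an append-only log, a read-only subscriber consumes it at its own pace, reads are served from versioned data (MVCC style), and a query is accepted only if the subscriber lags the log by at most a tolerance `delta`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LogEntry:
    ts: int               # logical timestamp assigned by the log (assumption)
    key: str
    vector: List[float]


class Log:
    """Append-only in-memory log, standing in for the WAL backbone."""

    def __init__(self) -> None:
        self.entries: List[LogEntry] = []
        self.latest_ts = 0

    def publish(self, key: str, vector: List[float]) -> int:
        # Writers only append; they never talk to readers directly.
        self.latest_ts += 1
        self.entries.append(LogEntry(self.latest_ts, key, vector))
        return self.latest_ts


class SearchSubscriber:
    """Read-only component that consumes the log independently (hypothetical)."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.offset = 0       # position of the next unread log entry
        self.applied_ts = 0   # timestamp of the last entry applied locally
        self.versions: Dict[str, List[Tuple[int, List[float]]]] = {}

    def poll(self, max_entries: Optional[int] = None) -> None:
        # Apply up to max_entries pending log entries, keeping every version.
        end = len(self.log.entries)
        if max_entries is not None:
            end = min(end, self.offset + max_entries)
        for entry in self.log.entries[self.offset:end]:
            self.versions.setdefault(entry.key, []).append((entry.ts, entry.vector))
            self.applied_ts = entry.ts
        self.offset = end

    def can_serve(self, query_ts: int, delta: int) -> bool:
        # Delta-consistency check: the local view may lag by at most delta.
        return query_ts - self.applied_ts <= delta

    def read(self, key: str, snapshot_ts: int) -> Optional[List[float]]:
        # MVCC-style read: newest version with ts <= snapshot_ts, if any.
        visible = [vec for ts, vec in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


# Usage: the writer publishes three updates; the subscriber applies only one.
log = Log()
sub = SearchSubscriber(log)
log.publish("a", [1.0])
log.publish("a", [2.0])
log.publish("b", [3.0])
sub.poll(max_entries=1)
```

In this sketch, setting `delta` to 0 demands that the subscriber has applied everything in the log before answering, while a larger `delta` trades freshness for the ability to answer immediately. Because the writer and the subscriber share only the log, either side can be scaled or replaced without the other changing.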