论文标题

维修速率下限分布式存储

Repair rate lower bounds for distributed storage

论文作者

Luby, Michael

论文摘要

分布式存储系统的主要目标之一是使用大量$ n $ n $不可靠的存储节点可靠地存储大量$ dsize $ size $ dsize $ dsize $ dsize $ dsize $。存储开销$β$是$ dsize $以外的系统容量的比例,即$β= 1- \ frac {dsize} {n \ cdot nsize} $。存储节点随着时间的流逝而随机失败,并用最初的空节点替换,因此数据以平均速率$ erate =λ\ cdot n \ cdot nsize $从系统中删除,其中$ 1/λ$是失败前节点的平均寿命。为了维持源数据的可恢复性,维修器以某种平均速率$ rrate $从节点的网络不断读取数据,并根据读取数据生成并将数据写入节点。主要结果是,对于任何维修器,如果源数据在每个时间点都可以恢复,那么$ rrate \ ge \ frac {erate} {2 \ cdotβ} $渐近地是$ n $,而$ n $则为beta,并且beta为零。这种不平等为任何维修人员所需的平均速率提供了一个基本的下限,以维持系统数据的可恢复性。

One of the primary objectives of a distributed storage system is to reliably store a large amount $dsize$ of source data for a long duration using a large number $N$ of unreliable storage nodes, each with capacity $nsize$. The storage overhead $β$ is the fraction of system capacity available beyond $dsize$, i.e., $β= 1- \frac{dsize}{N \cdot nsize}$. Storage nodes fail randomly over time and are replaced with initially empty nodes, and thus data is erased from the system at an average rate $erate = λ\cdot N \cdot nsize$, where $1/λ$ is the average lifetime of a node before failure. To maintain recoverability of the source data, a repairer continually reads data over a network from nodes at some average rate $rrate$, and generates and writes data to nodes based on the read data. The main result is that, for any repairer, if the source data is recoverable at each point in time then it must be the case that $rrate \ge \frac{erate}{2 \cdot β}$ asymptotically as $N$ goes to infinity and beta goes to zero. This inequality provides a fundamental lower bound on the average rate that any repairer needs to read data from the system in order to maintain recoverability of the source data.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源