论文标题
批处理的前身,并在外部内存中使用尺寸价格的信息进行排序
Batched Predecessor and Sorting with Size-Priced Information in External Memory
论文作者
论文摘要
在单位成本比较模型中,黑匣子采用输入两个项目并输出比较结果。在该模型中已经研究了诸如分类和搜索之类的问题,并且已概括为包括价格信息的概念,其中不同的项目对(例如数据库记录)具有不同的比较成本。这些比较成本可以是任意的(在这种情况下,没有算法可以接近最佳(Charikar等人Stoc 2000)),结构化(例如,比较成本可能取决于数据库的长度(Gupta等人的pocs 2001))或随机性(Angelov等人(Angelov等人)(Angelov等人,Latin,Latin,2008)。由数据库设置的激励,成本取决于项目的大小,我们考虑了分类和批处理的前身的问题,其中将两套不均匀的项目$ a $ a $和$ b $作为输入提供。 (1)在RAM设置中,我们考虑两组都有$ n $键的场景。比较$ a $ a $ a $的两个项目的费用,将$ a $与$ b $的项目进行比较是$ b $,并且比较$ b $中的两个项目是$ c $。我们给出了CASE $ A \ le B \ le C $的上限和下限。请注意,$ b = 1,a = c = \ infty $是著名的``坚果和螺栓''问题。 (2)在磁盘访问模型(DAM)中,其中磁盘和内部内存之间的元素是主要的瓶颈,我们考虑了$ b $中元素的元素大于$ a $中的元素的情况。较大的项目需要更多的I/OS来记忆,在内部内存中消耗更多的空间,并且需要整体进行比较。 我们首先在批处理的前身问题上给出了对输出敏感的下限和上限,并使用它们来得出两个模型中排序的复杂性的界限。在大多数情况下,我们的边界很紧,需要对外部记忆中经典下限技术的新概括,以适应键的不均匀性。
In the unit-cost comparison model, a black box takes an input two items and outputs the result of the comparison. Problems like sorting and searching have been studied in this model, and it has been generalized to include the concept of priced information, where different pairs of items (say database records) have different comparison costs. These comparison costs can be arbitrary (in which case no algorithm can be close to optimal (Charikar et al. STOC 2000)), structured (for example, the comparison cost may depend on the length of the databases (Gupta et al. FOCS 2001)), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting where the cost depends on the sizes of the items, we consider the problems of sorting and batched predecessor where two non-uniform sets of items $A$ and $B$ are given as input. (1) In the RAM setting, we consider the scenario where both sets have $n$ keys each. The cost to compare two items in $A$ is $a$, to compare an item of $A$ to an item of $B$ is $b$, and to compare two items in $B$ is $c$. We give upper and lower bounds for the case $a \le b \le c$. Notice that the case $b=1, a=c=\infty$ is the famous ``nuts and bolts'' problem. (2) In the Disk-Access Model (DAM), where transferring elements between disk and internal memory is the main bottleneck, we consider the scenario where elements in $B$ are larger than elements in $A$. The larger items take more I/Os to be brought into memory, consume more space in internal memory, and are required in their entirety for comparisons. We first give output-sensitive lower and upper bounds on the batched predecessor problem, and use these to derive bounds on the complexity of sorting in the two models. Our bounds are tight in most cases, and require novel generalizations of the classical lower bound techniques in external memory to accommodate the non-uniformity of keys.