论文标题
可构成的核心,用于限制的决定因素最大化及以后
Composable Coresets for Constrained Determinant Maximization and Beyond
论文作者
论文摘要
在大型数据集的背景下,我们研究了在分区约束下确定性最大化的任务。给定一个点集$ v \ subset \ mathbb {r}^d $,将$ s $组$ v_1,...,...,v_s $和整数$ k_1,...,k_s $中的$ k = \ sum_i k_i $中的$ k_i k_i $中的$ k_i $ k_i $ k_i $ k_i $ k_i $ k_i $ k_i $ k_i $ i是$ i $ i $ i $ imized $ i $ k $。确定性最大化及其受约束的变体对建模多样性产生了很大的兴趣,并在公平性和数据摘要的背景下发现了应用。 我们研究了可组合核心的设计,以解决受约束的决定性最大化问题。合并核心是数据的小子集,该数据(大约)保留了优化任务的最佳解决方案,并在包括分布式和流设置在内的其他几个大型数据模型中启用有效的解决方案。在这项工作中,我们考虑了两个制度。对于$ k> d $的情况,我们展示了一种剥皮算法,该算法为我们提供了一个可组合的核心$ kd $,近似值为$ d^{o(d)} $。我们通过表明这种近似因素很紧张来补充结果。对于$ k \ leq d $,我们表明对先前算法的简单修改会导致通过下限验证的最佳核心。我们的结果适用于所有强烈的雷利分布和其他一些实验设计问题。此外,我们在更一般的层流基质约束下显示了核构建算法。
We study the task of determinant maximization under partition constraint, in the context of large data sets. Given a point set $V\subset \mathbb{R}^d$ that is partitioned into $s$ groups $V_1,..., V_s$, and integers $k_1,...,k_s$ where $k=\sum_i k_i$, the goal is to pick $k_i$ points from group $i$ such that the overall determinant of the picked $k$ points is maximized. Determinant Maximization and its constrained variants have gained a lot of interest for modeling diversityand have found applications in the context of fairness and data summarization. We study the design of composable coresets for the constrained determinant maximization problem. Composable coresets are small subsets of the data that (approximately) preserve optimal solutions to optimization tasks and enable efficient solutions in several other large data models including the distributed and the streaming settings. In this work, we consider two regimes. For the case of $k>d$, we show a peeling algorithm that gives us a composable coreset of size $kd$ with an approximation factor of $d^{O(d)}$. We complement our results by showing that this approximation factor is tight. For the case of $k\leq d$, we show that a simple modification of the previous algorithms results in an optimal coreset verified by our lower bounds. Our results apply to all strongly Rayleigh distribution and several other experimental design problems. In addition, we show coreset construction algorithms under the more general laminar matroid constraints.