论文标题
分位数回归的几乎线性行采样算法
Nearly Linear Row Sampling Algorithm for Quantile Regression
论文作者
论文摘要
我们给出了针对分位数损耗函数的行采样算法,其样品复杂性几乎在数据的维度上几乎是线性的,从而改善了先前的最佳算法,其采样复杂性至少对维度具有立方体依赖性。基于我们的行采样算法,我们提供了用于分位数回归的最快已知算法和用于平衡的有向图的图形稀疏算法。我们的主要技术贡献是表明,Lewis权重采样(用于$ \ ell_p $ norms的行采样算法中)也可以在行采样算法中应用于各种损失功能。我们通过实验证明方法的实用性来补充理论结果。
We give a row sampling algorithm for the quantile loss function with sample complexity nearly linear in the dimensionality of the data, improving upon the previous best algorithm whose sampling complexity has at least cubic dependence on the dimensionality. Based upon our row sampling algorithm, we give the fastest known algorithm for quantile regression and a graph sparsification algorithm for balanced directed graphs. Our main technical contribution is to show that Lewis weights sampling, which has been used in row sampling algorithms for $\ell_p$ norms, can also be applied in row sampling algorithms for a variety of loss functions. We complement our theoretical results by experiments to demonstrate the practicality of our approach.