稀疏随机嵌入的坚固且可证明的保证

论文标题

稀疏随机嵌入的坚固且可证明的保证

Robust and Provable Guarantees for Sparse Random Embeddings

论文作者

Skorski, Maciej, Temperoni, Alessandro, Theobald, Martin

论文摘要

在这项工作中，我们改善了稀疏随机嵌入的保证，因为弗雷克森最近在AL提供和分析了它们。（NIPS'18）和Jagadeesan（Nips'19）。具体而言，我们表明（a）与之前提供的渐近保证相反，我们的界限是明确的，并且（b）我们的界限可以通过在广泛参数上实际上具有重要意义的常数（包括尺寸，稀疏性和数据分散）而越来越大。此外，我们从经验上证明，我们的界限在广泛的现实数据集上显着优于先前的作品，例如图像集合，用词袋表示的文本文档以及通过神经嵌入量而观的文本序列。我们的数值改进的背后是更广泛兴趣的技术，它在以前的分析的关键步骤中改进了（c）对某些类型的二次混乱的（c）更严格的估计，（d）建立稀疏线性形式的极端特性，以及（e）对独立随机变量的估计的界限的改进。

In this work, we improve upon the guarantees for sparse random embeddings, as they were recently provided and analyzed by Freksen at al. (NIPS'18) and Jagadeesan (NIPS'19). Specifically, we show that (a) our bounds are explicit as opposed to the asymptotic guarantees provided previously, and (b) our bounds are guaranteed to be sharper by practically significant constants across a wide range of parameters, including the dimensionality, sparsity and dispersion of the data. Moreover, we empirically demonstrate that our bounds significantly outperform prior works on a wide range of real-world datasets, such as collections of images, text documents represented as bags-of-words, and text sequences vectorized by neural embeddings. Behind our numerical improvements are techniques of broader interest, which improve upon key steps of previous analyses in terms of (c) tighter estimates for certain types of quadratic chaos, (d) establishing extreme properties of sparse linear forms, and (e) improvements on bounds for the estimation of sums of independent random variables.

下载PDF全文

下载文献需遵守相关版权规定

论文标题