Paper Title
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
Paper Authors
Paper Abstract
Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent. Gunasekar et al. (2017) conjectured that Gradient Flow with infinitesimal initialization converges to the solution that minimizes the nuclear norm, but a series of recent papers argued that the language of norm minimization is not sufficient to give a full characterization of the implicit regularization. In this work, we provide theoretical and empirical evidence that for depth-2 matrix factorization, gradient flow with infinitesimal initialization is mathematically equivalent to a simple heuristic rank minimization algorithm, Greedy Low-Rank Learning, under some reasonable assumptions. This generalizes the rank minimization view from previous works to a much broader setting and enables us to construct counter-examples that refute the conjecture of Gunasekar et al. (2017). We also extend the results to the case of depth $\ge 3$, and we show that the benefit of being deeper is that the above convergence has a much weaker dependence on the initialization magnitude, so that this rank minimization is more likely to take effect for initialization of practical scale.
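The abstract names Greedy Low-Rank Learning but does not spell out the procedure. Below is a minimal, hedged sketch of one way such a greedy rank-incrementing heuristic could look for a symmetric matrix-completion objective: fit a rank-$r$ factorization by gradient descent, then grow the rank by one along the top eigendirection of the negative gradient and re-fit. All names (`glrl`, `grad_L`, `fit_rank_r`) and hyper-parameters are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of a greedy low-rank learning loop, assuming the objective
# L(W) = 0.5 * || mask * (W - M) ||_F^2 over symmetric matrices W = U U^T.
# Function names and hyper-parameters are illustrative, not the paper's.
import numpy as np

def grad_L(W, M, mask):
    """Gradient of 0.5 * ||mask * (W - M)||_F^2 with respect to W."""
    return mask * (W - M)

def fit_rank_r(U, M, mask, lr=0.01, steps=5000, tol=1e-8):
    """Gradient descent on L(U U^T) over the factor U (n x r)."""
    for _ in range(steps):
        G = grad_L(U @ U.T, M, mask)
        grad_U = (G + G.T) @ U          # chain rule through W = U U^T
        if np.linalg.norm(grad_U) < tol:
            break
        U = U - lr * grad_U
    return U

def glrl(M, mask, max_rank, eps=1e-3, tol=1e-6):
    """Greedy loop: grow the factorization one rank at a time."""
    n = M.shape[0]
    U = np.zeros((n, 0))
    for r in range(1, max_rank + 1):
        W = U @ U.T
        # New direction: top eigenvector of the negative gradient at the
        # current solution (a common choice for greedy rank-one updates).
        _, vecs = np.linalg.eigh(-grad_L(W, M, mask))
        u = vecs[:, -1:]
        U = np.hstack([U, eps * u])     # tiny new component, then re-fit
        U = fit_rank_r(U, M, mask)
        if 0.5 * np.linalg.norm(mask * (U @ U.T - M)) ** 2 < tol:
            break
    return U @ U.T

# Toy usage: recover a synthetic rank-2 PSD matrix from partial observations.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 2))
M = B @ B.T
mask = (rng.random((8, 8)) < 0.6).astype(float)
mask = np.maximum(mask, mask.T)         # keep the observation pattern symmetric
W_hat = glrl(M, mask, max_rank=4)
print("relative error:", np.linalg.norm(W_hat - M) / np.linalg.norm(M))
```

The paper's claim is about what gradient flow with infinitesimal initialization does implicitly; the explicit greedy loop above is only meant to illustrate the kind of rank-minimization behavior the abstract refers to.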