Paper Title
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
Paper Authors
Paper Abstract
Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent. Gunasekar et al. (2017) conjectured that Gradient Flow with infinitesimal initialization converges to the solution that minimizes the nuclear norm, but a series of recent papers argued that the language of norm minimization is not sufficient to give a full characterization of the implicit regularization. In this work, we provide theoretical and empirical evidence that for depth-2 matrix factorization, gradient flow with infinitesimal initialization is mathematically equivalent to a simple heuristic rank minimization algorithm, Greedy Low-Rank Learning, under some reasonable assumptions. This generalizes the rank minimization view from previous works to a much broader setting and enables us to construct counter-examples that refute the conjecture of Gunasekar et al. (2017). We also extend the results to the case of depth $\ge 3$, and we show that the benefit of being deeper is that the above convergence has a much weaker dependence on the initialization magnitude, so that this rank minimization is more likely to take effect for initialization of practical scale.
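The abstract names Greedy Low-Rank Learning but does not spell out the procedure. Below is a minimal, hedged sketch of one way such a greedy rank-incrementing heuristic could look for a symmetric matrix-completion objective: fit a rank-$r$ factorization by gradient descent, then grow the rank by one along the top eigendirection of the negative gradient and re-fit. All names (`glrl`, `grad_L`, `fit_rank_r`) and hyper-parameters are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of a greedy low-rank learning loop, assuming the objective
# L(W) = 0.5 * || mask * (W - M) ||_F^2 over symmetric matrices W = U U^T.
# Function names and hyper-parameters are illustrative, not the paper's.
import numpy as np

def grad_L(W, M, mask):
    """Gradient of 0.5 * ||mask * (W - M)||_F^2 with respect to W."""
    return mask * (W - M)

def fit_rank_r(U, M, mask, lr=0.01, steps=5000, tol=1e-8):
    """Gradient descent on L(U U^T) over the factor U (n x r)."""
    for _ in range(steps):
        G = grad_L(U @ U.T, M, mask)
        grad_U = (G + G.T) @ U          # chain rule through W = U U^T
        if np.linalg.norm(grad_U) < tol:
            break
        U = U - lr * grad_U
    return U

def glrl(M, mask, max_rank, eps=1e-3, tol=1e-6):
    """Greedy loop: grow the factorization one rank at a time."""
    n = M.shape[0]
    U = np.zeros((n, 0))
    for r in range(1, max_rank + 1):
        W = U @ U.T
        # New direction: top eigenvector of the negative gradient at the
        # current solution (a common choice for greedy rank-one updates).
        _, vecs = np.linalg.eigh(-grad_L(W, M, mask))
        u = vecs[:, -1:]
        U = np.hstack([U, eps * u])     # tiny new component, then re-fit
        U = fit_rank_r(U, M, mask)
        if 0.5 * np.linalg.norm(mask * (U @ U.T - M)) ** 2 < tol:
            break
    return U @ U.T

# Toy usage: recover a synthetic rank-2 PSD matrix from partial observations.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 2))
M = B @ B.T
mask = (rng.random((8, 8)) < 0.6).astype(float)
mask = np.maximum(mask, mask.T)         # keep the observation pattern symmetric
W_hat = glrl(M, mask, max_rank=4)
print("relative error:", np.linalg.norm(W_hat - M) / np.linalg.norm(M))
```

The paper's claim is about what gradient flow with infinitesimal initialization does implicitly; the explicit greedy loop above is only meant to illustrate the kind of rank-minimization behavior the abstract refers to.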