Paper Title
Adaptive and Momentum Methods on Manifolds Through Trivializations
Paper Authors
Paper Abstract
Adaptive methods do not have a direct generalization to manifolds, as the adaptive term is not invariant. Momentum methods on manifolds suffer from efficiency problems stemming from the curvature of the manifold. We introduce a framework to generalize adaptive and momentum methods to arbitrary manifolds by noting that, for every differentiable manifold, there exists a radially convex open set that covers almost all of the manifold. Being radially convex, this set is diffeomorphic to $\mathbb{R}^n$. This gives a natural generalization of any adaptive and momentum-based algorithm to a set covering almost all of an arbitrary manifold. We also show how to extend these methods to the setting of gradient descent with a retraction. For their implementation, we provide an approximation to the matrix exponential that requires just 5 matrix multiplications, making it particularly efficient on GPUs. In practice, we observe that this family of algorithms closes the numerical gap created by an incorrect use of momentum and adaptive methods on manifolds. At the same time, we find that the most efficient algorithm of this family is given by simply pulling back the problem to the tangent space at the initial point via the exponential map.
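The last sentences of the abstract describe the trivialization idea: pull the optimization problem back to a flat space where standard adaptive and momentum updates are well defined. Below is a minimal PyTorch sketch of that idea, not the paper's implementation: it optimizes over the special orthogonal group by parametrizing points as $Q_0 \exp(A)$ with $A$ skew-symmetric, so that Adam runs on an unconstrained parameter. The dimension, toy objective, learning rate, and step count are illustrative assumptions.

```python
import torch

# Illustrative sizes and hyperparameters (assumptions, not taken from the paper).
n, steps = 8, 200
torch.manual_seed(0)

Q0, _ = torch.linalg.qr(torch.randn(n, n))      # initial point on the manifold (orthogonal matrix)
target, _ = torch.linalg.qr(torch.randn(n, n))  # toy target defining an illustrative least-squares loss

# Unconstrained parameter living in a flat vector space (the "trivialized" variable).
A_raw = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.Adam([A_raw], lr=1e-1)        # any adaptive/momentum method applies unchanged here

for _ in range(steps):
    opt.zero_grad()
    A = A_raw - A_raw.T                         # skew-symmetric matrix: element of the Lie algebra so(n)
    Q = Q0 @ torch.matrix_exp(A)                # map to the manifold through Q0 and the exponential map
    loss = ((Q - target) ** 2).sum()            # objective evaluated on the manifold point Q
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

Because `A_raw` lives in $\mathbb{R}^{n \times n}$, Adam's per-coordinate second-moment and momentum buffers are well defined, which sidesteps the invariance issue the abstract raises for adaptive methods applied directly on the manifold. The sketch uses PyTorch's built-in `torch.matrix_exp` rather than the 5-multiplication approximation mentioned in the abstract.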