Paper Title

A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization

Paper Authors

Jarad Forristal, Joshua Griffin, Wenwen Zhou, Seyedalireza Yektamaram

Paper Abstract

In this work, we describe an Adaptive Regularization using Cubics (ARC) method for large-scale nonconvex unconstrained optimization that uses Limited-memory Quasi-Newton (LQN) matrices. ARC methods are a relatively new family of optimization strategies that use a cubic-regularization (CR) term in place of trust regions and line searches. LQN methods offer a large-scale alternative to explicit second-order information, taking the same inputs as popular first-order methods such as stochastic gradient descent (SGD). Solving the CR subproblem exactly requires Newton's method, yet by exploiting the internal structure of LQN matrices, we are able to find exact solutions to the CR subproblem in a matrix-free manner, providing large speedups and scaling to modern problem sizes. Additionally, we expand upon previous ARC work by explicitly incorporating first-order updates into our algorithm. We provide experimental results for the SR1 update, showing substantial speedups and competitive performance compared to Adam and other second-order optimizers on deep neural networks (DNNs). We find that our new approach, ARCLQN, is competitive with modern optimizers while requiring minimal tuning, a common pain point for second-order methods.
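
For context, the CR subproblem referenced in the abstract is the minimization of the cubic model m(s) = g^T s + (1/2) s^T B s + (sigma/3) ||s||^3, whose exact minimizer satisfies (B + sigma ||s|| I) s = -g with B + sigma ||s|| I positive semidefinite. The sketch below is a minimal illustration of that exact solve, not the paper's algorithm: it uses a dense eigendecomposition of B where the paper works matrix-free through the compact LQN (SR1) representation, and the function name solve_cr_subproblem, its parameters, and the restriction to the "easy" (non-degenerate) case are all assumptions of this sketch.

```python
import numpy as np

def solve_cr_subproblem(B, g, sigma, tol=1e-10, max_iter=100):
    """Exactly minimize the cubic model
        m(s) = g^T s + 0.5 * s^T B s + (sigma/3) * ||s||^3.

    Dense-eigendecomposition sketch for illustration only; the paper's
    solver performs the analogous computation matrix-free via the
    compact limited-memory (SR1) structure of B. Only the regular
    ("easy") case is handled here.
    """
    lam, V = np.linalg.eigh(B)      # B = V diag(lam) V^T
    g_hat = V.T @ g                 # gradient in the eigenbasis

    # Optimality: (B + sigma*r*I) s = -g with r = ||s|| and
    # lam_min + sigma*r >= 0, so in the eigenbasis
    #   ||s(r)||^2 = sum_i g_hat_i^2 / (lam_i + sigma*r)^2,
    # and r solves the scalar secular equation ||s(r)|| = r.
    def s_norm(r):
        return np.sqrt(np.sum((g_hat / (lam + sigma * r)) ** 2))

    # Start just right of the pole at r = max(0, -lam_min)/sigma;
    # phi(r) = ||s(r)|| - r is convex and decreasing there, so
    # Newton's method converges monotonically from the left.
    r = max(0.0, -lam.min()) / sigma + 1e-12
    for _ in range(max_iter):
        d = lam + sigma * r
        nrm = s_norm(r)
        phi = nrm - r
        if abs(phi) <= tol * max(1.0, r):
            break
        dphi = -sigma * np.sum(g_hat ** 2 / d ** 3) / nrm - 1.0
        r -= phi / dphi             # Newton step on phi(r) = 0
    return V @ (-g_hat / (lam + sigma * r))

# Tiny smoke test on a random indefinite matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = 0.5 * (A + A.T)                 # symmetric, possibly indefinite
g = rng.standard_normal(6)
s = solve_cr_subproblem(B, g, sigma=1.0)
```

Roughly, the structural property the abstract exploits is that an LQN matrix in compact form, B = gamma*I + U M U^T with U having only m columns (m the memory size), reduces the eigendecomposition above to a small m-by-m problem, so the exact solve never forms or factors an n-by-n matrix.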
