Title
Entropic Risk-Averse Generalized Momentum Methods
Authors
Abstract
In the context of first-order algorithms subject to random gradient noise, we study the trade-offs between the convergence rate (which quantifies how fast the initial conditions are forgotten) and the "risk" of suboptimality, i.e., deviations from the expected suboptimality. We focus on a general class of momentum methods (GMM) which recover popular methods such as gradient descent (GD), accelerated gradient descent (AGD), and the heavy-ball (HB) method as special cases, depending on the choice of GMM parameters. We use the well-known risk measures "entropic risk" and "entropic value at risk" to quantify the risk of suboptimality. For strongly convex smooth minimization, we first obtain new convergence rate results for GMM with a unified theory that is also applicable to both AGD and HB, improving some of the existing results for HB. We then provide explicit bounds on the entropic risk and the entropic value at risk of suboptimality at a given iterate, which also yield direct bounds on the probability that the suboptimality exceeds a given threshold via Chernoff's inequality. Our results unveil fundamental trade-offs between the convergence rate and the risk of suboptimality. We then plug the entropic risk and convergence rate estimates we obtained into a computationally tractable optimization framework and propose entropic risk-averse GMM (RA-GMM) and entropic risk-averse AGD (RA-AGD) methods, which select the GMM parameters to systematically trade off the entropic value at risk against the convergence rate. We show that RA-AGD and RA-GMM lead to improved performance on quadratic optimization and logistic regression problems compared to the standard choice of parameters. To our knowledge, our work is the first to resort to coherent risk measures to design the parameters of momentum methods in a systematic manner.
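To make the objects in the abstract concrete, the following is a minimal sketch (not the paper's actual method or parameter choices) of a generalized momentum iteration under gradient noise, together with a Monte Carlo estimate of the entropic risk of suboptimality. The three-parameter update, the quadratic test problem, the noise model, and all parameter values are illustrative assumptions; the entropic risk formula (1/θ)·log E[exp(θX)] is standard.

```python
import numpy as np

def gmm_step(x, x_prev, grad, alpha, beta, gamma):
    """One iterate of a generalized momentum method (GMM).

    Uses a common three-parameter form:
        y_k     = x_k + gamma * (x_k - x_{k-1})
        x_{k+1} = x_k + beta * (x_k - x_{k-1}) - alpha * grad(y_k)
    Special cases: beta = gamma = 0 gives GD, gamma = 0 gives the
    heavy-ball (HB) method, and gamma = beta gives AGD (Nesterov).
    """
    y = x + gamma * (x - x_prev)
    return x + beta * (x - x_prev) - alpha * grad(y), x

def entropic_risk(samples, theta):
    """Entropic risk (1/theta) * log E[exp(theta * X)], estimated
    from Monte Carlo samples of the suboptimality X."""
    return np.log(np.mean(np.exp(theta * np.asarray(samples)))) / theta

# Illustration on a strongly convex quadratic f(x) = 0.5 * mu * ||x||^2
# with additive Gaussian gradient noise (hypothetical test problem).
rng = np.random.default_rng(0)
mu, noise_std = 1.0, 0.1
f = lambda x: 0.5 * mu * (x @ x)
grad = lambda y: mu * y + noise_std * rng.standard_normal(y.shape)

subopts = []
for _ in range(200):  # sample runs to estimate the risk of suboptimality
    x = x_prev = np.ones(2)
    for _ in range(50):
        x, x_prev = gmm_step(x, x_prev, grad, alpha=0.5, beta=0.4, gamma=0.4)
    subopts.append(f(x))  # f* = 0, so f(x) is the suboptimality

print(entropic_risk(subopts, theta=1.0))
```

Larger θ in `entropic_risk` penalizes the right tail of the suboptimality distribution more heavily, which is the sense in which the parameter selection can trade off risk against the convergence rate.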