Paper Title

Gradient Properties of Hard Thresholding Operator

Authors

Saeed Damadi, Jinglai Shen

Abstract

Sparse optimization receives increasing attention in many applications such as compressed sensing, variable selection in regression problems, and, more recently, neural network compression in machine learning. For example, the problem of compressing a neural network is a bi-level, stochastic, and nonconvex problem that can be cast into a sparse optimization problem. Hence, developing efficient methods for sparse optimization plays a critical role in applications. The goal of this paper is to develop analytical techniques for general, large-size sparse optimization problems using the hard thresholding operator. To this end, we study the iterative hard thresholding (IHT) algorithm, which has been extensively studied in the literature because it is scalable, fast, and easily implementable. In spite of extensive research on the IHT scheme, we develop several new techniques that not only recover many known results but also lead to new results. Specifically, we first establish a new and critical gradient descent property of the hard thresholding (HT) operator. Our gradient descent result can be related to the distance between sparse points. Moreover, our gradient descent property allows one to study the IHT algorithm when the stepsize is less than or equal to 1/L, where L > 0 is the Lipschitz constant of the gradient of an objective function; note that the existing techniques in the literature can only handle the case when the stepsize is strictly less than 1/L. By exploiting this, we introduce and study HT-stable and HT-unstable stationary points and show that, no matter how close an initialization is to an HT-unstable stationary point (a saddle point in the sparse sense), the IHT sequence leaves it. Finally, we show that, no matter what sparse initial point is selected, the IHT sequence converges if the function values at HT-stable stationary points are distinct.
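For readers unfamiliar with the algorithm the abstract refers to, the sketch below illustrates the standard hard thresholding operator H_s (keep the s largest-magnitude entries, zero out the rest) and the IHT update x^{k+1} = H_s(x^k − γ∇f(x^k)) with stepsize γ = 1/L, the borderline case the paper's analysis covers. This is a minimal illustration under those standard definitions, not code from the paper; the function names and the toy least-squares problem are assumptions made for this example.

```python
import numpy as np

def hard_threshold(x, s):
    """HT operator H_s: keep the s largest-magnitude entries of x, zero out the rest."""
    z = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -s)[-s:]  # indices of the s largest |x_i|
    z[idx] = x[idx]
    return z

def iht(grad_f, x0, s, stepsize, num_iters=500):
    """Iterative hard thresholding: x_{k+1} = H_s(x_k - stepsize * grad_f(x_k))."""
    x = hard_threshold(np.asarray(x0, dtype=float), s)  # start from a sparse point
    for _ in range(num_iters):
        x = hard_threshold(x - stepsize * grad_f(x), s)
    return x

# Toy usage (hypothetical example, not from the paper):
# sparse least squares f(x) = 0.5 * ||A x - b||^2 with gradient A^T (A x - b),
# whose gradient Lipschitz constant L is the largest eigenvalue of A^T A.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
b = A @ x_true
L = np.linalg.eigvalsh(A.T @ A).max()
x_hat = iht(lambda x: A.T @ (A @ x - b), np.zeros(100), s=5, stepsize=1.0 / L)
```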
