Paper Title
Fast Differentiable Matrix Square Root
Paper Authors
Paper Abstract
Computing the matrix square root or its inverse in a differentiable manner is important in a variety of computer vision tasks. Previous methods either adopt the Singular Value Decomposition (SVD) to explicitly factorize the matrix or use the Newton-Schulz iteration (NS iteration) to derive an approximate solution. However, neither method is sufficiently efficient in either the forward pass or the backward pass. In this paper, we propose two more efficient variants to compute the differentiable matrix square root. For the forward propagation, one method uses the Matrix Taylor Polynomial (MTP), and the other uses the Matrix Padé Approximants (MPA). The backward gradient is computed by iteratively solving the continuous-time Lyapunov equation using the matrix sign function. Both methods yield considerable speed-ups compared with the SVD or the NS iteration. Experimental results on de-correlated batch normalization and the second-order vision transformer demonstrate that our methods can also achieve competitive and even slightly better performance. The code is available at \href{https://github.com/KingJamesSong/FastDifferentiableMatSqrt}{https://github.com/KingJamesSong/FastDifferentiableMatSqrt}.
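To make the forward-pass idea concrete, below is a minimal PyTorch sketch of a Matrix Taylor Polynomial square root: normalize a symmetric positive definite matrix so that the Taylor series of (I - Z)^{1/2} converges, then sum its first K terms. The function name `mtp_sqrt`, the truncation degree `K`, and the Frobenius pre-normalization are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def mtp_sqrt(A: torch.Tensor, K: int = 8) -> torch.Tensor:
    """Approximate A^{1/2} by truncating the Taylor series of (I - Z)^{1/2}."""
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    norm = torch.linalg.matrix_norm(A, ord="fro")  # scale so the series converges
    Z = I - A / norm                               # eigenvalues of Z lie in [0, 1)
    S = I.clone()
    term = I.clone()
    coeff = 1.0
    for k in range(1, K + 1):
        coeff *= (0.5 - (k - 1)) / k               # binomial coefficient C(1/2, k)
        term = term @ (-Z)                         # running power (-Z)^k
        S = S + coeff * term
    return torch.sqrt(norm) * S                    # undo the normalization
```

For a well-conditioned SPD input, `mtp_sqrt(A) @ mtp_sqrt(A)` should be close to `A`. Since the sketch uses only differentiable torch ops, autograd could already backpropagate through it; the paper's contribution is that a custom Lyapunov-based backward is faster than such unrolled differentiation.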
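The backward pass described in the abstract amounts to solving the Lyapunov equation A^{1/2} X + X A^{1/2} = dL/dA^{1/2} for X = dL/dA. Below is a minimal sketch of one way to do this with the matrix sign function, combining Roberts' block-matrix identity with the inversion-free Newton-Schulz sign iteration; the function name, the fixed iteration count, and the Frobenius scaling are assumptions for illustration, not the paper's exact solver.

```python
import torch

def lyapunov_gradient(sqrtA: torch.Tensor, grad_sqrtA: torch.Tensor,
                      iters: int = 8) -> torch.Tensor:
    """Solve sqrtA @ X + X @ sqrtA = grad_sqrtA for X = dL/dA via the matrix
    sign function, using Roberts' identity:
    sign([[B, C], [0, -B]]) = [[I, 2X], [0, -I]] where B X + X B = C."""
    n = sqrtA.shape[-1]
    I2 = torch.eye(2 * n, dtype=sqrtA.dtype, device=sqrtA.device)
    Y = torch.zeros(2 * n, 2 * n, dtype=sqrtA.dtype, device=sqrtA.device)
    Y[:n, :n] = sqrtA
    Y[:n, n:] = grad_sqrtA
    Y[n:, n:] = -sqrtA
    # Scaling Y by a positive scalar changes neither sign(Y) nor the solution X,
    # but keeps the Newton-Schulz sign iteration inside its convergence region.
    Y = Y / torch.linalg.matrix_norm(Y, ord="fro")
    for _ in range(iters):
        Y = 0.5 * Y @ (3.0 * I2 - Y @ Y)           # Newton-Schulz iteration for sign(Y)
    return 0.5 * Y[:n, n:]                          # the (1,2) block converges to 2X
```

The design choice worth noting is that, unlike Newton's sign iteration Y ← (Y + Y^{-1})/2, the Newton-Schulz variant uses only matrix multiplications, which is what makes the backward pass GPU-friendly.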