How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

Paper Title

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

Authors

Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

Abstract

Linear time-invariant state space models (SSMs) are a classical model from engineering and statistics that have recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4). A core component of S4 involves initializing the SSM state matrix to a particular matrix called a HiPPO matrix, which was empirically important for S4's ability to handle long sequences. However, the specific matrix that S4 uses was actually derived in previous work for a particular time-varying dynamical system, and the use of this matrix as a time-invariant SSM had no known mathematical interpretation. Consequently, the theoretical mechanism by which S4 models long-range dependencies actually remains unexplained. We derive a more general and intuitive formulation of the HiPPO framework, which provides a simple mathematical interpretation of S4 as a decomposition onto exponentially-warped Legendre polynomials, explaining its ability to capture long dependencies. Our generalization introduces a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases such as the Fourier basis, and explains other aspects of training S4, such as how to initialize the important timescale parameter. These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
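As a concrete but hedged illustration of the objects the abstract refers to, the Python sketch below builds the HiPPO-LegS state matrix in the closed form given in the HiPPO and S4 papers (A[n, k] = -sqrt(2n+1) * sqrt(2k+1) for n > k, -(n+1) on the diagonal, 0 above it, with B[n] = sqrt(2n+1)). It then discretizes the time-invariant SSM x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) with the bilinear transform at a timescale (step size) dt, and checks that the recurrent and convolutional views of the discretized SSM give the same output. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, C is random instead of learned, and dt = 0.01 is an arbitrary illustrative value rather than the timescale initialization the paper derives.

```python
import numpy as np


def hippo_legs(N):
    """Return the HiPPO-LegS state matrix A (N x N) and input vector B (N,).

    A[n, k] = -sqrt(2n+1) * sqrt(2k+1)   if n > k
            = -(n + 1)                   if n == k
            = 0                          if n < k
    B[n]    =  sqrt(2n+1)
    """
    n = np.arange(N)
    r = np.sqrt(2.0 * n + 1.0)
    # Strictly lower-triangular part plus the negative diagonal -(n+1).
    A = -np.tril(np.outer(r, r), k=-1) - np.diag(n + 1.0)
    B = r.copy()
    return A, B


def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x' = Ax + Bu with step size dt."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - dt / 2.0 * A)
    Ab = inv @ (I + dt / 2.0 * A)
    Bb = inv @ (dt * B)
    return Ab, Bb


def run_recurrent(Ab, Bb, C, u):
    """Unroll x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k, starting from x = 0."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb * u_k
        ys.append(C @ x)
    return np.array(ys)


def run_convolutional(Ab, Bb, C, u):
    """Same output via the kernel K_j = C Ab^j Bb and a causal convolution."""
    L = len(u)
    K = np.empty(L)
    v = Bb.copy()
    for j in range(L):
        K[j] = C @ v      # K_j = C Ab^j Bb
        v = Ab @ v
    # y_k = sum_{j<=k} K_j u_{k-j}: keep the first L entries of the full convolution.
    return np.convolve(u, K)[:L]


if __name__ == "__main__":
    N, L = 16, 200
    A, B = hippo_legs(N)
    dt = 0.01  # illustrative timescale only (assumption, not the paper's init)
    Ab, Bb = discretize_bilinear(A, B, dt)
    rng = np.random.default_rng(0)
    C = rng.standard_normal(N)  # random readout instead of a learned one
    u = rng.standard_normal(L)
    y_rec = run_recurrent(Ab, Bb, C, u)
    y_conv = run_convolutional(Ab, Bb, C, u)
    print("max |recurrent - convolutional|:", np.max(np.abs(y_rec - y_conv)))
```

The recurrent form costs one matrix-vector product per time step, while the convolutional form precomputes the kernel K once and then applies it as a single convolution; this equivalence is what lets S4-style models train in parallel over a sequence and still run step by step at inference time.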
