Paper title
Iterative autoregression: a novel trick to improve your low-latency speech enhancement model
Paper authors
Paper abstract
Streaming models are an essential component of real-time speech enhancement tools. The streaming regime constrains speech enhancement models to use only a tiny context of future information. As a result, the low-latency streaming setup is generally considered a challenging task and has a significant negative impact on the model's quality. However, the sequential nature of streaming generation offers a natural possibility for autoregression, that is, utilizing previous predictions while making current ones. The conventional method for training autoregressive models is teacher forcing, but its primary drawback lies in the training-inference mismatch that can lead to a substantial degradation in quality. In this study, we propose a straightforward yet effective alternative technique for training autoregressive low-latency speech enhancement models. We demonstrate that the proposed approach leads to stable improvement across diverse architectures and training scenarios.
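The abstract contrasts teacher forcing (feeding the model ground-truth history during training) with autoregressive inference (feeding the model its own previous predictions). The sketch below is a minimal, hypothetical illustration of that distinction, not the paper's actual method: `model_predict`, `ar_fraction`, and the frame-level interface are all assumptions made for the example. A scheduled mix of the two feedback sources lets training move gradually from teacher forcing toward the fully autoregressive regime used at inference time.

```python
import numpy as np

def run_sequence(model_predict, noisy, clean, ar_fraction, rng):
    """Process a sequence frame by frame, mixing feedback sources.

    model_predict(noisy_frame, prev_frame) -> enhanced frame (toy stand-in
    for a causal enhancement model; hypothetical interface).
    ar_fraction: probability of feeding back the model's own previous
    prediction (0.0 = pure teacher forcing, 1.0 = fully autoregressive,
    matching the inference-time regime).
    """
    prev = np.zeros_like(noisy[0])
    outputs = []
    for t in range(len(noisy)):
        outputs.append(model_predict(noisy[t], prev))
        if rng.random() < ar_fraction:
            prev = outputs[-1]   # autoregressive feedback: own prediction
        else:
            prev = clean[t]      # teacher forcing: ground-truth frame
    return np.stack(outputs)

# Toy usage: the two regimes diverge after the first frame, which is
# exactly the training-inference mismatch the abstract refers to.
toy = lambda x, prev: 0.5 * (x + prev)
noisy = np.full((5, 4), 0.5)
clean = np.ones((5, 4))
tf = run_sequence(toy, noisy, clean, 0.0, np.random.default_rng(0))
ar = run_sequence(toy, noisy, clean, 1.0, np.random.default_rng(0))
```

With `ar_fraction=0.0` the model always sees clean history during training, so at inference time, where only its own (imperfect) outputs are available, the input distribution shifts; annealing `ar_fraction` toward 1.0 over training is one simple way to close that gap.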