Paper Title

SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

Authors

Zhibin Qiu, Mengfan Fu, Yinfeng Yu, LiLi Yin, Fuchun Sun, Hao Huang

Abstract

The diffusion model, a new generative model that is very popular in image generation and audio synthesis, is rarely used in speech enhancement. In this paper, we use a diffusion model as a module for stochastic refinement. We propose SRTNet, a novel method for speech enhancement via Stochastic Refinement entirely in the Time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which together form the "enhance-and-refine" paradigm. We theoretically demonstrate the feasibility of our method and experimentally show that it achieves faster training, faster sampling, and higher quality. Our code and enhanced samples are available at https://github.com/zhibinQiu/SRTNet.git.
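The "enhance-and-refine" paradigm described above can be sketched as a two-stage pipeline: a deterministic module produces an initial estimate of the clean waveform, and a stochastic module then refines that estimate with an iterative, noise-driven sampler. The sketch below is only an illustration of this structure, not the paper's networks: the toy attenuation "enhancer", the score function, and the Langevin-style refinement loop are all stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_enhance(noisy):
    # Stand-in for the deterministic enhancement network (assumption):
    # a crude attenuation toward zero acts as the first-stage "denoiser".
    return 0.8 * noisy

def score_model(x_t, t):
    # Stand-in for a learned score/denoising network (assumption):
    # here it simply pulls the sample toward zero.
    return -x_t

def stochastic_refine(initial, n_steps=10, step=0.05):
    # Toy annealed-Langevin-style refinement: start from the deterministic
    # estimate and iterate score steps plus a small injected noise term,
    # mirroring the stochastic-refinement role of the diffusion module.
    x = initial.copy()
    for t in range(n_steps, 0, -1):
        noise = rng.standard_normal(x.shape)
        x = x + step * score_model(x, t) + np.sqrt(2 * step) * 0.01 * noise
    return x

# Toy time-domain signal: a sine wave corrupted by additive noise.
clean = np.sin(np.linspace(0, 2 * np.pi, 64))
noisy = clean + 0.3 * rng.standard_normal(64)

estimate = deterministic_enhance(noisy)   # stage 1: enhance
refined = stochastic_refine(estimate)     # stage 2: refine
print(refined.shape)
```

Because the refinement stage starts from the deterministic estimate rather than pure noise, it only needs to model the residual, which is the intuition behind the faster sampling claimed in the abstract.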
