SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Paper Title

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Authors

Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari

Abstract

We present a self-supervised speech restoration method without paired speech corpora. Because the previous general speech restoration method uses artificial paired data created by applying various distortions to high-quality speech corpora, it cannot sufficiently represent acoustic distortions of real data, limiting the applicability. Our model consists of analysis, synthesis, and channel modules that simulate the recording process of degraded speech and is trained with real degraded speech data in a self-supervised manner. The analysis module extracts distortionless speech features and distortion features from degraded speech, while the synthesis module synthesizes the restored speech waveform, and the channel module adds distortions to the speech waveform. Our model also enables audio effect transfer, in which only acoustic distortions are extracted from degraded speech and added to arbitrary high-quality audio. Experimental evaluations with both simulated and real data show that our method achieves significantly higher-quality speech restoration than the previous supervised method, suggesting its applicability to real degraded speech materials.
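The data flow described in the abstract (analysis, then synthesis, then channel, trained by comparing the re-degraded output with the degraded input) can be illustrated with a toy sketch. This is not the authors' implementation: in the paper the modules are learned models, whereas here each one is a hand-written stand-in, and the only distortion assumed is a scalar attenuation. All function names (`analysis`, `synthesis`, `channel`, `restore`, `transfer_effect`) are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Signal = List[float]


def analysis(degraded: Signal) -> Tuple[Signal, float]:
    """Toy analysis module: split degraded audio into speech and distortion features.

    Assumption: the only distortion is a scalar attenuation, so the peak level
    stands in for the "distortion feature" and the peak-normalized signal
    stands in for the "distortionless speech feature".
    """
    peak = max(abs(s) for s in degraded)
    if peak == 0.0:
        return list(degraded), 1.0
    return [s / peak for s in degraded], peak


def synthesis(speech_features: Signal) -> Signal:
    """Toy synthesis module: the features are already a waveform, so copy them."""
    return list(speech_features)


def channel(waveform: Signal, distortion: float) -> Signal:
    """Toy channel module: re-apply the estimated distortion (here, a gain)."""
    return [s * distortion for s in waveform]


def reconstruction_loss(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def restore(degraded: Signal) -> Tuple[Signal, float, float]:
    """Run analysis -> synthesis -> channel and score the re-degraded output.

    The loss compares the re-degraded signal with the degraded input itself,
    which is why no paired clean recording is needed in this setup.
    """
    speech_features, distortion = analysis(degraded)
    restored = synthesis(speech_features)
    redegraded = channel(restored, distortion)
    loss = reconstruction_loss(redegraded, degraded)
    return restored, distortion, loss


def transfer_effect(degraded: Signal, clean_audio: Signal) -> Signal:
    """Audio effect transfer: take only the distortion and apply it elsewhere."""
    _, distortion = analysis(degraded)
    return channel(clean_audio, distortion)


if __name__ == "__main__":
    degraded = [0.1, -0.2, 0.05]
    restored, distortion, loss = restore(degraded)
    print("restored:", restored)
    print("distortion feature:", distortion)
    print("reconstruction loss:", loss)
    print("transferred:", transfer_effect(degraded, [1.0, -1.0]))
```

In a learned version of this loop, the reconstruction loss would be minimized over the parameters of the modules using real degraded recordings; the toy above only shows where each module sits and what passes between them.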
