Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders

Paper title

Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders

Authors

Xin Wang, Junichi Yamagishi

Abstract

A good training set for speech spoofing countermeasures requires diverse TTS and VC spoofing attacks, but generating TTS and VC spoofed trials for a target speaker may be technically demanding. Instead of using full-fledged TTS and VC systems, this study uses neural-network-based vocoders to do copy-synthesis on bona fide utterances. The output data can be used as spoofed data. To make better use of pairs of bona fide and spoofed data, this study introduces a contrastive feature loss that can be plugged into the standard training criterion. On the basis of the bona fide trials from the ASVspoof 2019 logical access training set, this study empirically compared a few training sets created in the proposed manner using a few neural non-autoregressive vocoders. Results on multiple test sets suggest good practices such as fine-tuning neural vocoders using bona fide data from the target domain. The results also demonstrated the effectiveness of the contrastive feature loss. Combining the best practices, the trained CM achieved overall competitive performance. Its EERs on the ASVspoof 2021 hidden subsets also outperformed the top-1 challenge submission.
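The abstract does not give the form of the contrastive feature loss, so the following is only a rough sketch of the general idea, assuming a generic supervised-contrastive formulation rather than the authors' exact definition. The function name `contrastive_feature_loss`, the `temperature` parameter, and the label convention (1 = bona fide, 0 = spoofed) are hypothetical choices made for illustration. In the setting described above, a batch would hold embeddings of bona fide utterances together with embeddings of their vocoded (copy-synthesized) counterparts.

```python
import numpy as np


def contrastive_feature_loss(features, labels, temperature=0.1):
    """Generic supervised-contrastive-style loss (an assumed form, not the paper's).

    features: (N, D) array of utterance embeddings (rows must be non-zero).
    labels:   (N,) array, 1 = bona fide, 0 = spoofed (hypothetical convention).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over all *other* samples in the batch, computed stably.
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Positives are the other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive in the batch

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())


# Toy batch: two "bona fide" and two "spoofed" embeddings.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, 0.1]])
loss_separated = contrastive_feature_loss(feats, np.array([1, 1, 0, 0]))
loss_mixed = contrastive_feature_loss(feats, np.array([1, 0, 1, 0]))
```

In this toy batch the loss is small when embeddings of the same class sit close together and large when the classes are interleaved. A term of this kind could be added to a standard classification loss, which is one way to read "plugged into the standard training criterion"; the weighting between the two terms is not stated in the abstract.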
