Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Paper Title

Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Authors

Qiang, Chunyu; Yang, Peng; Che, Hao; Xiao, Jinba; Wang, Xiaorui; Wang, Zhongyuan

Abstract

Grapheme-to-Phoneme (G2P) conversion plays an important role in Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest challenges is polyphone disambiguation. Most previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for the task are scarce. In this paper, we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation that makes use of a large amount of unlabeled text data. Inspired by the back-translation technique from machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to convert the predicted pronunciation back into text. A window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of each pseudo-label. We also design a data balance strategy to improve accuracy on typical polyphonic characters whose training data is imbalanced or scarce. Experimental results show the effectiveness of the proposed back-translation-style data augmentation method.
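The abstract does not spell out how the round trip and the two filtering strategies fit together, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "multi-model scoring" means a majority vote across several G2P models and that "window-based matching" means comparing the characters around the polyphone in the original and the P2G-reconstructed sentence. The names `window_match`, `pseudo_label`, `min_votes`, and the toy models are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

G2P = Callable[[str], List[str]]  # sentence -> one pinyin syllable per character
P2G = Callable[[List[str]], str]  # pinyin syllables -> reconstructed sentence


def window_match(original: str, reconstructed: str, index: int, window: int = 1) -> bool:
    """Assumed matching rule: characters within `window` of `index` must agree."""
    if len(original) != len(reconstructed):
        return False
    lo = max(0, index - window)
    hi = min(len(original), index + window + 1)
    return original[lo:hi] == reconstructed[lo:hi]


def pseudo_label(sentence: str, index: int, g2p_models: Sequence[G2P], p2g: P2G,
                 window: int = 1, min_votes: int = 2) -> Optional[str]:
    """Return a pseudo-label for sentence[index], or None if it is rejected."""
    candidates = [model(sentence) for model in g2p_models]
    # Assumed scoring rule: keep the label only if enough G2P models agree on it.
    votes = Counter(pinyin[index] for pinyin in candidates)
    label, count = votes.most_common(1)[0]
    if count < min_votes:
        return None
    # Back-translate one pinyin sequence that carries the winning label.
    pinyin = next(p for p in candidates if p[index] == label)
    if not window_match(sentence, p2g(pinyin), index, window):
        return None
    return label


def make_toy_g2p(table: Dict[str, str]) -> G2P:
    """Hypothetical stand-in for a trained G2P model: a fixed lookup table."""
    return lambda sentence: [table[ch] for ch in sentence]


base = {"银": "yin2", "很": "hen3", "近": "jin4"}
toy_g2p_good = make_toy_g2p({**base, "行": "hang2"})  # reads 行 as hang2
toy_g2p_bad = make_toy_g2p({**base, "行": "xing2"})   # reads 行 as xing2

# Hypothetical stand-in for a trained P2G model; xing2 maps back to a different character.
P2G_TABLE = {"yin2": "银", "hang2": "行", "xing2": "形", "hen3": "很", "jin4": "近"}


def toy_p2g(pinyin: List[str]) -> str:
    return "".join(P2G_TABLE[s] for s in pinyin)
```

In this toy setup, two agreeing models that read 行 in 银行很近 as `hang2` survive the round trip and yield a pseudo-label, while a wrong reading is rejected because the reconstructed window no longer matches the original text. A real system would replace the lookup tables with trained sequence models and would likely use model confidence scores rather than a plain vote.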
