Paper title

Checks and Strategies for Enabling Code-Switched Machine Translation

Authors

Thamme Gowda, Mozhdeh Gheini, Jonathan May

Abstract

Code-switching is a common phenomenon among multilingual speakers, where alternation between two or more languages occurs within the context of a single conversation. While multilingual humans can seamlessly switch back and forth between languages, multilingual neural machine translation (NMT) models are not robust to such sudden changes in input. This work explores multilingual NMT models' ability to handle code-switched text. First, we propose checks to measure switching capability. Second, we investigate simple and effective data augmentation methods that can enhance an NMT model's ability to support code-switching. Finally, by using a glass-box analysis of attention modules, we demonstrate the effectiveness of these methods in improving robustness.
