Paper Title

Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders

Authors

Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li

Abstract

Recent work in multilingual translation advances translation quality beyond bilingual baselines by using deep transformer models with increased capacity. However, the extra latency and memory costs introduced by this approach may make it unacceptable for efficiency-constrained applications. It has recently been shown for bilingual translation that using a deep encoder and shallow decoder (DESD) can reduce inference latency while maintaining translation quality, so we study similar speed-accuracy trade-offs for multilingual translation. We find that for many-to-one translation we can indeed increase decoder speed without sacrificing quality using this approach, but for one-to-many translation, shallow decoders cause a clear quality drop. To ameliorate this drop, we propose a deep encoder with multiple shallow decoders (DEMSD), where each shallow decoder is responsible for a disjoint subset of target languages. In particular, a DEMSD model with 2-layer decoders obtains a 1.8x speedup on average over a standard transformer model with no drop in translation quality.
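To make the decoder-routing idea concrete, below is a minimal, illustrative PyTorch sketch of a DEMSD-style model: one deep shared encoder and several shallow decoders, each owning a disjoint subset of target languages. All names and hyperparameters here (the `DEMSD` class, `decoder_groups`, the example language grouping, layer counts) are assumptions made for illustration, not the authors' implementation; masking and other training details are omitted for brevity.

```python
import torch
import torch.nn as nn


class DEMSD(nn.Module):
    """Illustrative sketch: deep shared encoder, multiple shallow decoders."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 encoder_layers=12, decoder_layers=2,
                 decoder_groups=(("de", "fr"), ("zh", "ja"))):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Deep encoder shared across all translation directions.
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=encoder_layers)
        # One shallow decoder per disjoint group of target languages
        # (the grouping shown is hypothetical).
        self.decoders = nn.ModuleDict()
        self.lang_to_group = {}
        for group in decoder_groups:
            key = "+".join(group)
            dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            self.decoders[key] = nn.TransformerDecoder(dec_layer, num_layers=decoder_layers)
            for lang in group:
                self.lang_to_group[lang] = key
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, tgt_lang):
        # Causal target mask omitted for brevity.
        memory = self.encoder(self.embed(src_ids))             # deep, shared pass
        decoder = self.decoders[self.lang_to_group[tgt_lang]]  # route by target language
        hidden = decoder(self.embed(tgt_ids), memory)          # shallow decoding
        return self.out_proj(hidden)                           # token logits


# Usage: route a batch to the decoder that owns the target language.
model = DEMSD()
src = torch.randint(0, 32000, (4, 20))   # (batch, src_len) token ids
tgt = torch.randint(0, 32000, (4, 18))   # (batch, tgt_len) token ids
logits = model(src, tgt, tgt_lang="de")  # -> (4, 18, 32000)
```

The design point the abstract makes is visible in the sketch: the shared encoder carries most of the depth and runs once per sentence, while each routed decoder is only a few layers deep and runs at every decoding step, which is where the reported average 1.8x speedup comes from.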
