Title
Multilingual Machine Translation with Hyper-Adapters
Authors
Abstract
Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules such as adapters. However, adapters of related languages are unable to transfer information, and their total number of parameters becomes prohibitively expensive as the number of languages grows. In this work, we overcome these drawbacks using hyper-adapters -- hyper-networks that generate adapters from language and layer embeddings. While past work had poor results when scaling hyper-networks, we propose a rescaling fix that significantly improves convergence and enables training larger hyper-networks. We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times fewer parameters. When using the same number of parameters and FLOPs, our approach consistently outperforms regular adapters. Also, hyper-adapters converge faster than alternative approaches and scale better than regular dense networks. Our analysis shows that hyper-adapters learn to encode language relatedness, enabling positive transfer across languages.
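To make the mechanism in the abstract concrete, here is a minimal numpy sketch of a hyper-adapter: a hyper-network maps the concatenation of a language embedding and a layer embedding to the weights of a bottleneck adapter, which is then applied with a residual connection. All dimensions, names, and the single-linear-layer hyper-network are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's settings).
d_model, d_bottleneck, d_embed = 16, 4, 8
n_langs, n_layers = 3, 2

# Learned lookup tables: one embedding per language and per layer.
lang_emb = rng.normal(size=(n_langs, d_embed))
layer_emb = rng.normal(size=(n_layers, d_embed))

# Hyper-network: here, a single linear map from the concatenated
# (language, layer) embedding to all adapter parameters at once.
n_params = d_model * d_bottleneck * 2  # down- and up-projection
W_hyper = rng.normal(size=(2 * d_embed, n_params)) * 0.1

def hyper_adapter(x, lang_id, layer_id):
    """Generate adapter weights for (lang, layer), then apply the adapter."""
    z = np.concatenate([lang_emb[lang_id], layer_emb[layer_id]])
    params = z @ W_hyper
    W_down = params[: d_model * d_bottleneck].reshape(d_model, d_bottleneck)
    W_up = params[d_model * d_bottleneck :].reshape(d_bottleneck, d_model)
    # Standard adapter: down-project, nonlinearity, up-project, residual.
    return x + np.maximum(x @ W_down, 0.0) @ W_up

x = rng.normal(size=(5, d_model))  # a batch of 5 token representations
y = hyper_adapter(x, lang_id=1, layer_id=0)
```

This sketch also illustrates the parameter-efficiency claim: per-language adapters add `2 * d_model * d_bottleneck` parameters for every language, whereas the hyper-network's cost is fixed and each new language only adds one `d_embed`-sized embedding. Sharing `W_hyper` across languages is what allows related languages (with similar embeddings) to receive similar adapter weights.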