Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-lingual Information Retrieval

Paper Title

Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-lingual Information Retrieval

Authors

Tianchi Bi, Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen

Abstract

Query translation (QT) is a key component of cross-lingual information retrieval (CLIR) systems. With the help of deep learning, neural machine translation (NMT) has shown promising results on various tasks. However, NMT is generally trained on large-scale out-of-domain data rather than on in-domain query translation pairs. In addition, the translation model lacks a mechanism at inference time to guarantee that the generated words match the search index. These two shortcomings of QT result in text that is readable for humans but provides inadequate candidates for the downstream retrieval task. In this paper, we propose a novel approach to alleviate these problems by limiting the open target vocabulary search space of QT to a set of important words mined from the search index database. The constraint translation candidates are employed at both training and inference time, thus guiding the translation model to learn and generate well-performing target queries. The proposed methods are applied and examined in a real-world CLIR system, the AliExpress e-commerce search engine. Experimental results demonstrate that our approach yields better performance on both translation quality and retrieval accuracy than a strong NMT baseline.
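The idea of restricting the target vocabulary at inference time can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the example word set are all assumptions made for illustration, and it covers only a single greedy decoding step, not the training-time use of the candidates described in the abstract.

```python
import math

def constrain_scores(scores, allowed):
    """Set the score of every token outside `allowed` to -inf."""
    return {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}

def greedy_pick(scores, allowed):
    """Pick the highest-scoring token among the allowed candidates."""
    masked = constrain_scores(scores, allowed)
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("no allowed candidate in vocabulary")
    return best

# Hypothetical per-step scores from a translation model (illustrative values).
step_scores = {"cellphone": 2.1, "mobile": 1.7, "phone": 1.5, "handset": 0.3}
# Hypothetical set of words mined from a search index.
index_words = {"phone", "mobile", "case"}

# Without the constraint the model prefers a word the index does not contain.
unconstrained = max(step_scores, key=step_scores.get)
# With the constraint the choice is limited to words present in the index.
constrained = greedy_pick(step_scores, index_words)
```

In this toy setting the unconstrained choice falls outside the index vocabulary, while the constrained choice is the best-scoring word the retrieval system can actually match.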
