Paper Title

Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce

Authors

Juntao Li, Chang Liu, Jian Wang, Lidong Bing, Hongsong Li, Xiaozhong Liu, Dongyan Zhao, Rui Yan

Abstract

With the prosperity of cross-border e-commerce, there is an urgent demand for intelligent approaches that assist e-commerce sellers in offering local products to consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new, high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, a low-resource setting that can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on this challenging task and that the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.
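The retrieval setup described in the abstract can be sketched abstractly: given a vector for the source-language attribute set and vectors for candidate target-language descriptions (in the paper these would come from CLMN's cross-lingually mapped BERT representations; the toy vectors and function names below are purely illustrative assumptions, not the authors' implementation), retrieval reduces to ranking candidates by a matching score. A minimal sketch, assuming cosine similarity as the matching function:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_descriptions(set_vec, desc_vecs):
    # Rank candidate description vectors by similarity to the
    # attribute-set vector; returns (index, score) pairs, best first.
    scored = [(i, cosine(set_vec, v)) for i, v in enumerate(desc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for cross-lingually mapped embeddings.
attr_set = [0.9, 0.1, 0.3]
candidates = [
    [0.8, 0.2, 0.4],   # semantically close to the attribute set
    [0.1, 0.9, 0.0],   # unrelated description
]
best_index, best_score = rank_descriptions(attr_set, candidates)[0]
```

In CLMN the scoring function is learned rather than a fixed cosine, but the ranking-over-candidates structure of the task is the same.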