Multi-Modal Attribute Extraction for E-Commerce

Paper Title

Multi-Modal Attribute Extraction for E-Commerce

Authors

De la Comble, Aloïs; Dutt, Anuvabh; Montalvo, Pablo; Salah, Aghiles

Abstract

To improve users' experience as they navigate the myriad of options offered by online marketplaces, it is essential to have well-organized product catalogs. One key ingredient to that is the availability of product attributes such as color or material. However, on some marketplaces such as Rakuten-Ichiba, which we focus on, attribute information is often incomplete or even missing. One promising solution to this problem is to rely on deep models pre-trained on large corpora to predict attributes from unstructured data, such as product descriptive texts and images (referred to as modalities in this paper). However, we find that achieving satisfactory performance with this approach is not straightforward but rather the result of several refinements, which we discuss in this paper. We provide a detailed description of our approach to attribute extraction, from investigating strong single-modality methods, to building a solid multimodal model combining textual and visual information. One key component of our multimodal architecture is a novel approach to seamlessly combine modalities, which is inspired by our single-modality investigations. In practice, we notice that this new modality-merging method may suffer from a modality collapse issue, i.e., it neglects one modality. Hence, we further propose a mitigation to this problem based on a principled regularization scheme. Experiments on Rakuten-Ichiba data provide empirical evidence for the benefits of our approach, which has also been successfully deployed to Rakuten-Ichiba. We also report results on publicly available datasets showing that our model is competitive compared to several recent multimodal and unimodal baselines.
