Paper Title

Distant-Supervised Slot-Filling for E-Commerce Queries

Authors

Saurav Manchanda, Mohit Sharma, George Karypis

Abstract

Slot-filling refers to the task of annotating individual terms in a query with the corresponding intended product characteristics (product type, brand, gender, size, color, etc.). A search engine can then use these characteristics to return results that better match the query's product intent. Traditional methods for slot-filling require training data with ground-truth slot annotations. However, generating such labeled data, especially in e-commerce, is expensive and time-consuming because the number of slots increases as new products are added. In this paper, we present distant-supervised probabilistic generative models that require no manual annotation. The proposed approaches leverage readily available historical query logs and the purchases that these queries led to, and also exploit co-occurrence information among the slots to identify the intended product characteristics. We evaluate our approaches by considering how they affect retrieval performance, as well as how well they classify the slots. In terms of retrieval, our approaches achieve up to 156% better ranking performance than Okapi BM25. Moreover, the approach that leverages co-occurrence information outperforms the one that does not on both the retrieval and the slot-classification tasks.
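To make the distant-supervision idea concrete, here is a minimal toy sketch. It is not the authors' model. It assumes a simple mixture in which every query term is emitted either by one attribute (slot) of the product that the query led to, or by a catch-all background slot. The slot is chosen uniformly among these candidates. The purchased products' own attribute values seed the word counts, so no term is ever labeled by hand, and EM then refines P(term | slot) from the logs. The co-occurrence variant described in the abstract is not modeled. The names `train_slot_model`, `fill_slots`, `BACKGROUND` ("other"), the smoothing value, and the four-query log are all hypothetical.

```python
from collections import defaultdict

BACKGROUND = "other"  # hypothetical catch-all slot for terms no attribute explains


def _tokens(text):
    # Assumption: whitespace tokenisation, lower-cased (query terms are lower-case too).
    return text.lower().split()


def _normalise(counts, vocab, alpha):
    """Turn slot -> token -> count into additively smoothed P(token | slot)."""
    emit = {}
    for slot, c in counts.items():
        denom = sum(c.values()) + alpha * len(vocab)
        emit[slot] = {t: (c.get(t, 0.0) + alpha) / denom for t in vocab}
    return emit


def train_slot_model(logs, n_iter=10, alpha=0.1):
    """Toy EM: each query term is generated by one slot of the purchased
    product (or by BACKGROUND), picked uniformly among those candidates.

    logs: list of (query_terms, product); product maps slot -> value string.
    Returns emit[slot][token] = P(token | slot).
    """
    slots = {s for _, p in logs for s in p} | {BACKGROUND}
    vocab = {t for q, _ in logs for t in q}
    vocab |= {t for _, p in logs for v in p.values() for t in _tokens(v)}
    # Distant supervision: seed counts come from the purchased products'
    # attribute values, never from manually annotated query terms.
    seed = {s: defaultdict(float) for s in slots}
    for _, product in logs:
        for slot, value in product.items():
            for tok in _tokens(value):
                seed[slot][tok] += 1.0
    emit = _normalise(seed, vocab, alpha)
    for _ in range(n_iter):
        counts = {s: defaultdict(float, seed[s]) for s in slots}
        for query, product in logs:
            candidates = list(product) + [BACKGROUND]
            for t in query:
                # E-step: posterior over the slots this purchase makes available
                # (uniform prior over candidates, so only emissions matter).
                w = {s: emit[s][t] for s in candidates}
                z = sum(w.values())
                for s in candidates:
                    counts[s][t] += w[s] / z
        # M-step: expected counts plus the fixed seed counts, then smooth.
        emit = _normalise(counts, vocab, alpha)
    return emit


def fill_slots(query, emit):
    """Label each term with its most probable slot; unseen terms -> BACKGROUND."""
    labels = []
    for t in query:
        if t not in emit[BACKGROUND]:  # term never seen in logs or catalogue
            labels.append(BACKGROUND)
        else:
            labels.append(max(emit, key=lambda s: emit[s][t]))
    return labels


# Hypothetical query log: (query terms, attributes of the product purchased).
logs = [
    (["red", "nike", "shoes"], {"color": "Red", "brand": "Nike", "product_type": "Shoes"}),
    (["nike", "running", "shoes"], {"color": "Black", "brand": "Nike", "product_type": "Running Shoes"}),
    (["blue", "adidas", "shirt"], {"color": "Blue", "brand": "Adidas", "product_type": "Shirt"}),
    (["cheap", "red", "shirt"], {"color": "Red", "brand": "Puma", "product_type": "Shirt"}),
]
emit = train_slot_model(logs)
print(fill_slots(["cheap", "red", "adidas", "shoes"], emit))
# ['other', 'color', 'brand', 'product_type']
```

Keeping the seed counts in every M-step acts as a fixed prior from the catalog, which stops the background slot from gradually absorbing terms such as "red" that the purchased products explain well. At test time no purchase is known, so `fill_slots` compares the term against every slot.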
