Paper Title

CPS-MEBR: Click Feedback-Aware Web Page Summarization for Multi-Embedding-Based Retrieval

Authors

Li, Wenbiao; Tang, Pan; Wu, Zhengfan; Lu, Weixue; Zhang, Minghua; Tian, Zhenlei; Shi, Daiting; Sun, Yu; Gu, Simiu; Yin, Dawei

Abstract

Embedding-based retrieval (EBR) represents queries and documents as embeddings and then converts the retrieval problem into a nearest-neighbor search in the embedding space. Previous work has mainly focused on representing a web page with a single embedding, but in real web search scenarios it is difficult to capture all the information in a long, complexly structured web page in a single embedding. To address this issue, we design a click feedback-aware web page summarization for multi-embedding-based retrieval (CPS-MEBR) framework, which generates multiple embeddings for a web page to match different potential queries. Specifically, we use users' click data from search logs to train a summarization model that extracts the sentences in web pages that users frequently click, since these sentences are more likely to answer potential queries. Meanwhile, we introduce sentence-level semantic interaction to design a multi-embedding-based retrieval (MEBR) model, which uses the frequently clicked sentences in a web page to generate multiple embeddings that handle different potential queries. Offline experiments show that the framework retrieves higher-quality candidates than a single-embedding-based retrieval (SEBR) model.
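The contrast between single-embedding and multi-embedding scoring can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sebr_score` and `mebr_score`, the toy 2-D vectors, and the choice of taking the maximum cosine similarity over a page's embeddings are all assumptions made for illustration, since the abstract does not specify how the multiple embeddings are scored against a query.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sebr_score(query_vec, page_vec):
    """Single-embedding score: one vector stands in for the whole page."""
    return cosine(query_vec, page_vec)


def mebr_score(query_vec, page_vecs):
    """Multi-embedding score: best match over the page's embeddings.

    Max aggregation is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    return max(cosine(query_vec, v) for v in page_vecs)


# Hypothetical embeddings of two frequently clicked sentences on one page.
sentence_vecs = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical single page embedding: the mean of the sentence vectors.
single_vec = [0.5, 0.5]
# A query that matches only the first sentence.
query = [1.0, 0.0]

single = sebr_score(query, single_vec)
multi = mebr_score(query, sentence_vecs)
```

In this toy case the averaged page vector blurs the two topics, so `single` is lower than `multi`, which matches the query through the first sentence's embedding alone. In a real system the per-page maximum would typically be realized by indexing every sentence embedding in a nearest-neighbor index and deduplicating hits by page.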
