Paper Title


Caching Historical Embeddings in Conversational Search

Authors

Ophir Frieder, Ida Mele, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto

Abstract


Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. An observation with a potential to reduce latency asserts that conversational queries exhibit a temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-side document embedding cache, improving the responsiveness of conversational search systems. By leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of documents retrieved for a topic introduced in the conversation, as they are likely relevant to successive queries. Our document embedding cache implements an efficient metric index, answering nearest-neighbor similarity queries by estimating the approximate result sets returned. We demonstrate the efficiency achieved using our cache via reproducible experiments based on TREC CAsT datasets, achieving a hit rate of up to 75% without degrading answer quality. Our achieved high cache hit rates significantly improve the responsiveness of conversational systems while likewise reducing the number of queries managed on the search back-end.
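The client-side embedding cache the abstract describes can be sketched as follows. This is an illustrative simplification, not the paper's implementation: it uses brute-force cosine similarity in place of the paper's efficient metric index, and the class name, similarity threshold, and hit criterion are all assumptions. The idea it shows is the same: embeddings of documents retrieved for a conversation topic are kept on the client, nearest-neighbor queries for follow-up turns are answered from the cache when possible, and only misses go to the search back-end.

```python
import numpy as np

class EmbeddingCache:
    """Client-side cache of document embeddings for one conversation.

    Illustrative sketch only: brute-force cosine similarity stands in
    for a metric index, and the threshold is a made-up parameter.
    """

    def __init__(self, threshold=0.7):
        self.doc_ids = []           # ids of cached documents
        self.embeddings = None      # (n_docs, dim) matrix, L2-normalized rows
        self.threshold = threshold  # minimum similarity to count as a cache hit

    def update(self, doc_ids, embeddings):
        """Cache embeddings retrieved from the back-end for a new topic."""
        emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.doc_ids.extend(doc_ids)
        self.embeddings = (emb if self.embeddings is None
                           else np.vstack([self.embeddings, emb]))

    def query(self, query_embedding, k=3):
        """Return (hit, top-k cached doc ids) for a query embedding.

        A miss signals that the query must be forwarded to the back-end.
        """
        if self.embeddings is None:
            return False, []
        q = query_embedding / np.linalg.norm(query_embedding)
        sims = self.embeddings @ q                  # cosine similarities
        order = np.argsort(-sims)[:k]               # best-matching documents
        hit = bool(sims[order[0]] >= self.threshold)
        return hit, [self.doc_ids[i] for i in order]
```

A follow-up query on the same topic tends to land close to already-cached embeddings (the temporal locality the abstract observes), so it resolves locally; a topic shift produces low similarities, a miss, and a fresh back-end retrieval whose results then refill the cache.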
