Adaptive Re-Ranking with a Corpus Graph

Paper title

Adaptive Re-Ranking with a Corpus Graph

Authors

Sean MacAvaney, Nicola Tonellotto, Craig Macdonald

Abstract

Search systems often employ a re-ranking pipeline, wherein documents (or passages) from an initial pool of candidates are assigned new ranking scores. The process enables the use of highly-effective but expensive scoring functions that are not suitable for use directly in structures like inverted indices or approximate nearest neighbour indices. However, re-ranking pipelines are inherently limited by the recall of the initial candidate pool; documents that are not identified as candidates for re-ranking by the initial retrieval function cannot be identified. We propose a novel approach for overcoming the recall limitation based on the well-established clustering hypothesis. Throughout the re-ranking process, our approach adds documents to the pool that are most similar to the highest-scoring documents up to that point. This feedback process adapts the pool of candidates to those that may also yield high ranking scores, even if they were not present in the initial pool. It can also increase the score of documents that appear deeper in the pool that would have otherwise been skipped due to a limited re-ranking budget. We find that our Graph-based Adaptive Re-ranking (GAR) approach significantly improves the performance of re-ranking pipelines in terms of precision- and recall-oriented measures, is complementary to a variety of existing techniques (e.g., dense retrieval), is robust to its hyperparameters, and contributes minimally to computational and storage costs. For instance, on the MS MARCO passage ranking dataset, GAR can improve the nDCG of a BM25 candidate pool by up to 8% when applying a monoT5 ranker.
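The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adaptive_rerank`, the parameters `neighbours`, `score`, `budget`, and `batch_size`, the strict alternation between the initial pool and the graph frontier, and the prioritisation of frontier documents by the score of the document that introduced them are all assumptions made for illustration. The corpus graph is assumed to be a precomputed mapping from each document to its most similar documents.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def adaptive_rerank(
    initial_pool: List[str],
    neighbours: Dict[str, List[str]],
    score: Callable[[str], float],
    budget: int,
    batch_size: int = 2,
) -> List[Tuple[str, float]]:
    """Score at most `budget` documents, alternating between the initial
    candidate pool and graph neighbours of documents scored so far.

    Hypothetical sketch: the alternation and the frontier priority are
    assumptions, not details confirmed by the abstract.
    """
    scored: Dict[str, float] = {}
    pool_iter = iter(initial_pool)          # initial ranking, best first
    frontier: List[Tuple[float, str]] = []  # min-heap on negated source score

    def from_pool(limit: int) -> List[str]:
        batch: List[str] = []
        for doc in pool_iter:
            if doc not in scored and doc not in batch:
                batch.append(doc)
                if len(batch) == limit:
                    break
        return batch

    def from_frontier(limit: int) -> List[str]:
        batch: List[str] = []
        while frontier and len(batch) < limit:
            _, doc = heapq.heappop(frontier)
            if doc not in scored and doc not in batch:
                batch.append(doc)
        return batch

    sources = [from_pool, from_frontier]
    turn = 0
    while len(scored) < budget:
        limit = min(batch_size, budget - len(scored))
        # Fall back to the other source if the preferred one is exhausted.
        batch = sources[turn](limit) or sources[1 - turn](limit)
        if not batch:
            break  # nothing left to score in either source
        for doc in batch:
            s = score(doc)  # the expensive ranker call
            scored[doc] = s
            # Neighbours of higher-scoring documents are explored first.
            for nb in neighbours.get(doc, []):
                if nb not in scored:
                    heapq.heappush(frontier, (-s, nb))
        turn = 1 - turn

    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch a document that is absent from `initial_pool` can still be scored if it is a graph neighbour of a high-scoring document, which is the recall effect the abstract describes; the total number of `score` calls never exceeds `budget`.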
