Paper Title
Surprise: Result List Truncation via Extreme Value Theory
Paper Authors
Paper Abstract
Work in information retrieval has largely centered on ranking and relevance: given a query, return some number of results ordered by relevance to the user. The problem of result list truncation, that is, where to cut off the ranked list of results, has received less attention despite being crucial in a variety of applications. Such truncation is a balancing act between the overall relevance, or usefulness, of the results and the user cost of processing more results. Result list truncation can be challenging because relevance scores are often not well calibrated. This is particularly true in large-scale IR systems where documents and queries are embedded in the same metric space and a query's nearest document neighbors are returned during inference. Here, relevance is inversely proportional to the distance between the query and a candidate document, but what distance constitutes relevance varies from query to query and changes dynamically as more documents are added to the index. In this work, we propose Surprise scoring, a statistical method that leverages the Generalized Pareto distribution that arises in extreme value theory to produce interpretable and calibrated relevance scores at query time using nothing more than the ranked scores. We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets and compare it to both classical and recent baselines. We also draw connections to hypothesis testing and $p$-values.
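The abstract does not spell out the Surprise algorithm, so the following is only a minimal sketch of the general idea it points to: a peaks-over-threshold tail fit with the Generalized Pareto distribution, turned into a $p$-value per candidate. Everything here is an assumption made for illustration and not the authors' method: the function names (`fit_gpd_moments`, `gpd_sf`, `tail_p_value`, `truncate`), the use of a separate `background` sample of presumed non-relevant scores, the convention that higher scores mean more relevant, the method-of-moments fit (valid only for shape below 1/2), and the defaults `alpha=0.01` and `tail_frac=0.2`.

```python
import math
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (assumes shape xi < 1/2). Returns (xi, sigma)."""
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    if m <= 0 or v <= 0:
        raise ValueError("excesses must have positive mean and variance")
    ratio = m * m / v                   # equals 1 - 2*xi for a true GPD
    xi = 0.5 * (1.0 - ratio)            # shape
    sigma = 0.5 * m * (1.0 + ratio)     # scale
    return xi, sigma


def gpd_sf(y, xi, sigma):
    """GPD survival function P(Y > y) for an excess y over the threshold."""
    if y <= 0:
        return 1.0
    if abs(xi) < 1e-9:                  # exponential limit as xi -> 0
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0:                       # beyond the finite endpoint when xi < 0
        return 0.0
    return base ** (-1.0 / xi)


def tail_p_value(score, background, tail_frac=0.2):
    """Estimate P(background score >= score) with a GPD tail model.

    `background` is a hypothetical sample of scores assumed non-relevant.
    The GPD is refit on every call for simplicity, not efficiency.
    """
    bg = sorted(background, reverse=True)
    n_tail = int(tail_frac * len(bg))
    if n_tail < 2 or n_tail >= len(bg):
        raise ValueError("tail_frac leaves too few points for a tail fit")
    u = bg[n_tail]                      # threshold: top n_tail scores exceed it
    if score <= u:                      # below the tail: use the empirical fraction
        return sum(b >= score for b in bg) / len(bg)
    xi, sigma = fit_gpd_moments([b - u for b in bg[:n_tail]])
    return (n_tail / len(bg)) * gpd_sf(score - u, xi, sigma)


def truncate(ranked_scores, background, alpha=0.01, tail_frac=0.2):
    """Return how many leading results to keep: stop at the first p-value > alpha."""
    keep = 0
    for s in ranked_scores:
        if tail_p_value(s, background, tail_frac) > alpha:
            break
        keep += 1
    return keep
```

Calling `truncate(ranked_scores, background)` with scores sorted from most to least relevant returns the length of the prefix whose scores would be surprising under the fitted background tail. In practice a maximum-likelihood fit (for example via an established statistics library) would likely be preferable to the moment estimator used here, and testing many candidates against a fixed `alpha` raises the usual multiple-comparison caveats.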