Paper title
An Analysis of Fusion Functions for Hybrid Retrieval
Paper authors
Paper abstract
We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a CC fusion is generally agnostic to the choice of score normalization; that CC outperforms RRF in in-domain and out-of-domain settings; and finally, that CC is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.
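The two fusion functions named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the parameter names `alpha` and `eta`, the choice of min-max normalization, and the rule that a document missing from one retriever contributes 0 after normalization are all assumptions made here. The value 60 is only a commonly used default for the RRF constant; the abstract reports that RRF is sensitive to this parameter.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; one possible normalization (assumed here)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def convex_combination(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """CC fusion: alpha * semantic + (1 - alpha) * lexical on normalized scores.

    Assumption: a document absent from one result list scores 0 there.
    """
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    return {
        doc: alpha * sem.get(doc, 0.0) + (1.0 - alpha) * lex.get(doc, 0.0)
        for doc in docs
    }


def reciprocal_rank_fusion(
    rankings: List[List[str]], eta: float = 60.0
) -> Dict[str, float]:
    """RRF: sum of 1 / (eta + rank) over the rankings, with ranks starting at 1."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (eta + rank)
    return fused


def ranked(scores: Dict[str, float]) -> List[str]:
    """Sort documents by descending fused score, breaking ties by id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Toy scores (made up for illustration only).
lexical_scores = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
semantic_scores = {"d1": 0.30, "d2": 0.90, "d4": 0.60}

cc_scores = convex_combination(lexical_scores, semantic_scores, alpha=0.5)
rrf_scores = reciprocal_rank_fusion(
    [ranked(lexical_scores), ranked(semantic_scores)], eta=60.0
)
print(ranked(cc_scores))
print(ranked(rrf_scores))
```

CC uses the score magnitudes and has a single weight to tune, whereas RRF discards the scores and keeps only the rank positions, which is the distinction the abstract draws between the two.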