Paper Title
A Dense Representation Framework for Lexical and Semantic Matching
Paper Authors
Paper Abstract
Lexical and semantic matching capture different successful approaches to text retrieval, and the fusion of their results has proven to be more effective and robust than either alone. Prior work performs hybrid retrieval by conducting lexical and semantic matching using different systems (e.g., Lucene and Faiss, respectively) and then fusing their model outputs. In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs). Our experiments show that DLRs can effectively approximate the original lexical representations, preserving effectiveness while improving query latency. Furthermore, we can combine dense lexical and semantic representations to generate dense hybrid representations (DHRs) that are more flexible and yield faster retrieval than existing hybrid techniques. In addition, we explore jointly training lexical and semantic representations in a single model and empirically show that the resulting DHRs are able to combine the advantages of the individual components. Our best DHR model is competitive with state-of-the-art single-vector and multi-vector dense retrievers in both in-domain and zero-shot evaluation settings. Furthermore, our model is faster and requires smaller indexes, making our dense representation framework an attractive approach to text retrieval. Our code is available at https://github.com/castorini/dhr.
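
To make the idea of densifying a lexical representation concrete, the following is a minimal sketch of one plausible scheme. It is not taken from the abstract or confirmed as the authors' implementation. It assumes the vocabulary-sized vector is split into equal contiguous slices, each slice is max-pooled into a single value, and the position of that maximum is kept so that a query and a document only interact in a slice when they selected the same term. The names `densify`, `gated_dot`, and `hybrid_score` are hypothetical, and the simple sum used to combine lexical and semantic scores is likewise an assumption.

```python
from typing import List, Tuple

DenseLexical = Tuple[List[float], List[int]]  # (pooled values, argmax positions)


def densify(lexical_vec: List[float], num_slices: int) -> DenseLexical:
    """Max-pool a vocabulary-sized vector into num_slices (value, index) pairs.

    Assumption: contiguous, equally sized slices; the vector length must be
    divisible by num_slices.
    """
    if len(lexical_vec) % num_slices != 0:
        raise ValueError("vector length must be divisible by num_slices")
    width = len(lexical_vec) // num_slices
    values: List[float] = []
    indices: List[int] = []
    for s in range(num_slices):
        chunk = lexical_vec[s * width:(s + 1) * width]
        # Position of the largest weight in this slice (first one on ties).
        best = max(range(width), key=lambda i: chunk[i])
        values.append(chunk[best])
        indices.append(best)
    return values, indices


def gated_dot(query: DenseLexical, doc: DenseLexical) -> float:
    """Inner product that only counts slices where both sides kept the same term."""
    q_vals, q_idx = query
    d_vals, d_idx = doc
    return sum(
        qv * dv
        for qv, dv, qi, di in zip(q_vals, d_vals, q_idx, d_idx)
        if qi == di
    )


def hybrid_score(
    query_lex: DenseLexical,
    doc_lex: DenseLexical,
    query_sem: List[float],
    doc_sem: List[float],
) -> float:
    """Assumed fusion: lexical (gated) score plus semantic inner product."""
    semantic = sum(q * d for q, d in zip(query_sem, doc_sem))
    return gated_dot(query_lex, doc_lex) + semantic


# Toy vocabulary of 8 terms compressed into 4 slices of width 2.
query_vec = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
doc_vec = [0.0, 3.0, 0.0, 1.0, 0.0, 0.5, 0.0, 0.0]

exact_score = sum(q * d for q, d in zip(query_vec, doc_vec))
approx_score = gated_dot(densify(query_vec, 4), densify(doc_vec, 4))
combined = hybrid_score(
    densify(query_vec, 4), densify(doc_vec, 4), [0.5, 0.5], [1.0, 0.0]
)
```

In this toy case the only overlapping term falls in the first slice, so the densified score matches the exact sparse inner product. In general, terms that lose the max-pooling within their slice are dropped, which is why such a scheme approximates the original lexical score rather than reproducing it. Because both parts are fixed-length vectors, a single dense index could in principle serve lexical and semantic matching together.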