Paper Title

Isotropic Representation Can Improve Dense Retrieval

Paper Authors

Euna Jung, Jungwon Park, Jaekeol Choi, Sungyoon Kim, Wonjong Rhee

Paper Abstract

The recent advancement in language representation modeling has broadly affected the design of dense retrieval (DR) models. In particular, many of the high-performing dense retrieval models evaluate representations of query and document using BERT, and subsequently apply cosine-similarity based scoring to determine the relevance. BERT representations, however, are known to follow an anisotropic distribution of a narrow cone shape, and such an anisotropic distribution can be undesirable for cosine-similarity based scoring. In this work, we first show that BERT-based DR representations also follow an anisotropic distribution. To cope with the problem, we introduce the unsupervised post-processing methods of Normalizing Flow and whitening, and develop a token-wise method in addition to a sequence-wise method for applying the post-processing to the representations of dense retrieval models. We show that the proposed methods can effectively make the representations isotropic. We then perform experiments with ColBERT and RepBERT to show that the document re-ranking performance (NDCG@10) can be improved by 5.17%–8.09% for ColBERT and 6.88%–22.81% for RepBERT. To examine the potential of isotropic representations for improving the robustness of DR models, we investigate out-of-distribution tasks where the test dataset differs from the training dataset. The results show that isotropic representations achieve generally improved performance. For instance, when the training dataset is MS-MARCO and the test dataset is Robust04, isotropy post-processing can improve the baseline performance by up to 24.98%. Furthermore, we show that an isotropic model trained with an out-of-distribution dataset can even outperform a baseline model trained with the in-distribution dataset.
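To make the whitening post-processing mentioned in the abstract concrete, the sketch below fits a standard PCA-whitening transform on a set of unlabeled representation vectors and checks, via average pairwise cosine similarity, that the distribution becomes more isotropic. This is a minimal sketch of the general whitening technique, not the paper's exact implementation (the paper also studies Normalizing Flow, which is not shown); the function names `fit_whitening`, `whiten`, and `avg_cos` and the toy anisotropic data are assumptions for illustration. Sequence-wise application would fit the transform on pooled per-sequence vectors (as in a RepBERT-style single-vector model), while token-wise application would fit it on the stacked per-token vectors (as in a ColBERT-style multi-vector model).

```python
import numpy as np

def fit_whitening(x, eps=1e-8):
    # Fit a PCA-whitening transform on vectors x of shape (n, d):
    # x' = (x - mu) @ W with W = U diag(1/sqrt(s)), where the sample
    # covariance decomposes as cov = U diag(s) U^T.
    mu = x.mean(axis=0, keepdims=True)
    cov = np.cov((x - mu).T)
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s + eps))
    return mu, w

def whiten(x, mu, w):
    # Whitened outputs have ~zero mean and ~identity covariance,
    # i.e. an isotropic distribution suited to cosine scoring.
    return (x - mu) @ w

def avg_cos(v):
    # Mean cosine similarity over all distinct pairs; values near 0
    # indicate isotropy, values near 1 a narrow-cone distribution.
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sim = v @ v.T
    n = len(v)
    return (sim.sum() - n) / (n * (n - 1))

# Toy data (hypothetical): 1000 "embeddings" of dimension 128 with a
# shared mean offset and unequal per-dimension scales, mimicking the
# narrow-cone anisotropy reported for BERT representations.
rng = np.random.default_rng(0)
scales = np.linspace(5.0, 0.1, 128)
embs = rng.normal(size=(1000, 128)) * scales + 2.0

mu, w = fit_whitening(embs)   # fit on unlabeled corpus vectors
iso = whiten(embs, mu, w)     # use these for cosine-based relevance scoring

print(f"avg cosine before: {avg_cos(embs):.3f}, after: {avg_cos(iso):.3f}")
```

Because the transform needs no relevance labels, it can be fit once on a corpus sample and applied at indexing and query time; for a token-wise variant, the same `fit_whitening` call would simply receive the stacked token embeddings instead of pooled sequence embeddings.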
