Paper Title
Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models
Paper Authors
Paper Abstract
Neural ranking models (NRMs) have become one of the most important techniques in information retrieval (IR). Due to the limited availability of relevance labels, the training of NRMs relies heavily on negative sampling over unlabeled data. In general machine learning scenarios, it has been shown that training with hard negatives (i.e., samples that are close to the positives) can lead to better performance. Surprisingly, our empirical studies in IR find the opposite. When top-ranked results (excluding the labeled positives) from a stronger retriever are sampled as negatives, the performance of the learned NRM becomes even worse. Based on our investigation, the superficial reason is that the top-ranked results of a stronger retriever contain more false negatives (i.e., unlabeled positives), which can hurt the training process; the root cause is pooling bias in the dataset construction process, where annotators judge and label only the very few samples selected by some basic retrievers. Therefore, in principle, we can formulate the false-negative issue in training NRMs as learning from labeled datasets with pooling bias. To solve this problem, we propose a novel Coupled Estimation Technique (CET) that simultaneously learns a relevance model and a selection model to correct the pooling bias when training NRMs. Empirical results on three retrieval benchmarks show that NRMs trained with our technique achieve significant gains in ranking effectiveness over other baseline strategies.
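The abstract does not spell out CET's loss, but the core idea it states, coupling a relevance model with a selection model so that likely false negatives are discounted, can be illustrated with a toy sketch. Below, a hypothetical `coupled_loss` weights each sampled negative by a selection-model probability that annotators actually judged it: a high-scoring negative the selection model believes was never pooled is treated as a potential unlabeled positive and contributes less. The function name, weighting scheme, and scores are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def coupled_loss(pos_score, neg_scores, sel_probs):
    """Softmax cross-entropy over one positive and sampled negatives,
    where each negative is weighted by the selection model's estimated
    probability that it was judged by annotators.

    Negatives with low sel_prob (likely never pooled, hence possibly
    unlabeled positives) are down-weighted in the partition function.
    NOTE: illustrative weighting only, not the paper's exact CET loss.
    """
    neg_scores = np.asarray(neg_scores, dtype=float)
    sel_probs = np.asarray(sel_probs, dtype=float)
    logits = np.concatenate(([pos_score], neg_scores))
    weights = np.concatenate(([1.0], sel_probs))  # positive keeps full weight
    m = logits.max()  # subtract max for numerical stability
    z = np.sum(weights * np.exp(logits - m))
    return -(pos_score - m - np.log(z))

# A hard negative (score 2.5) that the selection model thinks was
# probably unjudged (sel_prob 0.1) hurts training less than one it
# trusts as a genuine negative (sel_prob 1.0):
loss_discounted = coupled_loss(2.0, [2.5, 0.1], [0.1, 1.0])
loss_trusted = coupled_loss(2.0, [2.5, 0.1], [1.0, 1.0])
```

Here `loss_discounted < loss_trusted`, so the gradient pressure to push the suspicious hard negative below the positive is reduced, which is one plausible way a learned selection model could mitigate the false-negative problem described above.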