Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

Paper title

Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

Paper authors

Si Sun, Yingzhuo Qian, Zhenghao Liu, Chenyan Xiong, Kaitao Zhang, Jie Bao, Zhiyuan Liu, Paul Bennett

Paper abstract

The effectiveness of Neural Information Retrieval (Neu-IR) often depends on a large scale of in-domain relevance training signals, which are not always available in real-world ranking scenarios. To democratize the benefits of Neu-IR, this paper presents MetaAdaptRank, a domain adaptive learning method that generalizes Neu-IR models from label-rich source domains to few-shot target domains. Drawing on source-domain massive relevance supervision, MetaAdaptRank contrastively synthesizes a large number of weak supervision signals for target domains and meta-learns to reweight these synthetic "weak" data based on their benefits to the target-domain ranking accuracy of Neu-IR models. Experiments on three TREC benchmarks in the web, news, and biomedical domains show that MetaAdaptRank significantly improves the few-shot ranking accuracy of Neu-IR models. Further analyses indicate that MetaAdaptRank thrives from both its contrastive weak data synthesis and meta-reweighted data selection. The code and data of this paper can be obtained from https://github.com/thunlp/MetaAdaptRank.
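The meta-reweighting idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that); it assumes a linear scorer, a pairwise logistic ranking loss, and a gradient-alignment weighting rule in the style of one-step learning-to-reweight. All function and variable names here are hypothetical.

```python
import numpy as np

def pairwise_logistic_grad(w, pos, neg):
    """Per-pair gradient of log(1 + exp(-(w.pos - w.neg))) with respect to w.

    Assumption: a linear scorer stands in for a Neu-IR ranker.
    """
    diff = pos - neg                          # (n, d) feature differences
    margin = diff @ w                         # (n,) score margins
    coeff = -1.0 / (1.0 + np.exp(margin))     # derivative of the loss w.r.t. margin
    return coeff[:, None] * diff              # (n, d) one gradient per pair

def meta_reweight(w, weak_pos, weak_neg, tgt_pos, tgt_neg):
    """Weight each synthetic weak pair by how much a step on it would help the target loss.

    Assumed rule: weight_i is proportional to max(0, g_target . g_i), then normalized.
    """
    g_weak = pairwise_logistic_grad(w, weak_pos, weak_neg)       # (n, d)
    g_tgt = pairwise_logistic_grad(w, tgt_pos, tgt_neg).mean(0)  # (d,)
    raw = np.maximum(0.0, g_weak @ g_tgt)     # clip pairs that would hurt the target
    total = raw.sum()
    return raw / total if total > 0 else raw

def reweighted_step(w, weak_pos, weak_neg, weights, lr=1.0):
    """One gradient step on the weak data, scaled by the meta-learned weights."""
    g_weak = pairwise_logistic_grad(w, weak_pos, weak_neg)
    return w - lr * (weights[:, None] * g_weak).sum(0)

# Toy data: the second weak pair has a flipped (noisy) preference.
w0 = np.zeros(2)
weak_pos = np.array([[1.0, 0.0], [0.0, 0.0]])
weak_neg = np.array([[0.0, 0.0], [1.0, 0.0]])
tgt_pos = np.array([[1.0, 0.0]])              # few-shot target-domain pair
tgt_neg = np.array([[0.0, 0.0]])

weights = meta_reweight(w0, weak_pos, weak_neg, tgt_pos, tgt_neg)
w1 = reweighted_step(w0, weak_pos, weak_neg, weights)
```

In this toy case the weak pair that agrees with the target-domain pair keeps all the weight and the flipped pair is dropped, which is the selection effect the abstract attributes to meta-reweighting.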
