Title
Large-scale Evaluation of Transformer-based Article Encoders on the Task of Citation Recommendation
Authors
Abstract
Recently introduced transformer-based article encoders (TAEs), designed to produce similar vector representations for mutually related scientific articles, have demonstrated strong performance on benchmark datasets for scientific article recommendation. However, the existing benchmark datasets are predominantly focused on single domains and, in some cases, contain easy negatives in small candidate pools. Evaluating representations on such benchmarks might obscure the realistic performance of TAEs in setups with thousands of articles in candidate pools. In this work, we evaluate TAEs on large benchmarks with more challenging candidate pools. We compare the performance of TAEs with BM25, a lexical retrieval baseline model, on the task of citation recommendation, where the model produces a list of recommendations for citing in a given input article. We find that BM25 is still very competitive with state-of-the-art neural retrievers, a finding which is surprising given the strong performance of TAEs on small benchmarks. As a remedy for the limitations of the existing benchmarks, we propose a new benchmark dataset for evaluating scientific article representations: the Multi-Domain Citation Recommendation dataset (MDCR), which covers different scientific fields and contains challenging candidate pools.
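To make the lexical baseline concrete, the following is a minimal from-scratch sketch of Okapi BM25 ranking over a candidate pool, using the standard formula with parameters k1 and b. This is an illustrative implementation, not the authors' evaluation code, and the tokenized inputs are hypothetical.

```python
import math
from collections import Counter

def bm25_rank(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each candidate document against the query with BM25
    and return candidate indices sorted by descending score.

    query_tokens: list of terms from the citing (input) article.
    docs_tokens:  list of token lists, one per candidate article.
    """
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N

    # Document frequency of each term across the candidate pool.
    df = Counter()
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1

    def idf(t):
        # Smoothed IDF; rare terms contribute more to the score.
        return math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)

    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            # Term-frequency saturation (k1) and length normalization (b).
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf(t) * num / den
        scores.append(s)
    return sorted(range(N), key=lambda i: scores[i], reverse=True)
```

In a citation-recommendation setup, the query would be the tokenized title/abstract of the input article and the candidates the tokenized articles in the pool; the returned ordering is the recommendation list.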