Paper Title

Automation of Citation Screening for Systematic Literature Reviews using Neural Networks: A Replicability Study

Authors

Wojciech Kusa, Allan Hanbury, Petr Knoth

Abstract

In the process of Systematic Literature Review, citation screening is estimated to be one of the most time-consuming steps. Multiple approaches to automate it using various machine learning techniques have been proposed. The first research papers that apply deep neural networks to this problem were published in the last two years. In this work, we conduct a replicability study of the first two deep learning papers for citation screening and evaluate their performance on 23 publicly available datasets. While we succeeded in replicating the results of one of the papers, we were unable to replicate the results of the other. We summarise the challenges involved in the replication, including difficulties in obtaining the datasets to match the experimental setup of the original papers and problems with executing the original source code. Motivated by this experience, we subsequently present a simpler model based on averaging word embeddings that outperforms one of the models on 18 out of 23 datasets and is, on average, 72 times faster than the second replicated approach. Finally, we measure the training time and the invariance of the models when exposed to a variety of input features and random initialisations, demonstrating differences in the robustness of these approaches.
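The abstract describes a simpler baseline that represents each citation by averaging its word embeddings before classification. The sketch below illustrates only that averaging step; the embedding table, vocabulary, and downstream classifier here are hypothetical placeholders, not the paper's actual setup.

```python
# Hypothetical sketch of the "averaging word embeddings" idea from the
# abstract: a document vector is the mean of its word vectors. The random
# embedding matrix and toy vocabulary below are placeholders for
# illustration only; the paper's real embeddings and classifier are not
# specified in this abstract.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"randomised": 0, "trial": 1, "cohort": 2, "screening": 3, "review": 4}
EMB = rng.normal(size=(len(VOCAB), 8))  # placeholder word-embedding matrix


def doc_vector(tokens):
    """Average the embeddings of in-vocabulary tokens (zero vector if none)."""
    vecs = [EMB[VOCAB[t]] for t in tokens if t in VOCAB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB.shape[1])


v = doc_vector(["randomised", "trial", "screening"])
print(v.shape)  # (8,)
```

A fixed-size vector like this can then be fed to any lightweight classifier, which is what makes such a baseline fast to train compared with deep architectures.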
