DS4DH at TREC Health Misinformation 2021: Multi-Dimensional Ranking Models with Transfer Learning and Rank Fusion

Paper title

DS4DH at TREC Health Misinformation 2021: Multi-Dimensional Ranking Models with Transfer Learning and Rank Fusion

Authors

Boya Zhang, Nona Naderi, Fernando Jaume-Santero, Douglas Teodoro

Abstract

This paper describes the work of the Data Science for Digital Health (DS4DH) group at the TREC Health Misinformation Track 2021. The track focused on developing retrieval methods that provide relevant, correct, and credible information for health-related searches on the Web. We used a two-step ranking approach consisting of i) a standard retrieval phase based on the BM25 model, and ii) a re-ranking phase with a pipeline of models targeting the usefulness, supportiveness, and credibility dimensions of the retrieved documents. To estimate usefulness, we classified the initial ranked list using pre-trained language models based on the transformer architecture and fine-tuned on the MS MARCO corpus. To assess supportiveness, we used BERT-based models fine-tuned on scientific and Wikipedia corpora. Finally, to evaluate the credibility of the documents, we employed a random forest model trained on the Microsoft Credibility dataset, combined with a list of credible sites. The resulting ranked lists were then combined using the Reciprocal Rank Fusion algorithm to obtain the final list of useful, supportive, and credible documents. Our approach achieved competitive results, ranking in the top 2 on the compatibility measure among automatic runs. Our findings suggest that integrating automatic ranking models, created for each information quality dimension with transfer learning, can increase the effectiveness of health-related information retrieval.
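
The fusion step named in the abstract can be illustrated with a minimal sketch of Reciprocal Rank Fusion in its commonly used form, where each document receives the sum of 1 / (k + rank) over the input lists. The constant k = 60, the function name, and the toy document IDs below are assumptions for illustration only; the abstract does not state the parameters or the implementation the authors used.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical per-dimension rankings (toy data, not from the paper).
usefulness = ["d1", "d2", "d3"]
supportiveness = ["d2", "d3", "d1"]
credibility = ["d2", "d1", "d4"]

fused = reciprocal_rank_fusion([usefulness, supportiveness, credibility])
print(fused)
```

A document that ranks well on several dimensions rises in the fused list, while a document missing from a list simply contributes nothing for that dimension.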
