Paper Title

Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature

Authors

Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton

Abstract

Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches for lay summarisation can provide significant value in broadening access to scientific literature, enabling a greater degree of both interdisciplinary knowledge sharing and public understanding when it comes to research findings. However, current corpora for this task are limited in their size and scope, hindering the development of broadly applicable data-driven approaches. Aiming to rectify these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets that can be leveraged to support the needs of different applications. Finally, we benchmark our datasets using mainstream summarisation approaches and perform a manual evaluation with domain experts, demonstrating their utility and casting light on the key challenges of this task.
