Question-Driven Summarization of Answers to Consumer Health Questions

Paper Title

Question-Driven Summarization of Answers to Consumer Health Questions

Authors

Max Savery, Asma Ben Abacha, Soumya Gayen, Dina Demner-Fushman

Abstract

Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who routinely needs to understand large quantities of information. For example, in the medical domain, recent developments in deep learning approaches to automatic summarization have the potential to make health information more easily accessible to patients and consumers. However, to evaluate the quality of automatically generated summaries of health information, gold-standard, human-generated summaries are required. Using answers provided by the National Library of Medicine's consumer health question answering system, we present the MEDIQA Answer Summarization dataset, the first summarization collection containing question-driven summaries of answers to consumer health questions. This dataset can be used to evaluate single- or multi-document summaries generated by algorithms using extractive or abstractive approaches. To benchmark the dataset, we include results of baseline and state-of-the-art deep learning summarization models, demonstrating that this dataset can be used to effectively evaluate question-driven machine-generated summaries and promote further machine learning research in medical question answering.
