Paper Title
Towards Automatic Generation of Questions from Long Answers
Paper Authors
Paper Abstract
Automatic question generation (AQG) has broad applicability in domains such as tutoring systems, conversational agents, healthcare literacy, and information retrieval. Existing efforts at AQG have been limited to short answer lengths of up to two or three sentences. However, several real-world applications require question generation from answers that span several sentences. Therefore, we propose a novel evaluation benchmark to assess the performance of existing AQG systems on long-text answers. We leverage the large-scale open-source Google Natural Questions dataset to create this long-answer AQG benchmark. We empirically demonstrate that the performance of existing AQG methods degrades significantly as the length of the answer increases. Transformer-based methods outperform other existing AQG methods on long answers in terms of both automatic and human evaluation. However, we still observe degradation in the performance of our best-performing models with increasing answer length, suggesting that long-answer question generation is a challenging benchmark task for future research.