Paper Title
Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation
Paper Authors
Paper Abstract
Summarizing texts is not a straightforward task. Before even considering text summarization, one should determine what kind of summary is expected. How much should the information be compressed? Is it relevant to reformulate, or should the summary stick to the original phrasing? State-of-the-art work on automatic text summarization mostly revolves around news articles. We suggest that considering a wider variety of tasks would lead to improvements in the field, in terms of generalization and robustness. We explore meeting summarization: generating reports from automatic transcriptions. Our work consists of segmenting and aligning transcriptions with respect to reports, to obtain a suitable dataset for neural summarization. Using a bootstrapping approach, we provide pre-alignments that are corrected by human annotators, yielding a validation set against which we evaluate automatic models. This consistently reduces annotators' effort by providing iteratively better pre-alignments, and maximizes the corpus size by using annotations from our automatic alignment models. Evaluation is conducted on public_meetings, a novel corpus of aligned public meetings. We report automatic alignment and summarization performance on this corpus and show that automatic alignment is relevant for data annotation, since it leads to a large improvement of almost +4 on all ROUGE scores on the summarization task.
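To make the alignment step described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' alignment model: a hypothetical greedy baseline that maps each report section to the transcription segment with the highest unigram-overlap F1 (a rough stand-in for ROUGE-1). All function names and the example data are assumptions for illustration only.

```python
# Illustrative sketch of transcription-to-report alignment via lexical overlap.
# NOT the paper's method: a hypothetical greedy baseline that pairs each report
# section with the transcript segment maximizing unigram-overlap F1.

from collections import Counter
from typing import List


def unigram_f1(candidate: str, reference: str) -> float:
    """F1 over unigram counts, similar in spirit to ROUGE-1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align(report_sections: List[str], transcript_segments: List[str]) -> List[int]:
    """For each report section, return the index of the best-matching segment."""
    return [
        max(range(len(transcript_segments)),
            key=lambda i: unigram_f1(transcript_segments[i], section))
        for section in report_sections
    ]


if __name__ == "__main__":
    report = ["The council approved the new budget for public transport."]
    transcript = [
        "so uh we talked about the weather and the agenda for next week",
        "the council uh approved the budget the new budget for public transport yes",
    ]
    print(align(report, transcript))  # expected: [1]
```

In the paper's setting, such automatically produced pre-alignments would then be corrected by human annotators, and the corrected pairs used to bootstrap better alignment models; the sketch above only illustrates the shape of the alignment problem, not how the authors solve it.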