Paper Title
Corpora Evaluation and System Bias Detection in Multi-document Summarization
Paper Authors
Paper Abstract
Multi-document summarization (MDS) is the task of distilling the key points of a set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Because the task has no standard definition, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard regarding what constitutes summary-worthy information in MDS. Adding to the challenge, new systems report results on a chosen set of datasets, which might not correlate with their performance on other datasets. In this paper, we study this heterogeneous task with the help of a few widely used MDS corpora and a suite of state-of-the-art models. We attempt to quantify the quality of a summarization corpus and prescribe a list of points to consider when proposing a new MDS corpus. Next, we analyze why no single MDS system achieves superior performance across all corpora. We then observe the extent to which corpus properties influence system metrics and propagate bias. The scripts to reproduce the experiments in this work are available at https://github.com/LCS2-IIITD/summarization_bias.git.
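As a rough illustration of the "varying levels of overlap between participating documents" mentioned above, here is a minimal sketch (not the paper's actual procedure; function names and the toy cluster are invented for illustration) that scores inter-document redundancy in an MDS cluster as the mean pairwise Jaccard similarity over word bigrams. Higher scores suggest highly redundant sources; lower scores suggest complementary or conflicting content.

```python
# Minimal sketch: quantify inter-document overlap in one MDS cluster.
# This is an illustrative proxy, not the metric used in the paper.
from itertools import combinations


def bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of lowercased word bigrams in a document."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def cluster_overlap(documents: list[str]) -> float:
    """Mean pairwise bigram Jaccard similarity over all document pairs."""
    scores = []
    for a, b in combinations(documents, 2):
        ga, gb = bigrams(a), bigrams(b)
        union = ga | gb
        scores.append(len(ga & gb) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical three-document cluster: two redundant news sentences
    # plus one complementary sentence.
    cluster = [
        "the storm hit the coast on friday causing floods",
        "floods followed the storm that hit the coast on friday",
        "officials announced new evacuation routes for residents",
    ]
    print(f"mean pairwise overlap: {cluster_overlap(cluster):.3f}")
```

Averaging such a score over all clusters of a corpus gives one coarse axis along which MDS datasets differ, which is one reason a system tuned on a high-overlap corpus may not transfer to a low-overlap one.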