Paper Title
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Paper Authors
Paper Abstract
We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) The library provides Python wrappers around the official implementations of existing evaluation metrics so they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.