Paper Title

LEVEN: A Large-Scale Chinese Legal Event Detection Dataset

Authors

Feng Yao, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun

Abstract

Recognizing facts is the most fundamental step in making judgments, so detecting events in legal documents is important for legal case analysis tasks. However, existing Legal Event Detection (LED) datasets cover only a limited range of event types and contain little annotated data, which restricts the development of LED methods and their downstream applications. To alleviate these issues, we present LEVEN, a large-scale Chinese LEgal eVENt detection dataset with 8,116 legal documents and 150,977 human-annotated event mentions in 108 event types. LEVEN covers not only charge-related events but also general events, which are critical for legal case understanding yet neglected in existing LED datasets. To our knowledge, LEVEN is the largest LED dataset, with dozens of times the data scale of others, which should significantly promote the training and evaluation of LED methods. The results of extensive experiments indicate that LED is challenging and needs further effort. Moreover, we simply use legal events as side information to promote downstream applications. This method yields an average improvement of 2.2 points in precision for low-resource judgment prediction and 1.5 points in mean average precision for unsupervised case retrieval, which suggests that LED is fundamental to these tasks. The source code and dataset can be obtained from https://github.com/thunlp/LEVEN.
