Paper Title
DICE: Data-Efficient Clinical Event Extraction with Generative Models
Paper Authors
Paper Abstract
Event extraction for the clinical domain is an under-explored research area. The lack of training data, along with the high volume of domain-specific terminology with vague entity boundaries, makes the task especially challenging. In this paper, we introduce DICE, a robust and data-efficient generative model for clinical event extraction. DICE frames event extraction as a conditional generation problem and introduces a contrastive learning objective to accurately decide the boundaries of biomedical mentions. DICE also trains an auxiliary mention identification task jointly with event extraction tasks to better identify entity mention boundaries, and further introduces special markers to incorporate identified entity mentions as trigger and argument candidates for their respective tasks. To benchmark clinical event extraction, we compose MACCROBAT-EE, the first clinical event extraction dataset with argument annotation, based on an existing clinical information extraction dataset, MACCROBAT. Our experiments demonstrate state-of-the-art performance of DICE for clinical and news domain event extraction, especially under low-data settings.