Paper title
Causal BERT: Language models for causality detection between events expressed in text
Paper authors
Paper abstract
Understanding causality between events is a critical natural language processing task that is helpful in many areas, including health care, business risk management, and finance. On close examination, one can find a huge amount of textual content, both in formal documents and in content arising from social media such as Twitter, dedicated to communicating and exploring various types of causality in the real world. Recognizing these "Cause-Effect" relationships between natural language events remains a challenge simply because they are often expressed implicitly. Implicit causality is hard to detect with most of the techniques employed in the literature and can also, at times, be perceived as ambiguous or vague. Moreover, although well-known datasets exist for this problem, their examples are limited in the range and complexity of the causal relationships they depict, especially for implicit relationships. Most contemporary methods are either based on lexico-semantic pattern matching or are feature-driven supervised methods. As expected, these methods are therefore geared more towards handling explicit causal relationships, which leads to limited coverage of implicit relationships, and they are hard to generalize. In this paper, we investigate language models' capabilities for identifying causal association among events expressed in natural language text, using sentence context combined with event information and leveraging masked event context with in-domain and out-of-domain data distributions. Our proposed methods achieve state-of-the-art performance on three different data distributions and can be leveraged for extracting a causal diagram and/or building a chain of events from unstructured text.
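The abstract mentions two kinds of model input: sentence context combined with event information, and masked event context. It does not give the exact input format, so the following is only a minimal sketch of how such inputs might be prepared before tokenization. The marker tokens `<e1>`, `</e1>`, `<e2>`, `</e2>`, the `[MASK]` placeholder, and the helper names `mark_events` and `mask_events` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def mark_events(tokens: List[str], e1: Span, e2: Span) -> List[str]:
    """Wrap two non-overlapping event spans in marker tokens.

    The marker strings are hypothetical; the abstract does not specify them.
    """
    # Order the spans by position so either event may come first in the text.
    (a, name_a), (b, name_b) = sorted([(e1, "e1"), (e2, "e2")])
    if a[1] > b[0]:
        raise ValueError("event spans must not overlap")
    return (
        tokens[: a[0]]
        + [f"<{name_a}>"] + tokens[a[0]: a[1]] + [f"</{name_a}>"]
        + tokens[a[1]: b[0]]
        + [f"<{name_b}>"] + tokens[b[0]: b[1]] + [f"</{name_b}>"]
        + tokens[b[1]:]
    )


def mask_events(tokens: List[str], e1: Span, e2: Span,
                mask_token: str = "[MASK]") -> List[str]:
    """Replace every token inside the two event spans with a mask token.

    The sequence length is unchanged, so the original spans stay valid.
    """
    masked = list(tokens)
    for start, end in (e1, e2):
        for i in range(start, end):
            masked[i] = mask_token
    return masked


tokens = "Heavy rain caused severe flooding".split()
event_1, event_2 = (0, 2), (3, 5)  # "Heavy rain" and "severe flooding"

event_aware = mark_events(tokens, event_1, event_2)
masked_context = mask_events(tokens, event_1, event_2)
```

In this sketch, `event_aware` keeps the event words and only adds boundary markers, whereas `masked_context` hides the event words so that a classifier would have to rely on the surrounding context. Either token list could then be joined and passed to a language model's tokenizer; that step is omitted here because the abstract does not describe it.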