Title
A Neural Approach to Discourse Relation Signal Detection
Authors
Abstract
Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result', has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of 'and'), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context ('anti-signals' or 'distractors'). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, Δs (or 'delta-softmax'), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized word embeddings, the metric represents each word's positive or negative contribution to the identifiability of a relation in specific instances in context. Using an English corpus annotated for discourse relations in the framework of Rhetorical Structure Theory, with signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.
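To make the idea of a per-word score between -1 and 1 concrete, here is a minimal sketch. It assumes that Δs for a token is the softmax probability of the gold relation given the full input minus that probability when the token is replaced by a mask placeholder. The abstract does not spell out this formula, so treat it as an illustrative assumption. Because both terms are probabilities, the difference necessarily falls in [-1, 1]. A positive value marks a token that helps identify the relation (a signal), and a negative value marks a distractor. The names `MASK`, `delta_softmax`, and `toy_scores` are hypothetical. `toy_scores` is a keyword-counting stand-in for the paper's distantly supervised neural classifier over contextualized embeddings, not a reproduction of it.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical placeholder used to hide one token at a time.
MASK = "<mask>"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def delta_softmax(
    tokens: Sequence[str],
    gold: int,
    score_fn: Callable[[Sequence[str]], Sequence[float]],
) -> List[float]:
    """Return one delta-softmax value per token (assumed definition).

    delta_i = P(gold | tokens) - P(gold | tokens with token i masked).
    Positive: masking the token hurts the gold relation (signal).
    Negative: masking the token helps the gold relation (distractor).
    """
    base = softmax(score_fn(tokens))[gold]
    deltas = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = MASK
        deltas.append(base - softmax(score_fn(masked))[gold])
    return deltas


def toy_scores(tokens: Sequence[str]) -> List[float]:
    """Hypothetical stand-in classifier with two labels.

    Label 0 = "contrast", label 1 = "elaboration"; scores are keyword counts.
    """
    contrast = 2.0 * tokens.count("however")
    elaboration = 1.0 * tokens.count("also")
    return [contrast, elaboration]


if __name__ == "__main__":
    sentence = ["however", "it", "also", "rained"]
    scores = delta_softmax(sentence, gold=0, score_fn=toy_scores)
    print([round(d, 3) for d in scores])  # prints [0.462, 0.0, -0.15, 0.0]
```

In this toy run, 'however' gets a positive score because it supports the gold "contrast" label. 'also' gets a negative score because it pulls the prediction toward "elaboration", which is the kind of anti-signal the abstract describes. Tokens the scorer ignores get exactly 0. Masking one token at a time costs one extra forward pass per token. For a real model, `score_fn` would wrap the trained classifier's logits.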