Paper title
MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision
Paper authors
Paper abstract
The lack of large and diverse discourse treebanks hinders the application of data-driven approaches, such as deep learning, to RST-style discourse parsing. In this work, we present a novel scalable methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets, creating and publishing MEGA-DT, a new large-scale discourse-annotated corpus. Our approach generates discourse trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient heuristic beam-search strategy, extended with a stochastic component. Experiments on multiple datasets indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains when compared to parsers trained on human-annotated discourse corpora.
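The abstract names a heuristic beam search with a stochastic component but gives no further detail. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the `Node` type, the nuclearity weights in `WEIGHTS`, the forest scoring heuristic, and the parameters `beam_size` and `n_random`. It builds a binary tree bottom-up over per-unit sentiment values, keeps the best partial trees by how closely their sentiment matches a document-level label, and adds a few randomly sampled candidates to the beam at each step.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A discourse (sub)tree; leaves carry the index of a text unit."""
    sentiment: float
    label: str  # "EDU" for leaves, else "N-S", "S-N" or "N-N"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    edu: Optional[int] = None

    def leaves(self) -> List[int]:
        if self.left is None or self.right is None:
            return [self.edu]
        return self.left.leaves() + self.right.leaves()


# Hypothetical weights: the nucleus contributes more to the parent sentiment.
WEIGHTS = {"N-S": (0.75, 0.25), "S-N": (0.25, 0.75), "N-N": (0.5, 0.5)}


def merge(a: Node, b: Node, label: str) -> Node:
    """Combine two adjacent subtrees under a given nuclearity label."""
    wa, wb = WEIGHTS[label]
    return Node(wa * a.sentiment + wb * b.sentiment, label, a, b)


def forest_score(forest: Tuple[Node, ...], gold: float) -> float:
    """Assumed heuristic: mean subtree sentiment should be close to the label."""
    mean = sum(n.sentiment for n in forest) / len(forest)
    return -abs(mean - gold)


def beam_search(edu_sentiments: List[float], gold: float,
                beam_size: int = 4, n_random: int = 1, seed: int = 0) -> Node:
    if not edu_sentiments:
        raise ValueError("need at least one text unit")
    rng = random.Random(seed)
    start = tuple(Node(s, "EDU", edu=i) for i, s in enumerate(edu_sentiments))
    beam = [start]
    # Each iteration merges one adjacent pair in every forest on the beam.
    for _ in range(len(edu_sentiments) - 1):
        candidates = []
        for forest in beam:
            for i in range(len(forest) - 1):
                for label in WEIGHTS:
                    merged = merge(forest[i], forest[i + 1], label)
                    candidates.append(forest[:i] + (merged,) + forest[i + 2:])
        candidates.sort(key=lambda f: forest_score(f, gold), reverse=True)
        # Deterministic part: keep the top-scoring candidates.
        n_top = max(1, beam_size - n_random)
        kept, rest = candidates[:n_top], candidates[n_top:]
        # Stochastic part: also keep a few randomly sampled lower-ranked ones.
        kept += rng.sample(rest, min(n_random, len(rest)))
        beam = kept
    best = max(beam, key=lambda f: forest_score(f, gold))
    return best[0]


if __name__ == "__main__":
    tree = beam_search([0.9, -0.2, 0.4, 0.1], gold=0.5)
    print(tree.label, round(tree.sentiment, 3), tree.leaves())
```

Because the beam has a fixed width, the work per merge step stays bounded regardless of document length, while the random samples give lower-ranked partial trees a chance to survive. How the actual method scores candidates and injects randomness is not stated in the abstract.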