Paper Title

Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers

Authors

Robert Litschko, Ivan Vulić, Željko Agić, Goran Glavaš

Abstract

Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level". In this work, we propose and argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS), and present a proof-of-concept study focused on instance-level selection in the framework of delexicalized parser transfer. We start from an empirical observation that different source parsers are the best choice for different Universal POS sequences in the target language. We then propose to predict the best parser at the instance level. To this end, we train a supervised regression model, based on the Transformer architecture, to predict parser accuracies for individual POS-sequences. We compare ILPS against two strong single-best parser selection baselines (SBPS): (1) a model that compares POS n-gram distributions between the source and target languages (KL) and (2) a model that selects the source based on the similarity between manually created language vectors encoding syntactic properties of languages (L2V). The results from our extensive evaluation, coupling 42 source parsers and 20 diverse low-resource test languages, show that ILPS outperforms KL and L2V on 13/20 and 14/20 test languages, respectively. Further, we show that by predicting the best parser "at the treebank level" (SBPS), using the aggregation of predictions from our instance-level model, we outperform the same baselines on 17/20 and 16/20 test languages.
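The difference between instance-level selection (ILPS) and treebank-level selection (SBPS) built on the same instance-level predictions can be illustrated with a minimal sketch. It assumes a regression model has already produced a predicted accuracy for every source parser on every target sentence. The function names, the toy scores, and the use of a plain mean as the aggregation step are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

# Hypothetical input: predicted[parser][i] is the predicted accuracy of
# source parser `parser` on target-language sentence i (one POS sequence).


def select_per_instance(predicted: Dict[str, List[float]]) -> List[str]:
    """ILPS-style choice: pick the highest-scoring parser for each sentence."""
    parsers = list(predicted)
    n_sentences = len(predicted[parsers[0]])
    return [
        max(parsers, key=lambda p: predicted[p][i])
        for i in range(n_sentences)
    ]


def select_per_treebank(predicted: Dict[str, List[float]]) -> str:
    """SBPS-style choice: pick one parser for the whole treebank.

    Aggregation by arithmetic mean is an assumption made for this sketch.
    """
    return max(predicted, key=lambda p: sum(predicted[p]) / len(predicted[p]))


# Toy predicted accuracies for three source parsers on three sentences.
predicted = {
    "en": [0.80, 0.40, 0.70],
    "cs": [0.60, 0.90, 0.65],
    "fi": [0.50, 0.50, 0.75],
}

instance_choice = select_per_instance(predicted)
treebank_choice = select_per_treebank(predicted)

print(instance_choice)  # ['en', 'cs', 'fi']
print(treebank_choice)  # cs
```

In this toy example each sentence is routed to a different source parser at the instance level, whereas the aggregated choice commits to the single parser with the highest mean predicted accuracy.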
