Paper Title
Hierarchical Multi Task Learning with Subword Contextual Embeddings for Languages with Rich Morphology
Paper Authors
Paper Abstract
Morphological information is important for many sequence labeling tasks in Natural Language Processing (NLP). Yet, existing approaches rely heavily on manual annotations or external software to capture this information. In this study, we propose using subword contextual embeddings to capture morphological information for languages with rich morphology. In addition, we incorporate these embeddings in a hierarchical multi-task setting which, to the best of our knowledge, has not been employed before. Evaluated on Dependency Parsing (DEP) and Named Entity Recognition (NER), tasks shown to benefit greatly from morphological information, our final model outperforms previous state-of-the-art models on both tasks for Turkish. Moreover, we show net improvements of 18.86% and 4.61% F1 over the previously proposed multi-task learner in the same setting for the DEP and NER tasks, respectively. Empirical results for five different MTL settings show that incorporating subword contextual embeddings brings significant improvements for both tasks. We also observe that multi-task learning consistently improves the performance of the DEP component.