A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

Paper title

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

Authors

Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, Tushar Sandhan, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

Abstract

Compounding is ubiquitous in Sanskrit. It enables brevity in expressing thoughts while enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, which involves identifying the semantic relation between the components of a compound word. Earlier approaches rely solely on lexical information obtained from the components and ignore the contextual and syntactic information that is most crucial for SaCTI. Yet the task is challenging primarily because the semantic relation between compound components is context-sensitive and implicitly encoded. We therefore propose a novel multi-task learning architecture that incorporates contextual information and enriches it with complementary syntactic information, using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the SaCTI benchmark datasets show absolute gains of 6.1 points (accuracy) and 7.7 points (F1-score) over the state-of-the-art system. Further, our multilingual experiments demonstrate the efficacy of the proposed architecture in English and Marathi. The code and datasets are publicly available at https://github.com/ashishgupta2598/SaCTI
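
To make the multi-task idea concrete, the sketch below shows one common way a main-task loss can be combined with auxiliary-task losses. It is an illustration only, not the authors' implementation: the abstract does not give the loss formulation, so the task names (`compound_type`, `morph_tag`, `dep_relation`), the weight `aux_weight`, and the treatment of dependency parsing as simple relation classification are all assumptions.

```python
import math


def cross_entropy(logits, gold):
    """Negative log-likelihood of the gold class under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold]


def multitask_loss(task_logits, task_gold, aux_weight=0.5):
    """Main-task loss plus down-weighted auxiliary losses.

    Task names and aux_weight are hypothetical, chosen for illustration.
    """
    main = cross_entropy(task_logits["compound_type"], task_gold["compound_type"])
    aux = sum(
        cross_entropy(task_logits[name], task_gold[name])
        for name in ("morph_tag", "dep_relation")
    )
    return main + aux_weight * aux


# Toy scores that a shared encoder with three task heads might produce.
logits = {
    "compound_type": [2.0, 0.5, -1.0, 0.1],
    "morph_tag": [0.3, 1.2, -0.4],
    "dep_relation": [1.5, -0.5],
}
gold = {"compound_type": 0, "morph_tag": 1, "dep_relation": 0}
print(multitask_loss(logits, gold))
```

In a setup like this, gradients from the auxiliary losses flow into the shared encoder, which is how syntactic signals could inform the compound-type prediction.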
