Paper Title
Automatic Extraction of Bengali Root Verbs using Paninian Grammar
Authors
Abstract
In this work, we propose an algorithm based on a supervised learning methodology to extract the root forms of Bengali verbs using the grammatical rules proposed by Panini [1] in the Ashtadhyayi. The methodology can be applied to languages derived from Sanskrit. The proposed system uses the tense, person, and morphological inflections of verbs to find their root forms. The work proceeds in two phases. First, the surface-level (inflected) forms of the verbs are classified into a certain number of groups of similar tense and person, using a standard pattern available in the Bengali language. Next, a set of rules is applied to extract the root form from the surface-level forms of a verb. The system was tested on 10,000 verbs collected from the Bengali text corpus developed in the TDIL project of the Govt. of India. It achieved an accuracy of 98%, as verified by a linguistic expert. Root verb identification is a key step in semantic search, multi-sentence search query processing, language understanding, word sense disambiguation, sentence classification, and similar tasks.
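The two-phase structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' rule set: the suffix table below is a small hypothetical sample using romanized forms of the verb root "kor" (to do), the names `SUFFIX_GROUPS`, `classify`, and `extract_root` are invented for illustration, and plain suffix stripping stands in for the Paninian rules, which the abstract does not spell out.

```python
from typing import Optional, Tuple

# Hypothetical, illustrative suffix table (romanized Bengali).
# Maps an inflectional suffix to its (tense, person) group.
# The real system's groups and rules are not given in the abstract.
SUFFIX_GROUPS = {
    "chilam": ("past continuous", "first"),
    "lam": ("simple past", "first"),
    "chi": ("present continuous", "first"),
    "che": ("present continuous", "third"),
    "bo": ("simple future", "first"),
    "be": ("simple future", "third"),
}


def classify(surface: str) -> Optional[Tuple[str, str, str]]:
    """Phase 1: assign a surface form to a tense/person group.

    Tries the longest suffixes first so that "chilam" wins over "lam".
    Returns (suffix, tense, person), or None if no suffix matches.
    """
    for suffix in sorted(SUFFIX_GROUPS, key=len, reverse=True):
        # Require a non-empty stem to remain after stripping.
        if surface.endswith(suffix) and len(surface) > len(suffix):
            tense, person = SUFFIX_GROUPS[suffix]
            return suffix, tense, person
    return None


def extract_root(surface: str) -> Optional[str]:
    """Phase 2: apply the group's rule (here, plain suffix stripping)."""
    match = classify(surface)
    if match is None:
        return None
    suffix, _tense, _person = match
    return surface[: -len(suffix)]


if __name__ == "__main__":
    for form in ["korchi", "korlam", "korchilam", "korbo"]:
        print(form, "->", extract_root(form))
```

Under these assumptions, each of the sample forms reduces to the candidate root "kor". A fuller system would also need rules for stem vowel changes and irregular verbs, which simple stripping does not handle.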