Classifier Combination Approach for Question Classification for Bengali Question Answering System

Paper Title

Classifier Combination Approach for Question Classification for Bengali Question Answering System

Authors

Somnath Banerjee, Sudip Kumar Naskar, Paolo Rosso, Sivaji Bandyopadhyay

Abstract

Question classification (QC) is a prime constituent of an automated question answering system. The work presented here demonstrates that a combination of multiple models achieves better classification performance than existing individual models on the question classification task in Bengali. We have exploited state-of-the-art multiple-model combination techniques, i.e., ensemble, stacking and voting, to increase QC accuracy. Lexical, syntactic and semantic features of Bengali questions are used with four well-known classifiers, namely Naïve Bayes, kernel Naïve Bayes, Rule Induction, and Decision Tree, which serve as our base learners. A single-layer question-class taxonomy with 8 coarse-grained classes is extended to a two-layer taxonomy by adding 69 fine-grained classes. We carried out experiments on both the single-layer and two-layer taxonomies. Experimental results confirmed that classifier combination approaches outperform single-classifier approaches by 4.02% for coarse-grained question classes. Overall, the stacking approach produces the best results for fine-grained classification, achieving an accuracy of 87.79%. The approach presented here could be used to develop question answering systems for other Indo-Aryan or Indic languages.
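
As a rough illustration of the voting idea mentioned in the abstract, the sketch below combines several base classifiers by majority vote. It is not the authors' implementation: the three keyword rules, the class labels (PERSON, LOCATION, TIME, OTHER), the English example questions and the tie-breaking policy are all hypothetical stand-ins chosen for readability, where the paper uses trained learners such as Naïve Bayes and Decision Tree on Bengali questions.

```python
from collections import Counter
from typing import Callable, Sequence

# A base learner is modeled as a function from a question to a class label.
Classifier = Callable[[str], str]


def wh_word_rule(question: str) -> str:
    """Hypothetical base learner: look only at the first word."""
    first = question.lower().split()[0]
    return {"who": "PERSON", "where": "LOCATION", "when": "TIME"}.get(first, "OTHER")


def keyword_rule(question: str) -> str:
    """Hypothetical base learner: look for class-specific nouns."""
    text = question.lower()
    if "city" in text or "country" in text:
        return "LOCATION"
    if "year" in text or "date" in text:
        return "TIME"
    return "OTHER"


def verb_rule(question: str) -> str:
    """Hypothetical base learner: look for class-specific verbs."""
    text = question.lower()
    if "live" in text or "located" in text:
        return "LOCATION"
    if "wrote" in text or "invented" in text:
        return "PERSON"
    return "OTHER"


def majority_vote(classifiers: Sequence[Classifier], question: str) -> str:
    """Return the label predicted by the most base learners.

    Ties are resolved in favor of the earliest classifier in the list
    (an assumed policy; other tie-breaking rules are equally valid).
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    predictions = [clf(question) for clf in classifiers]
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


BASE_LEARNERS = [wh_word_rule, keyword_rule, verb_rule]

if __name__ == "__main__":
    for q in ["Who wrote Gitanjali?", "Which city did Tagore live in?"]:
        print(q, "->", majority_vote(BASE_LEARNERS, q))
```

Stacking differs from voting in that the base learners' predictions become input features for a second-level classifier that is itself trained, instead of being counted directly.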
