A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss

Paper title

A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss

Authors

Wenbiao Li, Ziyang Wang, Yunfang Wu

Abstract

For readability assessment, traditional methods mainly employ machine learning classifiers with hundreds of linguistic features. Although deep learning models have become the prominent approach for almost all NLP tasks, they are less explored for readability assessment. In this paper, we propose a BERT-based model with feature projection and length-balanced loss (BERT-FP-LBL) for readability assessment. Specifically, we present a new difficulty-knowledge-guided semi-supervised method to extract topic features that complement the traditional linguistic features. From the linguistic features, we employ projection filtering to extract orthogonal features that supplement BERT representations. Furthermore, we design a new length-balanced loss to handle the greatly varying length distribution of the data. Our model achieves state-of-the-art performance on two English benchmark datasets and one dataset of Chinese textbooks, and reaches near-perfect accuracy of 99% on one English dataset. Moreover, our proposed model obtains results comparable to those of human experts in a consistency test.
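
The abstract's idea of extracting "orthogonal features" to supplement BERT representations can be illustrated with a plain vector projection. The following is a minimal sketch, not the authors' implementation: it assumes the linguistic feature vector and the BERT representation have already been mapped to the same dimension, and it assumes the two are fused by concatenation. The function names and the toy vectors are hypothetical.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_component(feature: List[float], reference: List[float]) -> List[float]:
    """Return the part of `feature` that is orthogonal to `reference`.

    The component of `feature` parallel to `reference` is subtracted,
    so what remains carries only information `reference` does not span.
    """
    if len(feature) != len(reference):
        raise ValueError("vectors must have the same dimension")
    denom = dot(reference, reference)
    if denom == 0.0:
        # A zero reference vector spans nothing, so nothing is removed.
        return list(feature)
    scale = dot(feature, reference) / denom
    return [f - scale * r for f, r in zip(feature, reference)]


# Toy vectors (assumed values, for illustration only).
bert_vec = [1.0, 0.0, 0.0]   # stand-in for a BERT text representation
ling_vec = [2.0, 3.0, 0.0]   # stand-in for projected linguistic features

orth = orthogonal_component(ling_vec, bert_vec)

# Assumed fusion step: concatenate the BERT vector with the orthogonal part.
fused = bert_vec + orth
```

In this toy case the first coordinate of `ling_vec` overlaps with `bert_vec` and is removed, while the second coordinate, which `bert_vec` does not cover, is kept. A trained model would apply the same operation to learned, higher-dimensional representations.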
