Paper Title

Towards Arabic Sentence Simplification via Classification and Generative Approaches

Paper Authors

Nouran Khallaf, Serge Sharoff

Paper Abstract

This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with sentence simplification using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT, a pre-trained contextualised model, as well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique applying the multilingual Text-to-Text Transfer Transformer (mT5). We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel "Saaq al-Bambuu". We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments. https://github.com/Nouran-Khallaf/Lexical_Simplification
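
As a rough illustration of the generative approach and the evaluation setup described in the abstract, the sketch below generates a simplification with an mT5 checkpoint and scores it against a reference simple sentence with BERTScore. This is a minimal sketch, not the authors' released code: the checkpoint name, the "simplify:" task prefix, and the placeholder sentences are assumptions; only the Hugging Face `transformers` and `bert_score` APIs shown are real, and the paper's actual model is fine-tuned on aligned sentence pairs from "Saaq al-Bambuu".

```python
# Minimal sketch (not the paper's released code): generate a simplification
# with an mT5 checkpoint and score it against a reference using BERTScore.
# The checkpoint name, the "simplify:" prefix, and the example sentences
# are illustrative assumptions.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer
from bert_score import score

model_name = "google/mt5-small"  # assumed; the paper fine-tunes mT5 on aligned MSA sentence pairs
tokenizer = MT5Tokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

complex_sentence = "..."   # an original (complex) MSA sentence
reference_simple = "..."   # its aligned simplified version

# Seq2Seq generation: encode the complex sentence, decode a simpler one.
inputs = tokenizer("simplify: " + complex_sentence, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=128, num_beams=4)
generated_simple = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# BERTScore: compare the generated sentence to the target simple sentence,
# reporting precision, recall and F1 as in the abstract.
P, R, F1 = score([generated_simple], [reference_simple], lang="ar")
print(f"P={P.mean():.2f}  R={R.mean():.2f}  F1={F1.mean():.2f}")
```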
