Paper Title

Towards Arabic Sentence Simplification via Classification and Generative Approaches

Paper Authors

Nouran Khallaf, Serge Sharoff

Paper Abstract

This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with sentence simplification using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT, a pre-trained contextualised model, as well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique applying the multilingual Text-to-Text Transfer Transformer (mT5). We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel "Saaq al-Bambuu". We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments. https://github.com/Nouran-Khallaf/Lexical_Simplification
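
As a rough illustration of the generative approach and the evaluation setup described in the abstract, the sketch below generates a simplification with an mT5 checkpoint and scores it against a reference simple sentence with BERTScore. This is a minimal sketch, not the authors' released code: the checkpoint name, the "simplify:" task prefix, and the placeholder sentences are assumptions; only the Hugging Face `transformers` and `bert_score` APIs shown are real, and the paper's actual model is fine-tuned on aligned sentence pairs from "Saaq al-Bambuu".

```python
# Minimal sketch (not the paper's released code): generate a simplification
# with an mT5 checkpoint and score it against a reference using BERTScore.
# The checkpoint name, the "simplify:" prefix, and the example sentences
# are illustrative assumptions.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer
from bert_score import score

model_name = "google/mt5-small"  # assumed; the paper fine-tunes mT5 on aligned MSA sentence pairs
tokenizer = MT5Tokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

complex_sentence = "..."   # an original (complex) MSA sentence
reference_simple = "..."   # its aligned simplified version

# Seq2Seq generation: encode the complex sentence, decode a simpler one.
inputs = tokenizer("simplify: " + complex_sentence, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=128, num_beams=4)
generated_simple = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# BERTScore: compare the generated sentence to the target simple sentence,
# reporting precision, recall and F1 as in the abstract.
P, R, F1 = score([generated_simple], [reference_simple], lang="ar")
print(f"P={P.mean():.2f}  R={R.mean():.2f}  F1={F1.mean():.2f}")
```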
