Paper Title
BSpell: A CNN-Blended BERT Based Bangla Spell Checker
Paper Authors
Abstract
Bangla is mostly typed on an English keyboard, and the result can be highly error-prone because of compound letters and similarly pronounced letters. Correcting a misspelled word requires understanding both the typing pattern of the word and the context in which it is used. This paper proposes BSpell, a specialized BERT model for word-for-word correction at the sentence level. BSpell contains an end-to-end trainable CNN sub-model named SemanticNet, along with a specialized auxiliary loss. This allows BSpell to specialize in the highly inflected Bangla vocabulary in the presence of spelling errors. Furthermore, a hybrid pretraining scheme is proposed for BSpell that combines word-level and character-level masking. Comparisons on two Bangla spelling correction datasets and one Hindi spelling correction dataset show the superiority of our proposed approach. BSpell is available as a Bangla spell-checking tool on GitHub: https://github.com/Hasiburshanto/Bangla-Spell-Checker
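The hybrid pretraining idea, mixing word-level and character-level masking, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify masking rates, mask symbols, or how masked characters are chosen, so the function name `hybrid_mask`, the tokens `[MASK]` and `#`, and the default rates below are all assumptions made for illustration only.

```python
import random

MASK_WORD = "[MASK]"  # hypothetical word-level mask token (assumption)
MASK_CHAR = "#"       # hypothetical character-level mask symbol (assumption)


def hybrid_mask(words, word_rate=0.15, char_rate=0.15, seed=0):
    """Return (masked_words, targets) for one tokenized sentence.

    Each word is either replaced entirely (word-level masking), has some
    of its characters hidden (character-level masking), or is left
    unchanged. The targets are always the original words, so there is
    exactly one target word per input position.
    """
    rng = random.Random(seed)
    masked = []
    for word in words:
        draw = rng.random()
        if draw < word_rate:
            # Word-level masking: the whole word must come from context.
            masked.append(MASK_WORD)
        elif draw < word_rate + char_rate and word:
            # Character-level masking: hide between one character and
            # half of the word, imitating a partially mistyped word.
            # Note: this works on Unicode code points, not graphemes.
            chars = list(word)
            n_hide = rng.randint(1, max(1, len(chars) // 2))
            for i in rng.sample(range(len(chars)), n_hide):
                chars[i] = MASK_CHAR
            masked.append("".join(chars))
        else:
            masked.append(word)
    return masked, list(words)


if __name__ == "__main__":
    sentence = "আমি ভাত খাই".split()
    inputs, targets = hybrid_mask(sentence, word_rate=0.3, char_rate=0.3)
    print(inputs)
    print(targets)
```

Keeping one target per input position mirrors the word-for-word correction setting described in the abstract. The rates and the cap on hidden characters are free parameters in this sketch and would need to be taken from the paper or tuned.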