Paper Title

BanglaWriting: A multi-purpose offline Bangla handwriting dataset

Authors

Mridha, M. F., Ohi, Abu Quwsar, Ali, M. Ameer, Emon, Mazedul Islam, Kabir, Muhammad Mohsin

Abstract

This article presents a Bangla handwriting dataset named BanglaWriting that contains single-page handwriting samples from 260 individuals of different personalities and ages. Each page includes a bounding box around each word, along with the Unicode representation of the writing. The dataset contains 21,234 words and 32,787 characters in total, including 5,470 unique words of Bangla vocabulary. Apart from the usual words, the dataset comprises 261 comprehensible overwritings and 450 handwritten strikes and mistakes. All of the bounding boxes and word labels were generated manually. The dataset can be used for complex optical character/word recognition, writer identification, handwritten word segmentation, and word generation. Furthermore, the dataset is suitable for extracting age-based and gender-based variation in handwriting.
