Paper Title

Handwritten Script Identification from Text Lines

Authors

Pawan Kumar Singh, Iman Chatterjee, Ram Sarkar, Mita Nasipuri

Abstract

In a multilingual country like India, where 12 different official scripts are in use, automatic identification of handwritten script facilitates many important applications, such as automatic transcription of multilingual documents, searching the web or digital archives for documents containing a particular script, and selecting a script-specific Optical Character Recognition (OCR) system in a multilingual environment. In this paper, we propose a robust method for identifying scripts in handwritten documents at the text-line level. Recognition is based on features extracted using the Chain Code Histogram (CCH) and the Discrete Fourier Transform (DFT). The proposed method was evaluated on 800 handwritten text lines written in seven Indic scripts (Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu, and Urdu) along with Roman script, and yielded an average identification rate of 95.14% using a Support Vector Machine (SVM) classifier.
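
The abstract names the two feature families but does not define them, so the following is only a minimal sketch of what CCH and DFT features could look like for a single closed contour. It assumes a Freeman 8-direction chain code and a Fourier-descriptor style DFT of the contour points; the function names, the number of coefficients, and the toy square contour are all illustrative assumptions, not the authors' implementation. Contour extraction from a text-line image and SVM training are omitted.

```python
import cmath
import math

# Freeman 8-direction codes, keyed by the (dx, dy) step between
# consecutive contour points (assumption: y axis points upward).
FREEMAN_CODES = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(contour):
    """Return the Freeman chain code of a closed, 8-connected contour."""
    n = len(contour)
    codes = []
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        codes.append(FREEMAN_CODES[(x1 - x0, y1 - y0)])
    return codes


def chain_code_histogram(contour):
    """Normalized 8-bin histogram of chain-code directions."""
    codes = chain_code(contour)
    hist = [0.0] * 8
    for code in codes:
        hist[code] += 1.0
    return [count / len(codes) for count in hist]


def dft_magnitudes(contour, num_coeffs):
    """Magnitudes of DFT coefficients 1..num_coeffs of the contour,
    treating each point (x, y) as the complex sample x + iy.
    Coefficient 0 is skipped because it only encodes position."""
    n = len(contour)
    signal = [complex(x, y) for x, y in contour]
    magnitudes = []
    for k in range(1, num_coeffs + 1):
        coeff = sum(
            sample * cmath.exp(-2j * math.pi * k * t / n)
            for t, sample in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def feature_vector(contour, num_coeffs=4):
    """Concatenate the CCH and DFT-magnitude features (hypothetical layout)."""
    return chain_code_histogram(contour) + dft_magnitudes(contour, num_coeffs)


# Toy example: a 2x2 square traced counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
features = feature_vector(square)
```

In a full pipeline, vectors like `features` would be computed per text line (for instance, aggregated over all contours in the line) and passed to an SVM classifier; how the paper aggregates them is not stated in the abstract.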
