Paper Title
Scale-Invariant Multi-Oriented Text Detection in Wild Scene Images
Paper Authors
Paper Abstract
Automatic detection of scene texts in the wild is a challenging problem, particularly due to the difficulties in handling (i) occlusions of varying percentages, (ii) widely different scales and orientations, and (iii) severe degradations in image quality. In this article, we propose a fully convolutional neural network architecture consisting of a novel Feature Representation Block (FRB) capable of efficient abstraction of information. The proposed network has been trained using curriculum learning with respect to the difficulty of image samples and gradual pixel-wise blurring. It is capable of detecting texts of different scales and orientations affected by blurring from multiple possible sources, non-uniform illumination, as well as partial occlusions of varying percentages. On various benchmark databases, including ICDAR 2015, ICDAR 2017 MLT, COCO-Text and MSRA-TD500, the text detection performance of the proposed framework significantly improves upon the respective state-of-the-art results. The source code of the proposed architecture will be made available on GitHub.
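The abstract mentions curriculum learning over sample difficulty combined with gradually strengthened pixel-wise blurring, but gives no concrete schedule. The following is a minimal, hypothetical sketch of such a training loop: the difficulty scores, the growth rate of the curriculum, the blur schedule, and the assumption that the model returns a scalar loss are all illustrative placeholders, not the authors' actual procedure.

```python
# Hypothetical sketch: curriculum over sample difficulty plus gradually increasing blur.
# All constants, names, and the loss convention below are assumptions for illustration.
import torch
from torch.utils.data import DataLoader, Subset
from torchvision.transforms import GaussianBlur


def curriculum_indices(difficulty_scores, epoch, total_epochs):
    """Return indices of the easiest fraction of samples, growing each epoch."""
    order = sorted(range(len(difficulty_scores)), key=lambda i: difficulty_scores[i])
    fraction = min(1.0, 0.3 + 0.7 * epoch / max(1, total_epochs - 1))
    return order[: max(1, int(fraction * len(order)))]


def blur_for_epoch(epoch, total_epochs, max_sigma=3.0):
    """Gradually strengthen Gaussian blur as training progresses (assumed schedule)."""
    sigma = 1e-3 + max_sigma * epoch / max(1, total_epochs - 1)
    return GaussianBlur(kernel_size=7, sigma=sigma)


def train(model, dataset, difficulty_scores, total_epochs=10, device="cpu"):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(total_epochs):
        blur = blur_for_epoch(epoch, total_epochs)
        subset = Subset(dataset, curriculum_indices(difficulty_scores, epoch, total_epochs))
        loader = DataLoader(subset, batch_size=8, shuffle=True)
        for images, targets in loader:
            images = blur(images.to(device))          # pixel-wise blurring augmentation
            loss = model(images, targets.to(device))  # assumes the model returns a scalar loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

In this sketch, easier samples dominate early epochs while harder samples and stronger blur are introduced progressively; the actual difficulty criterion and FRB-based architecture are described in the paper itself rather than the abstract.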