Paper title
A Comprehensive Comparison of End-to-End Approaches for Handwritten Digit String Recognition
Paper authors
Abstract
Over the last decades, most approaches proposed for handwritten digit string recognition (HDSR) have relied on digit segmentation, which is dominated by heuristics and therefore imposes substantial constraints on the final performance. Few of them have been based on segmentation-free strategies, in which each pixel column is a potential cut location. Recently, segmentation-free strategies have added another perspective to the problem, leading to promising results. However, these strategies still show some limitations when dealing with a large number of touching digits. To bridge this gap, we hypothesize in this paper that a string of digits can be approached as a sequence of objects. We thus evaluate different end-to-end approaches to the HDSR problem along two lines: those based on object detection (e.g., YOLO and RetinaNet) and those based on sequence-to-sequence representation (CRNN). The main contribution of this work is a comprehensive comparison, with a critical analysis, of these strategies on five benchmarks commonly used to assess HDSR, including the challenging Touching Pair dataset, NIST SD19, and two real-world datasets (CAR and CVL) proposed for the ICFHR 2014 competition on HDSR. Our results show that the YOLO model compares favorably against segmentation-free models, with the advantage of a shorter pipeline that minimizes reliance on heuristics-based models. It achieved recognition rates of 97%, 96%, and 84% on the NIST SD19, CAR, and CVL datasets, respectively.
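The abstract treats a digit string as a sequence of objects. The following is a minimal sketch, under stated assumptions, of how per-digit detections from a detector such as YOLO or RetinaNet could be turned into a string and scored at the string level. The `Detection` type, the score and IoU thresholds, the class-agnostic greedy non-maximum suppression step, and the exact-match definition of recognition rate are all hypothetical choices for illustration, not the paper's pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical detector output: digit class, confidence, box (x1, y1, x2, y2) in pixels."""
    label: int
    score: float
    x1: float
    y1: float
    x2: float
    y2: float


def area(d: Detection) -> float:
    return max(0.0, d.x2 - d.x1) * max(0.0, d.y2 - d.y1)


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def detections_to_string(dets: List[Detection],
                         score_thr: float = 0.5,
                         iou_thr: float = 0.5) -> str:
    """Assumed post-processing: filter, suppress duplicates, read left to right."""
    # Keep only confident detections (threshold is an assumption).
    kept = sorted((d for d in dets if d.score >= score_thr),
                  key=lambda d: d.score, reverse=True)
    # Class-agnostic greedy NMS: one box per digit position. Touching digits
    # usually overlap only slightly, so they survive a 0.5 IoU threshold.
    final: List[Detection] = []
    for d in kept:
        if all(iou(d, f) < iou_thr for f in final):
            final.append(d)
    # A digit string reads left to right, so order boxes by horizontal center.
    final.sort(key=lambda d: (d.x1 + d.x2) / 2)
    return "".join(str(d.label) for d in final)


def recognition_rate(preds: List[str], truths: List[str]) -> float:
    """Assumed metric: fraction of strings whose prediction matches exactly."""
    correct = sum(p == t for p, t in zip(preds, truths))
    return correct / len(truths)


# Hypothetical detections for the string "317"; the 0.6-score "1" duplicates
# the 0.8-score "1" and the low-score "8" is spurious.
example = [
    Detection(7, 0.90, 40, 0, 60, 30),
    Detection(3, 0.95, 0, 0, 22, 30),
    Detection(1, 0.80, 20, 0, 42, 30),
    Detection(8, 0.30, 60, 0, 80, 30),
    Detection(1, 0.60, 21, 0, 43, 30),
]

if __name__ == "__main__":
    print(detections_to_string(example))                          # 317
    print(recognition_rate(["317", "42"], ["317", "40"]))          # 0.5
```

The left-to-right sort is what lets a generic object detector stand in for a sequence model here: no explicit segmentation heuristic is needed once each digit is a box, although the thresholds still have to be tuned per dataset.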