Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

Paper Title

Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

Authors

Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

Abstract

The development of scene text recognition (STR) in the era of deep learning has focused mainly on novel architectures for STR models. However, the training protocol (i.e., the settings of the hyper-parameters involved in training STR models), which plays an equally important role in successfully training a good STR model, is under-explored for scene text recognition. In this work, we attempt to improve the accuracy of existing STR models by searching for an optimal training protocol. Specifically, we develop a training protocol search algorithm based on a newly designed search space and an efficient search procedure that uses evolutionary optimization and proxy tasks. Experimental results show that our searched training protocol improves the recognition accuracy of mainstream STR models by 2.7% to 3.9%. In particular, with the searched training protocol, TRBA-Net achieves 2.1% higher accuracy than the state-of-the-art STR model (i.e., EFIFSTR), while its inference speed is 2.3x and 3.7x faster on CPU and GPU, respectively. Extensive experiments demonstrate the effectiveness of the proposed method and the generalization ability of the training protocol found by our search method. Code is available at https://github.com/VDIGPKU/STR_TPSearch.
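
To make the idea of evolutionary search over training hyper-parameters more concrete, here is a minimal Python sketch. It is not the authors' algorithm or code: the search space, the hyper-parameter names, the function names, and the scoring function are all hypothetical placeholders chosen so that the example runs on its own. In a real setting, the toy score would be replaced by a proxy task, for example a short training run on a subset of the data, whose validation accuracy stands in for the result of full training.

```python
import random

# Hypothetical search space: the abstract does not list the actual
# hyper-parameters or their candidate values.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "weight_decay": [0.0, 1e-5, 1e-4],
    "batch_size": [64, 128, 256],
    "optimizer": ["sgd", "adam", "adadelta"],
}


def random_protocol(rng):
    """Sample one candidate value for every hyper-parameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(protocol, rng):
    """Copy a protocol and resample one randomly chosen hyper-parameter."""
    child = dict(protocol)
    name = rng.choice(sorted(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def toy_proxy_score(protocol):
    """Made-up stand-in for a proxy task; it does not model STR training.

    It simply rewards one arbitrary combination so the search has a target.
    """
    target = {
        "learning_rate": 1e-3,
        "weight_decay": 1e-5,
        "batch_size": 128,
        "optimizer": "adam",
    }
    return float(sum(protocol[name] == value for name, value in target.items()))


def evolutionary_search(score_fn, population_size=8, generations=20, seed=0):
    """Each generation, keep the better half and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_protocol(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=score_fn, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score_fn)


if __name__ == "__main__":
    best = evolutionary_search(toy_proxy_score)
    print(best, toy_proxy_score(best))
```

Because the better half of the population always survives, the best score in this sketch never decreases from one generation to the next. The cost of the search is dominated by how often the scoring function is called, which is why a cheap proxy task matters when each evaluation means training a model.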
