RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

Paper title

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

Authors

Reddy, Sangeeth; Mathew, Minesh; Gomez, Lluis; Rusinol, Marcal; Karatzas, Dimosthenis; Jawahar, C. V.

Abstract

Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new "RoadText-1K" dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k

Paper title