Paper Title
SlideImages: A Dataset for Educational Image Classification
Paper Authors
Paper Abstract
In the past few years, convolutional neural networks (CNNs) have achieved impressive results in computer vision tasks, but these tasks mainly focus on photos with natural scene content. Meanwhile, non-sensor-derived images such as illustrations, data visualizations, and figures are typically used to convey complex information or to explore large datasets, yet this kind of image has received little attention in computer vision. CNNs and similar techniques rely on large volumes of training data. Currently, many document analysis systems are trained in part on scene images because large datasets of educational images are lacking. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. We have reserved all of the actual educational images as the test dataset to ensure that approaches using this dataset generalize well to new educational images, and potentially to other domains. Furthermore, we present a baseline system using a standard deep neural architecture and discuss how to deal with the challenge of limited training data.