Paper Title
SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
Paper Authors
Paper Abstract
Spatial reasoning is an important component of human intelligence. We can imagine the shapes of 3D objects and reason about their spatial relations by merely looking at their three-view line drawings in 2D, with different levels of competence. Can deep networks be trained to perform spatial reasoning tasks? How can we measure their "spatial intelligence"? To answer these questions, we present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation, with increasing difficulty. We then design a method to automatically generate a large number of challenging questions with ground truth answers for each task. They are used to provide supervision for training our baseline models using state-of-the-art architectures like ResNet. Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses. We hope SPARE3D can stimulate new problem formulations and network designs for spatial reasoning to empower intelligent robots to operate effectively in the 3D world via 2D sensors. The dataset and code are available at https://ai4ce.github.io/SPARE3D.