Paper Title

SHOP-VRB: A Visual Reasoning Benchmark for Object Perception

Authors

Michal Nazarczuk, Krystian Mikolajczyk

Abstract

In this paper, we present an approach and a benchmark for visual reasoning in robotics applications, in particular small object grasping and manipulation. The approach and benchmark focus on inferring object properties from visual and textual data. The benchmark covers small household objects with their properties, functionality, and natural language descriptions, as well as question-answer pairs for visual reasoning queries along with their corresponding scene semantic representations. We also present a method for generating synthetic data that allows the benchmark to be extended to other objects or scenes, and we propose an evaluation protocol that is more challenging than those of existing datasets. We propose a reasoning system based on symbolic program execution: a disentangled representation of the visual and textual inputs is obtained and used to execute symbolic programs that represent the 'reasoning process' of the algorithm. We perform a set of experiments on the proposed benchmark and compare the results with state-of-the-art methods. These results expose shortcomings of existing benchmarks that may lead to misleading conclusions about the actual performance of visual reasoning systems.
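
The reasoning pipeline described in the abstract, a scene semantic representation paired with symbolic program execution, can be illustrated with a minimal sketch. The object attributes, operation names, and program structure below are illustrative assumptions, not the actual SHOP-VRB schema:

```python
# Minimal sketch: a toy symbolic executor that answers an attribute query over a
# hypothetical scene semantic representation. All attribute names, operations,
# and the program format are assumptions for illustration only.

from typing import Any

# Hypothetical scene semantic representation: one dict per detected object.
scene = [
    {"name": "mug", "material": "ceramic", "color": "white", "movability": "light"},
    {"name": "blender", "material": "plastic", "color": "black", "movability": "heavy"},
]

# Hypothetical functional program parsed from the question
# "What is the material of the light object?"
program = [
    ("filter", ("movability", "light")),  # keep objects matching an attribute
    ("unique", None),                     # assert exactly one object remains
    ("query", "material"),                # read out the requested attribute
]

def execute(program: list[tuple[str, Any]], scene: list[dict]) -> Any:
    """Run a sequence of symbolic operations over the scene representation."""
    state: Any = scene
    for op, arg in program:
        if op == "filter":
            key, value = arg
            state = [obj for obj in state if obj.get(key) == value]
        elif op == "unique":
            assert len(state) == 1, "question is ambiguous for this scene"
            state = state[0]
        elif op == "query":
            state = state[arg]
        else:
            raise ValueError(f"unknown operation: {op}")
    return state

print(execute(program, scene))  # -> "ceramic"
```

In the full system, the scene representation would come from the visual pipeline and the program from the language pipeline (the "disentangled" inputs); the sketch only shows how a symbolic executor ties the two together to produce an answer.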
