A Dynamic Data Driven Approach for Explainable Scene Understanding

Paper Title

A Dynamic Data Driven Approach for Explainable Scene Understanding

Authors

Daniels, Zachary A.; Metaxas, Dimitris

Abstract

Scene understanding is an important topic in computer vision, and it poses computational challenges with applications across a wide range of domains, including remote sensing, surveillance, smart agriculture, robotics, autonomous driving, and smart cities. We consider the active explanation-driven understanding and classification of scenes. Suppose that an agent equipped with one or more sensors is placed in an unknown environment and, based on its sensory input, needs to assign a label to the perceived scene. The agent can adjust its sensor(s) to capture additional details about the scene, but sensor manipulation has a cost, so it is important for the agent to understand the scene quickly and efficiently. It is also important that the agent understand not only the global state of a scene (e.g., the category of the scene or the major events taking place in it) but also the characteristics/properties of the scene that support the decisions and predictions made about that global state. Finally, when the agent encounters an unknown scene category, it must be capable of refusing to assign a label to the scene, requesting aid from a human, and updating its underlying knowledge base and machine learning models based on the feedback the human provides. We introduce a dynamic data driven framework for the active explanation-driven classification of scenes. Our framework is called ACUMEN: Active Classification and Understanding Method by Explanation-driven Networks. To demonstrate the utility of the proposed ACUMEN approach and show how it can be adapted to a domain-specific application, we focus on an example case study involving the classification of indoor scenes using an active robotic agent with vision-based sensors, i.e., an electro-optical camera.
