GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction

Paper title

GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction

Authors

Kareem Metwaly, Aerin Kim, Elliot Branson, Vishal Monga

Abstract

Attaching attributes (such as color, shape, state, action) to object categories is an important computer vision problem. Attribute prediction has seen exciting recent progress and is often formulated as a multi-label classification problem. Yet significant challenges remain in: 1) predicting diverse attributes over multiple categories, 2) modeling attribute-category dependencies, 3) capturing both global and local scene context, and 4) predicting attributes of objects with low pixel counts. To address these issues, we propose a novel multi-category attribute prediction deep architecture named GlideNet, which contains three distinct feature extractors. A global feature extractor recognizes what objects are present in a scene, whereas a local one focuses on the area surrounding the object of interest. Meanwhile, an intrinsic feature extractor uses an extension of standard convolution, dubbed Informed Convolution, to retrieve features of objects with low pixel counts. GlideNet uses gating mechanisms with binary masks and its self-learned category embedding to combine the dense embeddings. Collectively, the Global-Local-Intrinsic blocks comprehend the scene's global context while attending to the characteristics of the local object of interest. Finally, using the combined features, an interpreter predicts the attributes, and the length of the output is determined by the category, thereby removing unnecessary attributes. GlideNet achieves compelling results on two recent and challenging datasets (VAW and CAR) for large-scale attribute prediction. For instance, it obtains more than a 5% gain over the state of the art in the mean recall (mR) metric. GlideNet's advantages are especially apparent when predicting attributes of objects with low pixel counts as well as attributes that demand global context understanding. Finally, we show that GlideNet excels in training-starved real-world scenarios.
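
The abstract reports its headline gain in mean recall (mR) but does not spell out how the metric is computed. The sketch below shows one common reading, assuming mR is the recall of each attribute averaged over all attributes that have at least one positive example, with predictions already binarized. The function name `mean_recall`, the list-of-lists input layout, and the handling of attributes without positives are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List


def mean_recall(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Macro-averaged recall over attributes (assumed reading of mR).

    y_true, y_pred: one row per object, one 0/1 entry per attribute.
    Attributes with no positive ground-truth label are skipped
    (an assumption; the abstract does not specify this case).
    """
    if not y_true:
        return 0.0
    num_attributes = len(y_true[0])
    recalls = []
    for a in range(num_attributes):
        # Objects that truly carry attribute a.
        positives = sum(1 for t in y_true if t[a] == 1)
        if positives == 0:
            continue
        # Of those, the ones the model also predicted as positive.
        true_positives = sum(
            1 for t, p in zip(y_true, y_pred) if t[a] == 1 and p[a] == 1
        )
        recalls.append(true_positives / positives)
    return sum(recalls) / len(recalls) if recalls else 0.0


# Toy example: 3 objects, 3 attributes.
y_true = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
print(mean_recall(y_true, y_pred))  # per-attribute recalls 0.5, 1.0, 0.0 -> 0.5
```

Because each attribute contributes equally regardless of how often it occurs, a metric of this kind rewards models that do well on rare attributes, which is consistent with the abstract's emphasis on diverse attributes across many categories.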
