Natural Scene Image Annotation Using Local Semantic Concepts and Spatial Bag of Visual Words

Paper Title

Natural Scene Image Annotation Using Local Semantic Concepts and Spatial Bag of Visual Words

Authors

Alqasrawi, Yousef

Abstract

The bag of visual words (BOW) model, which represents images by local invariant features computed at interest point locations, has become a standard choice for many computer vision tasks. Visual vocabularies generated from image feature vectors are expected to yield discriminative visual words that improve the performance of image annotation systems. However, most techniques that adopt the BOW model for image annotation disregard useful information that could be mined from image categories to build discriminative visual vocabularies. To address this, this paper introduces a detailed framework for automatically annotating natural scene images with local semantic labels drawn from a predefined vocabulary. The framework rests on the hypothesis that, in natural scenes, intermediate semantic concepts are correlated with local keypoints. Under this hypothesis, image regions can be efficiently represented with the BOW model, and a machine learning approach such as SVM can then label those regions with semantic annotations. A further objective of this paper is to examine how generating visual vocabularies from image halves, rather than from the whole image, affects the performance of annotating image regions with semantic labels. All BOW-based approaches, as well as baseline methods, have been extensively evaluated on a six-category dataset of natural scenes using SVM and KNN classifiers. The reported results show that it is plausible to use the BOW model to represent the semantic information of image regions and thus to annotate image regions with labels automatically.
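
The generic BOW region-labeling pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes plain k-means to build the visual vocabulary, hard assignment of each descriptor to its nearest visual word, L1-normalized histograms per region, and a k-NN classifier. The function names, the toy 2-D descriptors (real systems use e.g. 128-D SIFT), and the "sky"/"grass" labels are all hypothetical.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, centers):
    """Index of the center closest to `vector`."""
    return min(range(len(centers)), key=lambda i: squared_distance(vector, centers[i]))


def build_vocabulary(descriptors, k, iterations=20):
    """Cluster local descriptors with k-means; each centroid is one visual word.

    Assumption: centroids are seeded by farthest-point (maximin) selection so the
    sketch is deterministic; random or k-means++ seeding are more common choices.
    """
    centroids = [list(descriptors[0])]
    while len(centroids) < k:
        farthest = max(descriptors,
                       key=lambda d: min(squared_distance(d, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_index(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(descriptors, vocabulary):
    """Quantize a region's descriptors to visual words; return an L1-normalized histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_index(d, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def knn_label(query, training, k=3):
    """Majority vote over the k training histograms nearest to `query`."""
    ranked = sorted(training, key=lambda item: squared_distance(query, item[0]))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical toy data: 2-D stand-ins for real local descriptors.
rng = random.Random(7)


def blob(cx, cy, n):
    """n noisy descriptors scattered around (cx, cy)."""
    return [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for _ in range(n)]


def sky_region(main=8):
    """Hypothetical 'sky' region: mostly near (0, 0), the rest near (0, 5)."""
    return blob(0, 0, main) + blob(0, 5, 10 - main)


def grass_region(main=8):
    """Hypothetical 'grass' region: mostly near (5, 5), the rest near (5, 0)."""
    return blob(5, 5, main) + blob(5, 0, 10 - main)


train_regions = ([(sky_region(), "sky") for _ in range(3)]
                 + [(grass_region(), "grass") for _ in range(3)])

all_descriptors = [d for region, _ in train_regions for d in region]
vocabulary = build_vocabulary(all_descriptors, k=4)
training = [(bow_histogram(region, vocabulary), label) for region, label in train_regions]

print(knn_label(bow_histogram(sky_region(main=7), vocabulary), training))    # sky
print(knn_label(bow_histogram(grass_region(main=6), vocabulary), training))  # grass
```

The paper also evaluates SVM classifiers. In this sketch, `knn_label` could be replaced by any classifier trained on the same histograms, for example scikit-learn's `SVC`. The image-halves variant is not shown, because the abstract does not specify how vocabularies built from each half are combined. One possibility is to call `build_vocabulary` separately on the descriptors from each half.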
